Monday, 13 October 2025

AWS VPC Endpoint

 

A VPC Endpoint is a network component that enables private connectivity between AWS resources in a VPC and supported AWS services, without requiring public IP addresses or traffic to traverse the public internet.​

When to Use a VPC Endpoint


Use VPC Endpoints when security and privacy are priorities, as they allow your resources in private subnets to access AWS services (like S3, DynamoDB, or other supported services) without exposure to the internet.

VPC Endpoints can improve performance, reduce latency, and simplify network architecture by removing dependencies on NAT gateways or internet gateways.​

They help in scenarios where compliance or regulatory requirements dictate that traffic must remain entirely within the AWS network backbone.​

Use them to save on NAT gateway or data transfer costs when large amounts of traffic are sent to or from AWS services.​

When Not to Use a VPC Endpoint


They may not be suitable if you require internet access for your workloads (e.g., accessing third-party services).​

If your use case does not require private connectivity and your infrastructure already relies on internet/NAT gateways, VPC Endpoints could add unnecessary complexity.​

There is an additional cost for interface endpoints, charged per hour and data transferred, which may be a consideration for cost-sensitive environments.​

Service support is not universal—gateway endpoints only work for S3 and DynamoDB, and not all AWS services support PrivateLink/interface endpoints.​
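
For illustration, a gateway endpoint for S3 can be created with the AWS CLI; this is a minimal sketch in which the VPC ID, route table ID, and Region are placeholders:

% aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --service-name com.amazonaws.us-east-1.s3 \
    --vpc-endpoint-type Gateway \
    --route-table-ids rtb-0123456789abcdef0

Interface endpoints use the same command with --vpc-endpoint-type Interface, plus --subnet-ids and --security-group-ids.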

Alternatives to VPC Endpoints


NAT Gateway or NAT Instance: Provides private subnets with internet access, but all traffic goes over the public internet and incurs NAT gateway/data transfer costs.​

VPN Connection or AWS Direct Connect: Used for private connectivity between on-premises networks and AWS VPCs. These are more suitable for hybrid cloud requirements and broader connectivity scenarios.​

Internet Gateway: Needed if your resources require general internet access, though this exposes them to the public internet.

---

Policy types in AWS

 


There are several types of AWS policies, but the primary and most commonly referenced categories are identity-based policies and resource-based policies.

Main AWS Policy Types



Identity-based policies are attached to AWS IAM identities (users, groups, or roles) and define what actions those entities can perform on which resources.​
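
Identity-based policy example, as a minimal sketch attaching an inline policy to a hypothetical user via the CLI (the user name, account ID, and table name are placeholders):

% aws iam put-user-policy \
    --user-name example-user \
    --policy-name AllowDynamoDBRead \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:Scan"],
          "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/example-table"
        }
      ]
    }'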


Resource-based policies are attached directly to AWS resources (such as S3 buckets or SNS topics), specifying which principals (identities or accounts) can access those resources and what actions are permitted.​

Resource-based policy example:

{
   "Version": "2012-10-17",
   "Id": "Policy1415115909152",
   "Statement": [
     {
       "Sid": "Access-to-specific-VPCE-only",
       "Principal": "*",
       "Action": "s3:GetObject",
       "Effect": "Allow",
       "Resource": ["arn:aws:s3:::yourbucketname",
                    "arn:aws:s3:::yourbucketname/*"],
       "Condition": {
         "StringEquals": {
           "aws:SourceVpce": "vpce-1a2b3c4d"
         }
       }
     }
   ]
}



Other AWS Policy Types

In addition to the above, AWS supports several other policy types, including:
  • Managed policies (AWS managed and customer managed)
  • Inline policies (directly embedded on a single identity)
  • Permissions boundaries (set maximum permissions for identities)
  • Service Control Policies (SCPs, used in AWS Organizations)
  • Access Control Lists (ACLs, primarily for resources like S3 buckets)
  • Session policies (restrict permissions for sessions created with temporary credentials).​
While identity-based and resource-based are the two fundamental categories most often discussed, the broader IAM ecosystem incorporates additional forms for more advanced governance and restrictions.​




How do resource-based and identity-based policies differ?


Resource-based and identity-based policies in AWS differ primarily in their attachment location and in how they control access permissions to AWS resources.​

Key Differences


Identity-based policies are attached to IAM entities (users, groups, or roles) and specify what actions these identities can perform on which resources. For example, an IAM user can have a policy that permits reading from specific DynamoDB tables or starting EC2 instances.​

Resource-based policies are attached directly to AWS resources (such as S3 buckets, SNS topics, or KMS keys). These policies define which principals (users, roles, accounts) can access the resource and what actions they are allowed to perform. Resource-based policies allow for fine-grained control, including granting access to principals outside of the resource owner’s AWS account.​

Attachment and Usage


Identity-based policies are managed at the IAM level and generally offer broader access control for multiple resources through one principal.​

Resource-based policies are applied specifically to resources and are used when the access control needs to be defined at the resource level, possibly for cross-account or external identity access.​

Policy Evaluation


When a request to access a resource is made, AWS evaluates all applicable identity-based and resource-based policies together.

An explicit 'Deny' found in any policy will override any 'Allow' statements, ensuring comprehensive and secure permission management.​

Both types can be used separately or together, depending on security architecture and the granularity of control required for identities and resources.


If a resource-based policy allows access to some user, do we also need to attach to that user a separate identity-based policy allowing access to that resource?


If a resource-based policy allows access to a user, there is generally no need to attach a separate identity-based policy for that specific action—access will be granted solely based on the resource-based policy's allowance.​​

Details on Policy Evaluation Logic


When a request is made, AWS checks both identity-based and resource-based policies that apply to the request.​

If any policy contains an explicit 'Deny', access is denied regardless of any 'Allow' statements elsewhere.​

If at least one policy (whether identity-based or resource-based) contains an 'Allow' and there are no explicit Deny statements, the action is permitted.​​

For resource-based policies, the permission is granted directly to the designated principal (user or role), even if the user or role does not have a corresponding identity-based policy that allows the action.​​

In cross-account scenarios, both the identity-based policy in the requesting account and the resource-based policy on the target resource must allow the action; otherwise, access is denied.​

A resource-based policy alone can grant a user access to that resource and action, without a separate identity-based policy, as long as no explicit deny applies anywhere. Identity-based policies remain useful when finer-grained or multiple permissions are needed across many resources.

This means a user with no identity-based permission, but with permission in a resource-based policy, can still access that specific resource unless a deny blocks them. However, in cross-account situations, both a corresponding identity-based policy in the user's account and a resource-based policy in the resource owner's account must allow the action for access to succeed.
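
To check how these rules combine for a specific principal and action, the IAM policy simulator can be called from the CLI. A sketch with placeholder ARNs; by default it evaluates the principal's identity-based policies, and a resource-based policy can additionally be supplied via --resource-policy:

% aws iam simulate-principal-policy \
    --policy-source-arn arn:aws:iam::123456789012:user/example-user \
    --action-names s3:GetObject \
    --resource-arns arn:aws:s3:::yourbucketname/example.txt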

Are the policies listed under IAM in the AWS Console only identity-based policies?


Yes, the policies listed in the IAM section of the AWS Console are only identity-based policies.

The IAM "Policies" page itself lists managed policies (AWS managed and customer managed); inline identity-based policies are not shown in that list, but can be viewed on the individual user, group, or role in which they are embedded.

Resource-based policies (such as S3 bucket policies, SNS topic policies, or Lambda resource policies) are not centrally listed in IAM “Policies” in the Console; instead, they are managed from the respective resource consoles (e.g., via the S3 or Lambda management screens).​

The IAM Console does not display resource-based policies in the Policies list, since these are stored on resources, not IAM identities.​

To summarize, the IAM policies view in the AWS Console shows only identity-based policies: managed policies in the list itself, with inline policies visible on their identities. Resource-based policies are managed and reviewed from the console page of each AWS service resource.
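
The same split is visible from the CLI: customer managed (identity-based) policies are listed via IAM, while e.g. a bucket's resource-based policy is fetched from S3 (the bucket name is a placeholder):

% aws iam list-policies --scope Local --query 'Policies[].PolicyName'

% aws s3api get-bucket-policy --bucket yourbucketname --query Policy --output text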

---

Tuesday, 7 October 2025

Amazon RDS (Relational Database Service)

 


Amazon Relational Database Service (RDS) 
  • Distributed relational database service
  • Simplifies the setup, operation, and scaling of a relational database
  • Automates admin tasks like patching the database software, backing up databases and enabling point-in-time recovery 
  • Scaling storage and compute resources is done via API call

Amazon RDS supports eight major database engines:
  • Oracle (proprietary)
  • Microsoft SQL Server (proprietary)
  • IBM Db2 (proprietary)
  • Amazon Aurora MySQL-compatible (AWS-built)
  • Amazon Aurora PostgreSQL-compatible (AWS-built)
  • MySQL (open-source)
  • MariaDB (open-source)
  • PostgreSQL (open-source)


Networking

We can launch Amazon RDS databases in a public or a private subnet of a VPC. 

If the DB instance is in a public subnet and we want it to be accessible from the Internet:

  • The Publicly Accessible property of the DB instance needs to be set to Yes
  • Inbound rules for the security group of the RDS instance need to allow connections from the source IP
  • An Internet Gateway needs to be attached to the VPC
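
To check or change the Publicly Accessible property from the CLI (a sketch; the DB instance identifier is a placeholder):

% aws rds describe-db-instances \
    --db-instance-identifier my-db-instance \
    --query 'DBInstances[0].PubliclyAccessible'

% aws rds modify-db-instance \
    --db-instance-identifier my-db-instance \
    --publicly-accessible \
    --apply-immediately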

Troubleshooting


To test if RDS is accessible from the Internet (and also that it's up and listening) we can use Telnet or Netcat.

On macOS, Telnet is not installed by default, so install it via Homebrew:

% brew install telnet

% telnet test-example-com-13-20240712155825.ckmh7hyrsza3.us-east-1.rds.amazonaws.com 3306 
Trying 121.65.2.95...
Connected to ec2-10-10-129-99.compute-1.amazonaws.com.
Escape character is '^]'.
Connection closed by foreign host.

If we want to use Netcat:

% nc test-example-com-13-20240712155825.ckmh7hyrsza3.us-east-1.rds.amazonaws.com 3306 -v
Connection to test-example-com-13-20240712155825.ckmh7hyrsza3.us-east-1.rds.amazonaws.com port 3306 [tcp/mysql] succeeded!


MySQL

Terraform


These are the privileges that allow a user to execute DROP USER in an RDS MySQL DB:

# Global privileges
resource "mysql_grant" "bojan_global" {
  user       = mysql_user.bojan.user
  host       = mysql_user.bojan.host
  database   = "*"
  table      = "*"
  privileges = ["CREATE USER"]
}

# Database-level privileges
resource "mysql_grant" "bojan" {
  user       = mysql_user.bojan.user
  host       = mysql_user.bojan.host
  database   = "%"
  privileges = ["SELECT", "SHOW VIEW", "INSERT", "UPDATE", "EXECUTE", "DELETE"]
}

The root user in RDS MySQL does not have all the admin privileges it would have on a regular, self-managed MySQL instance.

The root user in RDS MySQL can't grant the SYSTEM_USER privilege to another user, as this error occurs:

Error 1227 (42000): Access denied; you need (at least one of) the RDSADMIN USER privilege(s) for this operation

The root user also does not have privileges to grant privileges on the mysql database, as this error occurs:

Error running SQL (GRANT DELETE ON `mysql`.* TO 'bojan'@'%'): Error 1044 (42000): Access denied for user 'root'@'%' to database 'mysql'
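
To see what privileges a user actually ended up with, SHOW GRANTS can be run through the mysql client (a sketch; the endpoint is a placeholder):

% mysql -h my-db.ckmh7hyrsza3.us-east-1.rds.amazonaws.com -u root -p \
    -e "SHOW GRANTS FOR 'bojan'@'%';"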


---


Friday, 5 September 2025

Introduction to Amazon Kinesis Data Streams

 


Amazon Kinesis Data Streams is one of 4 Amazon Kinesis services. It helps to easily stream data at any scale.

  • There are no servers to manage
  • Two capacity modes:
    •  on-demand: eliminates the need to provision or manage capacity required for running applications; Automatic provisioning and scaling
    •  provisioned
  • Pay only for what you use 
  • Built-in integrations with other AWS services to create analytics, serverless, and application integration solutions
  • To ingest and collect terabytes of data per day from application and service logs, clickstream data, sensor data, and in-app user events to power live dashboards, generate metrics, and deliver data into data lakes
  • To build applications for high-frequency event data such as clickstream data, and gain access to insights in seconds, not days, using AWS Lambda or Amazon Managed Service for Apache Flink
  • To pair with AWS Lambda to respond to or adjust immediate occurrences within the event-driven applications in your environment, at any scale.

The producers continually push data to Kinesis Data Streams, and the consumers process the data in real time. A producer can be e.g. a CloudWatch Logs subscription filter, and a consumer can be e.g. a Lambda function.

A producer puts data records into shards and a consumer gets data records from shards. Consumers use shards for parallel data processing and for consuming data in the exact order in which they are stored. If writes and reads exceed the shard limits, the producer and consumer applications will receive throttles, which can be handled through retries.

Capacity Modes

  • Provisioned
    • data stream capacity is fixed
    • 1 shard has fixed capacities: 
      • Write: Maximum 1 MiB/second and 1,000 records/second 
      • Read: Maximum 2 MiB/second
    • N shards will multiply R/W capacity by N
  • On-demand
    • data stream capacity scales automatically

Data Retention Period

A Kinesis data stream stores records for 24 hours by default, and up to 8,760 hours (365 days). You can update the retention period via the Kinesis Data Streams console or by using the IncreaseStreamRetentionPeriod and DecreaseStreamRetentionPeriod operations.

The default retention period of 24 hours covers scenarios where intermittent lags in processing require catch-up with the real-time data. A seven-day retention lets you reprocess data for up to seven days to resolve potential downstream data losses. Long-term data retention greater than seven days and up to 365 days lets you reprocess old data for use cases such as algorithm back testing, data store backfills, and auditing.
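
For example, extending retention to seven days (168 hours) from the CLI, reusing the placeholder stream name and profile used elsewhere in these notes:

% aws kinesis increase-stream-retention-period \
    --stream-name my-stream \
    --retention-period-hours 168 \
    --region us-east-2 \
    --profile my-profile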


Data Stream << Shards << Data Records


Kinesis data stream is a set of shards. Each shard has a sequence of data records.

A data record is the unit of data stored in a Kinesis data stream. Data records are composed of:
  • Sequence number
    • within the shard
    • assigned by Kinesis Data Streams
  • Partition key
    • used to isolate and route records to different shards of a data stream
    • specified by your data producer while adding data to a Kinesis data stream. For example, let’s say you have a data stream with two shards (shard 1 and shard 2). You can configure your data producer to use two partition keys (key A and key B) so that all records with key A are added to shard 1 and all records with key B are added to shard 2.
  • Data blob
    • an immutable sequence of bytes
    • data of interest your data producer adds to a data stream
    • Kinesis Data Streams does not inspect, interpret, or change the data in the blob in any way
    • A data blob (the data payload before Base64-encoding)  can be up to 1 MB.

How to find all shards in a stream?

% aws kinesis list-shards \
--stream-name my-stream \
--region us-east-2 \
--profile my-profile
{
    "Shards": [
        {
            "ShardId": "shardId-000000000000",
            "HashKeyRange": {
                "StartingHashKey": "0",
                "EndingHashKey": "440282366920938463463374607431768211455"
            },
            "SequenceNumberRange": {
                "StartingSequenceNumber": "49663754454378916691333541504734985347376184017408753666"
            }
        }
    ]
}


Note HashKeyRange: the partition key of each incoming record is hashed (MD5), and the record is routed to the shard whose HashKeyRange contains that hash value.
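
A producer supplies the partition key explicitly when writing. A sketch with the AWS CLI (v2), reusing the placeholder stream name; --cli-binary-format lets --data be passed as plain text:

% aws kinesis put-record \
    --stream-name my-stream \
    --partition-key "key-A" \
    --data "hello from a producer" \
    --cli-binary-format raw-in-base64-out \
    --region us-east-2 \
    --profile my-profile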

Shard Iterator


Shard iterators are a fundamental tool for consumers to process data in the correct order and without missing records. They provide fine control over the reading position, making it possible to build scalable, reliable stream processing workflows.

A shard iterator is a reference marker that specifies the exact position within a shard from which data reading should begin and continue sequentially. It enables consumers to access, read, and process records from a particular location within a data stream shard.

Different types of shard iterators control where reading starts:
  • AT_SEQUENCE_NUMBER: Reads from the specific sequence number.
  • AFTER_SEQUENCE_NUMBER: Reads just after the specified sequence number.
  • TRIM_HORIZON: Starts from the oldest available data in the shard.
  • LATEST: Starts just after the most recent record in the shard, so only records added after the iterator is created are read
  • AT_TIMESTAMP: Starts from a specific timestamp in the shard
When reading repeatedly from a stream, the shard iterator is updated after each GetRecords request (with the NextShardIterator returned by the API).

This mechanism allows applications to seamlessly resume reading from the correct point in the stream.

Proper management of shard iterators is essential to avoid data loss or duplicate reads, especially as data retention policies and processing speeds can affect the availability of data in the stream


How to get an iterator in a shard?


A shard iterator is obtained using the GetShardIterator API, which marks the spot in the shard to start reading records.

The command aws kinesis get-shard-iterator is used to obtain a pointer (iterator) to a specific position in a shard of a Kinesis stream. We need this iterator to actually read records from the stream using aws kinesis get-records.


% ITERATOR=$(aws kinesis get-shard-iterator \
    --stream-name my-stream \
    --shard-id shardId-000000000000 \
    --shard-iterator-type TRIM_HORIZON \
    --query 'ShardIterator' \
    --output text \
    --region us-east-2 \
    --profile my-profile)

ShardIterator is a token we pass to aws kinesis get-records to actually fetch the data.

% echo $ITERATOR
AAAAAAAAAAHf65JkbV8ZJQ...Exsy8WerU5Z8LKI8wtXm95+blpXljd0UWgDs7Seo9QlpikJI/6U=


We can now read records directly from the stream:

% aws kinesis get-records \
--shard-iterator $ITERATOR \
--limit 50 \
--region us-east-2 \
--profile my-profile
{
    "Records": [
        {
            "SequenceNumber": "49666716197061357389751170868210642185623031110770360322",
            "ApproximateArrivalTimestamp": "2025-09-03T17:37:00.343000+01:00",
            "Data": "H4sIAAAAAAAA/+3YTW/TMBgH8K8S...KaevxIQAA",
            "PartitionKey": "54dc991cd70ae6242c35f01972968478"
        },
        {
            "SequenceNumber": "49666716197061357389751170868211851111442645739945066498",
            "ApproximateArrivalTimestamp": "2025-09-03T17:37:00.343000+01:00",
            "Data": "H4sIAAAAAAAA/+3YS2vcM...889uPBV8BVXceAAA=",
            "PartitionKey": "e4e9ad254c154281a67d05a33fa0ea31"
        },
        ...
   ]
}

Data field in each record is Base64-encoded. That’s because Kinesis doesn’t know what your producers are sending (could be JSON, gzipped text, protobuf, etc.), so it just delivers raw bytes.

Why each record read from the shard has a different Partition Key?


Each record in a Kinesis stream can have a different PartitionKey because the PartitionKey is chosen by the data producer for each record and is not tied directly to the way records are read from the stream or which iterator type is used. Even when reading from a single shard using TRIM_HORIZON, records within that shard may have different PartitionKeys because the PartitionKey is used to route records to shards at the time of writing—not to group records within a shard during reading.


How Partition Keys and Shards Work


The PartitionKey determines to which shard a record is sent by hashing the key and mapping it to a shard's hash key range. Multiple records with different PartitionKeys can end up in the same shard, especially if the total number of unique partition keys is greater than the number of shards, or if the hash function maps them together. When reading from a shard (with any iterator, including TRIM_HORIZON), the records are read in order of arrival, but each can have any PartitionKey defined at ingest time.

Reading and PartitionKeys


Using TRIM_HORIZON just means starting at the oldest record available in the shard. It does not guarantee all records have the same PartitionKey, only that they are the oldest records remaining for that shard. Records from a single shard will often have various PartitionKeys, all mixed together as per their original ingest. Therefore, it is normal and expected to see a variety of PartitionKeys when reading a batch of records from the same shard with TRIM_HORIZON


Writing and Reading

Data records written by the same producer can end up in different shards if the producer chooses different PartitionKeys for those records. The distribution of records across shards is determined by the hash of the PartitionKey assigned to each record, not by the producer or the order of writing.

Distribution Across Shards

If a producer sends data with varying PartitionKeys, Kinesis uses those keys' hashes to assign records to shards; thus, even the same producer's records can be spread across multiple shards.
If the producer uses the same PartitionKey for all records, then all its records will go to the same shard, preserving strict ordering for that key within the shard.

Reading N Records: Producer and Sequence

When reading N records from a shard iterator, those records are the next available records in that shard, in the order they arrived in that shard. These records will not necessarily all be from the same producer, nor are they guaranteed to be N consecutive records produced by any single producer.

Records from different producers, as well as from the same producer if it used multiple PartitionKeys, can appear in any sequence within a shard, depending on how PartitionKeys are mapped at write time.
In summary, unless a producer always uses the same PartitionKey, its records may spread across shards, and any batch read from a shard iterator will simply reflect the ordering of records within that shard, including records from multiple producers and PartitionKeys.


How to get records content in a human-readable format?

We need to extract the Data field from a record, Base64-decode it and then process the data further. In our example the payload was gzip-compressed JSON (as that's what a CloudWatch Logs → Kinesis subscription delivers) so we need to decompress it and parse it as JSON:


% aws kinesis get-records \
--shard-iterator $ITERATOR \
--limit 1 \
--query 'Records[0].Data' \
--output text \
--region us-east-2 \
--profile my-profile \
| base64 --decode \
| gzip -d \
| jq .

{
  "messageType": "DATA_MESSAGE",
  "owner": "123456789999",
  "logGroup": "/aws/lambda/my-lambda",
  "logStream": "2025/09/03/[$LATEST]202865ca545544deb61360d571180d45",
  "subscriptionFilters": [
    "my-lambda-MainSubscriptionFilter-r1YlqNvCVDqk"
  ],
  "logEvents": [
    {
      "id": "39180581104302115620949753249891540826922898325656305664",
      "timestamp": 1756918020250,
      "message": "{\"log.level\":\"info\",\"@timestamp\":\"2025-09-03T16:47:00.250Z\",\"log.origin\":{\"function\":\"github.com/elastic/apm-aws-lambda/app.(*App).Run\",\"file.name\":\"app/run.go\",\"file.line\":98},\"message\":\"Exiting due to shutdown event with reason spindown\",\"ecs.version\":\"1.6.0\"}\n"
    },
    {
      "id": "39180581104302115620949753249891540826922898325656305665",
      "timestamp": 1756918020250,
      "message": "{\"log.level\":\"warn\",\"@timestamp\":\"2025-09-03T16:47:00.250Z\",\"log.origin\":{\"function\":\"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).forwardLambdaData\",\"file.name\":\"apmproxy/apmserver.go\",\"file.line\":357},\"message\":\"Dropping lambda data due to error: metadata is not yet available\",\"ecs.version\":\"1.6.0\"}\n"
    },
    {
      "id": "39180581104302115620949753249891540826922898325656305666",
      "timestamp": 1756918020250,
      "message": "{\"log.level\":\"warn\",\"@timestamp\":\"2025-09-03T16:47:00.250Z\",\"log.origin\":{\"function\":\"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).forwardLambdaData\",\"file.name\":\"apmproxy/apmserver.go\",\"file.line\":357},\"message\":\"Dropping lambda data due to error: metadata is not yet available\",\"ecs.version\":\"1.6.0\"}\n"
    }
  ]
}


Metrics to Observe


Basic and enhanced CloudWatch Metrics


Kinesis Data Streams and Amazon CloudWatch are integrated so that you can collect, view, and analyze CloudWatch metrics for your Kinesis data streams. This integration supports basic stream-level and enhanced shard-level monitoring for Kinesis data streams.
  • Basic (stream-level) metrics – Stream-level data is sent automatically every minute at no charge.
  • Enhanced (shard-level) metrics – Shard-level data is sent every minute for an additional cost.

A stream has producers and consumers. If the rate of writing new records is higher than the rate of reading them, records will reach their retention age (24 hours by default) and the oldest unread records will start being removed from the stream and lost forever.

To prevent this from happening we can monitor some stream metrics and also set alarms when they reach critical thresholds.


GetRecords.IteratorAgeMilliseconds

It measures how old the records returned by GetRecords are (how far our consumer lags behind the tip of the stream). A very large value → our consumer(s) aren't keeping up with the incoming write rate.

The age of the last record in all GetRecords calls made against a Kinesis stream, measured over the specified time period. Age is the difference between the current time and when the last record of the GetRecords call was written to the stream. The Minimum and Maximum statistics can be used to track the progress of Kinesis consumer applications. A value of zero indicates that the records being read are completely caught up with the stream. Shard-level metric name: IteratorAgeMilliseconds.

Meaningful Statistics: Minimum, Maximum, Average, Samples

Unit info: Milliseconds

There are 86,400,000 milliseconds in a day, so with the default 24-hour retention, if this metric goes above that value it means some records are expiring before being read and will be lost.

Iterator-age number is a classic “consumer is falling behind” symptom.

IteratorAgeMilliseconds = 86.4M → very high, backlog building
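
To get warned well before records start expiring, a CloudWatch alarm can watch the maximum iterator age. A sketch that alarms above 12 hours (43,200,000 ms); the SNS topic ARN is a placeholder:

% aws cloudwatch put-metric-alarm \
    --alarm-name my-stream-iterator-age-high \
    --namespace AWS/Kinesis \
    --metric-name GetRecords.IteratorAgeMilliseconds \
    --dimensions Name=StreamName,Value=my-stream \
    --statistic Maximum \
    --period 300 \
    --evaluation-periods 3 \
    --threshold 43200000 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions arn:aws:sns:us-east-2:123456789012:my-alerts-topic \
    --region us-east-2 \
    --profile my-profile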


GetRecords.Bytes

The number of bytes retrieved from the Kinesis stream, measured over the specified time period. Minimum, Maximum, and Average statistics represent the bytes in a single GetRecords operation for the stream in the specified time period.  
             
Shard-level metric name: OutgoingBytes
             
Meaningful Statistics: Minimum, Maximum, Average, Sum, Samples
             
Unit info: Bytes



GetRecords - sum (MiB/second)

GetRecords - sum (Count)
  • GetRecords.Records decreasing → Lambda is lagging


GetRecords iterator age - maximum (Milliseconds)
GetRecords latency - average (Milliseconds)

GetRecords success - average (Ratio)
  • GetRecords.Success decreasing → Lambda is not keeping up

Incoming data - sum (MiB/second)
Incoming data - sum (Count)

PutRecord - sum (MiB/second)
PutRecords - sum (MiB/second)
PutRecord latency - average (Milliseconds)
PutRecords latency - average (Milliseconds)
PutRecord success - average (Ratio)
PutRecords successful records - average (Percent)
PutRecords failed records - average (Percent)
PutRecords throttled records - average (Percent)

Read throughput exceeded - average (Ratio)
Write throughput exceeded - average (Count)







Addressing Bottlenecks


If a Lambda function is the stream consumer, then with just 1 shard only 1 Lambda invocation can read at a time (per shard, at the default settings). If our log rate exceeds what that shard can handle, the iterator age skyrockets.

It is possible to change number of shards on a live Kinesis stream:

aws kinesis update-shard-count \
  --stream-name your-stream-name \
  --target-shard-count 4 \
  --scaling-type UNIFORM_SCALING


Each shard = 1 MiB/s write and 2 MiB/s read, so 4 shards = 4 MiB/s write and 8 MiB/s read.
Lambda will then process 4 record batches in parallel (one per shard).


To empower Lambda for faster processing, consider increasing its memory, e.g. from 256 MB to 512 MB (CPU is allocated proportionally to memory). In the AWS Console, this can be done in Lambda >> Configuration >> General configuration.
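
The same change from the CLI (the function name is a placeholder):

% aws lambda update-function-configuration \
    --function-name my-lambda \
    --memory-size 512 \
    --region us-east-2 \
    --profile my-profile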

In Kinesis stream trigger, we can try to increase:
  • batch size: 100 --> 400
  • concurrent batches per shard: 1 --> 5

Example of settings for Kinesis stream trigger:

  • Activate trigger: Yes
  • Batch size: 400
  • Batch window: None
  • Concurrent batches per shard: 5
  • Event source mapping ARN: arn:aws:lambda:us-east-2:123456789012:event-source-mapping:80bd81a9-c175-4af5-9aa9-8926b0587f40
  • Last processing result: OK
  • Maximum age of record: -1
  • Metrics: None
  • On-failure destination: None
  • Report batch item failures: No
  • Retry attempts: -1
  • Split batch on error: No
  • Starting position: TRIM_HORIZON
  • Tags: View
  • Tumbling window duration: None
  • UUID: 80bd81a9-c175-4af5-9aa9-8926b0587f40

In AWS Console, this can be done in Lambda >> Configuration >> Triggers.
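
The same trigger settings can also be changed via the event source mapping CLI; a sketch reusing the UUID shown above (--parallelization-factor corresponds to "Concurrent batches per shard"):

% aws lambda update-event-source-mapping \
    --uuid 80bd81a9-c175-4af5-9aa9-8926b0587f40 \
    --batch-size 400 \
    --parallelization-factor 5 \
    --region us-east-2 \
    --profile my-profile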


What “Concurrent batches per shard” does?
  • Each shard can have multiple batches being processed at the same time.
  • Default is 1, meaning: the next batch from a shard won’t be sent to Lambda until the previous batch finishes.
  • If your Lambda is slow or variable in duration, this can create a backlog because only one batch per shard is in flight at a time.

When to increase it?
  • If IteratorAgeMilliseconds is very high → Lambda cannot keep up with the stream.
  • If Lambda execution duration is variable → a single batch in flight per shard limits throughput.
  • If you have sufficient Lambda concurrency (which you do — up to 15) → you can safely allow multiple batches per shard.

Recommended approach
  • Start with 2–5 concurrent batches per shard
    • This allows Kinesis to send multiple batches from the same shard to Lambda simultaneously.
    • Observe if IteratorAgeMilliseconds decreases.
  • Monitor Lambda throttles and duration
    • Ensure your Lambda’s memory/cpu and timeout can handle multiple concurrent batches.
    • if no throttling currently → room to increase concurrency.
  • Adjust batch size if needed
    • Larger batch sizes may help throughput, but smaller batch sizes + higher concurrency often reduce latency.

Important notes
  • Increasing concurrency per shard does not increase shard limits; it just allows more parallelism per shard.
  • If Lambda fails batches, retries are per batch → more concurrency increases the number of batches being retried simultaneously.

Practical starting point:
  • Set Concurrent batches per shard = 5
  • Keep batch size = 400
  • Monitor:
    • IteratorAgeMilliseconds
    • GetRecords.Records
    • Lambda duration / concurrency
  • Adjust up/down based on observed backlog.

Whatever you change, do a single change at a time and after that monitor performance. That will give a clear picture of the impact of that particular parameter and its value.


Amazon Kinesis Data Streams Terminology and concepts - Amazon Kinesis Data Streams

Introduction to Amazon Kinesis

 


Amazon Kinesis is a Serverless Streaming Data Service which has 4 Service types:

  • Amazon Kinesis Video Streams - to securely stream video from connected devices to AWS for analytics, machine learning (ML), playback, and other processing
  • Amazon Kinesis Data Streams - to easily stream data at any scale
  • Amazon Data Firehose - to reliably load real-time streams into data lakes, warehouses, and analytics services
  • Amazon Managed Service for Apache Flink - to transform and analyze streaming data in real time



Friday, 8 August 2025

AWS EKS Cluster Networking





If we select a cluster and go to Networking tab, we'll see the following settings: 
  • VPC
  • Cluster IP address family
  • Service IPv4 range
  • Subnets
  • Cluster security group
  • Additional security groups
  • API server endpoint access

The Manage drop-down groups them into the following:
  • VPC Resources (Network environment)
    • Subnets
    • Additional security groups - optional
  • Endpoint access (API server endpoint access)
  • Remote networks


We'll describe here the meaning and purpose for each of them.


VPC


Amazon Virtual Private Cloud (Amazon VPC) enables you to launch AWS resources into a virtual network that you have defined. This virtual network closely resembles a traditional network that you would operate in your own data center, with the benefits of using the scalable infrastructure of AWS. 

A virtual private cloud (VPC) is a virtual network dedicated to your AWS account. A subnet is a range of IP addresses in your VPC. 

Each Managed Node Group requires you to specify one or more subnets that are defined within the VPC used by the Amazon EKS cluster. Nodes are launched into subnets that you provide. The size of your subnets determines the number of nodes and pods that you can run within them. 

You can run nodes across multiple AWS Availability Zones by providing multiple subnets, each associated with a different Availability Zone. Nodes are distributed evenly across all of the designated Availability Zones.

If you are using the Kubernetes Cluster Autoscaler and running stateful pods, you should create one Node Group for each availability zone using a single subnet and enable the --balance-similar-node-groups feature in cluster autoscaler.

EKS suggests using Private Subnets for worker nodes.




Cluster IP address family



Select the IP address type that pods and services in your cluster will receive: IPv4 or IPv6.

Amazon EKS does not support dual-stack clusters. However, if your worker nodes have an IPv4 address, Amazon EKS will configure IPv6 pod routing so that pods can communicate with cluster-external IPv4 endpoints.


Service IPv4 range


The IP address range from which cluster services will receive IP addresses. Manually configuring this range can help prevent conflicts between Kubernetes services and other networks peered or connected to your VPC.

Service CIDR is only configurable when choosing IPv4 as your cluster IP address family. With IPv6, the service CIDR will be an auto generated unique local address (ULA) range.


Subnets


Choose the subnets in your VPC where the control plane may place elastic network interfaces (ENIs) to facilitate communication with your cluster. The specified subnets must span at least two availability zones.

To control exactly where the ENIs will be placed, specify only two subnets, each from a different AZ, and Amazon EKS will create cross-account ENIs in those subnets. The Amazon EKS control plane creates up to 4 cross-account ENIs in your VPC for each cluster.

You may choose one set of subnets for the control plane that are specified as part of cluster creation, and a different set of subnets for the worker nodes.

EKS suggests using private subnets for worker nodes.

If you select IPv6 cluster address family, the subnets specified as part of cluster creation must contain an IPv6 CIDR block.
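
To review these networking settings for an existing cluster from the CLI (the cluster name is a placeholder):

% aws eks describe-cluster \
    --name my-cluster \
    --query 'cluster.resourcesVpcConfig'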

Cluster security group & Additional security groups


Amazon VPC Security groups control communications within the Amazon EKS cluster including between the managed Kubernetes control plane and compute resources in your AWS account such as worker nodes and Fargate pods.

The Cluster Security Group is a unified security group that is used to control communications between the Kubernetes control plane and compute resources on the cluster. The cluster security group is applied by default to the Kubernetes control plane managed by Amazon EKS as well as any managed compute resources created by Amazon EKS. 

EKS automatically creates a cluster security group on cluster creation to facilitate communication between worker nodes and control plane. The description of such an SG is: "EKS created security group applied to ENI that is attached to EKS Control Plane master nodes, as well as any managed workloads". The name of this SG is of the form eks-cluster-sg-<cluster_name>-1234567890. It's attached to the same VPC that the cluster is in. Its rules are:
  • Inbound: allow all traffic (all protocols and ports) from itself (see https://stackoverflow.com/questions/66917854/aws-security-group-source-of-inbound-rule-same-as-security-group-name)
  • Outbound: allow all IPv4 and IPv6 traffic

Optionally, choose additional security groups to apply to the EKS-managed Elastic Network Interfaces that are created in your control plane subnets. To create a new security group, go to the corresponding page in the VPC console.

Additional cluster security groups control communications from the Kubernetes control plane to compute resources in your account. Worker node security groups are security groups applied to unmanaged worker nodes that control communications from worker nodes to the Kubernetes control plane.


Example:

  • Description: EKS cluster security group
  • Inbound rules:
    • IP version: IPv4
    • Type: HTTPS 
    • Protocol: TCP
    • Port range: 443
    • Source: 192.168.1.0/24
    • Description: Office LAN CIDR (for access via Site-to-Site VPN)



API server endpoint access


You can limit, or completely disable, public access from the internet to your Kubernetes cluster endpoint.

Amazon EKS creates an endpoint for the managed Kubernetes API server that you use to communicate with your cluster (using Kubernetes management tools such as kubectl). By default, this API server endpoint is public to the internet, and access to the API server is secured using a combination of AWS Identity and Access Management (IAM) and native Kubernetes Role Based Access Control (RBAC).

You can, optionally, limit the CIDR blocks that can access the public endpoint. If you limit access to specific CIDR blocks, then it is recommended that you also enable the private endpoint, or ensure that the CIDR blocks that you specify include the addresses that worker nodes and Fargate pods (if you use them) access the public endpoint from.

You can enable private access to the Kubernetes API server so that all communication between your worker nodes and the API server stays within your VPC. You can limit the IP addresses that can access your API server from the internet, or completely disable internet access to the API server.


Cluster endpoint access Info

Configure access to the Kubernetes API server endpoint.
  • Public - The cluster endpoint is accessible from outside of your VPC. Worker node traffic will leave your VPC to connect to the endpoint.
  • Public and private - The cluster endpoint is accessible from outside of your VPC. Worker node traffic to the endpoint will stay within your VPC.
  • Private - The cluster endpoint is only accessible through your VPC. Worker node traffic to the endpoint will stay within your VPC.
If we choose Public or Public and private, we get Advanced settings with the option to add/edit sources for the public access endpoint. We can add up to 40 CIDR blocks here.

Public access endpoint sources - Determines the traffic that can reach the Kubernetes API endpoint of this cluster.

Use CIDR notation to specify an IP address range (for example, 203.0.113.5/32).

If connecting from behind a firewall, you'll need the IP address range used by the client computers.

By default, your public endpoint is accessible from anywhere on the internet (0.0.0.0/0).

If you restrict access to your public endpoint using CIDR blocks, it is strongly recommended to also enable private endpoint access so worker nodes and/or Fargate pods can communicate with the cluster. Without the private endpoint enabled, your public access endpoint CIDR sources must include the egress sources from your VPC. For example, if you have a worker node in a private subnet that communicates to the internet through a NAT Gateway, you will need to add the outbound IP address of the NAT Gateway as part of an allowlisted CIDR block on your public endpoint.
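
Endpoint access can also be changed after cluster creation. A sketch with a placeholder cluster name, reusing the example CIDR from above:

% aws eks update-cluster-config \
    --name my-cluster \
    --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="203.0.113.5/32",endpointPrivateAccess=true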


---

Friday, 1 August 2025

Introduction to AWS IAM Identity Center




IAM Identity Center (formerly AWS Single Sign-On, or AWS SSO) enables you to centrally manage workforce access to multiple AWS accounts and applications via single sign-on.



IAM Identity Center setup


(1) Confirm your identity source


The identity source is where you administer users and groups, and it is the service that authenticates your users. By default, IAM Identity Center creates an Identity Center directory.

(2) Manage permissions for multiple AWS accounts


Give users and groups access to specific AWS accounts in your organization.

(3) Set up application user and group assignments


Give users and groups access to specific applications configured to work with IAM Identity Center.

(4) Register a delegated administrator


Delegate the ability to manage IAM Identity Center to a member account in your AWS organization.



AWS SSO Authentication


When you run aws sso login, it initiates an authentication flow that communicates with your organization's configured IAM Identity Center instance to obtain temporary AWS credentials for CLI use. This command does not interact with the legacy AWS SSO service, but with the current IAM Identity Center (the new, official name as of July 2022). The authentication process exchanges your SSO credentials for tokens that allow you to use other AWS CLI commands with the associated permissions.

aws sso login --profile my_profile

The profile named in aws sso login --profile my_profile must be defined in your AWS CLI configuration file, specifically in ~/.aws/config (on Linux/macOS) or %USERPROFILE%\.aws\config (on Windows).

To define or create an SSO profile, use the interactive command:

aws configure sso --profile my_profile

This command will prompt you for required details such as the SSO start URL, AWS region, account ID, and role name, and then write them into your ~/.aws/config file.

A typical SSO profile configuration in ~/.aws/config might look like:

[profile my_profile]
sso_session = my-sso
sso_account_id = 123456789012
sso_role_name = AdministratorAccess

[sso-session my-sso]
sso_start_url = https://myorg.awsapps.com/start
sso_region = us-east-1
sso_registration_scopes = sso:account:access


Never define the profile in ~/.aws/credentials; SSO profiles rely on ~/.aws/config.

After defining it, aws sso login --profile my_profile will use the details in ~/.aws/config to initiate login.

The most straightforward method is using aws configure sso with your desired profile name as shown above.

If you already have an SSO session defined, you can reuse it across multiple profiles by referencing the same sso_session.


IAM IC Management via Terraform


The following code shows how to use AWS Terraform Provider in order to manage Identity Center user and group. 


(1) Define SSO Permission Set (SSO Group's Permissions)


/policies/devops-combined-policy.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        ...
        "ecr:*",
        "eks:*",
         ...
        "lambda:*",
        ...
        "s3:*",
        "sqs:*",
         ...
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

data "aws_iam_policy_document" "devops_inline_combined_policy" {
  source_policy_documents = [
    file("${path.module}/policies/devops-combined-policy.json"),
  ]
}


This is just an example. Don't allow all actions on all resources; always follow the least-privilege approach and grant granular permissions only where required, e.g. lambda:InvokeFunction.

The SSO Permission Set resource also defines the length of time that application user sessions are valid, expressed as an ISO-8601 duration (e.g. PT2H).

resource "aws_ssoadmin_permission_set" "devops" {
  name             = "DevOps"
  description      = "DevOps"
  instance_arn     = tolist(data.aws_ssoadmin_instances.current.arns)[0]
  session_duration = "PT2H"
}

resource "aws_ssoadmin_permission_set_inline_policy" "devops" {
  inline_policy      = data.aws_iam_policy_document.devops_inline_combined_policy.json
  instance_arn       = aws_ssoadmin_permission_set.devops.instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.devops.arn
}


(2) Define SSO Group


resource "aws_identitystore_group" "devops" {
  identity_store_id = tolist(data.aws_ssoadmin_instances.current.identity_store_ids)[0]
  display_name      = "devops"
}

resource "aws_ssoadmin_account_assignment" "devops_to_main_account" {
  instance_arn       = tolist(data.aws_ssoadmin_instances.current.arns)[0]
  permission_set_arn = aws_ssoadmin_permission_set.devops.arn

  principal_id   = aws_identitystore_group.devops.group_id
  principal_type = "GROUP"

  target_id   = "123456789012"
  target_type = "AWS_ACCOUNT"
}

(3) Define SSO User


To add a new user to Identity Store, we need to have their email which is used in SSO authentication.

resource "aws_identitystore_user" "bojan" {
  identity_store_id = tolist(data.aws_ssoadmin_instances.current.identity_store_ids)[0]

  display_name = "Bojan"
  user_name    = "Bojan"

  name {
    given_name  = "Bojan"
    family_name = "Komazec"
  }

  emails {
    value   = "bojan@example.com"
    primary = true
    type    = "work"
  }
}

resource "aws_identitystore_group_membership" "bojan_to_devops" {
  identity_store_id = tolist(data.aws_ssoadmin_instances.current.identity_store_ids)[0]
  group_id          = aws_identitystore_group.devops.group_id
  member_id         = aws_identitystore_user.bojan.user_id
}


...

Monday, 21 July 2025

AWS Site-to-Site VPN


How to set up a VPN connection between the office router and AWS VPN?

How to set up an IPsec VPN connection between our office router, e.g. a Cisco ASA, and the AWS VPN endpoints?

AWS Virtual Private Network solutions establish secure connections between our on-premises networks, remote offices, client devices, and the AWS global network. 

AWS VPN is comprised of two services: AWS Site-to-Site VPN and AWS Client VPN. 

Each service provides a highly-available, managed, and elastic cloud VPN solution to protect our network traffic.

In this article we'll talk about AWS Site-to-Site VPN.


AWS Site-to-Site VPN 


Network diagram:


on-premise LAN: 192.168.0.0/16 
-----------------------------------------
/ \                         / \
 |                           |
 |  active tunnel            |  passive (standby) tunnel
 |                           |
\ /                         \ /
-----------------------------------------
Router1                    Router 2

VGW - Virtual Gateway 
VPC: 172.16.0.0/16; Route Table: 192.168.0.0/16 ---> VGW-xxxx


Can VPC CIDR and LAN CIDR overlap?

VPN connection consists of two tunnels:
  • active (up and running)
  • passive (down); if first one goes down, this one will take over

The VPC route table will need to be modified so that traffic destined for 192.168.0.0/16 is routed to VGW-xxxx.


AWS VPN service consists of 3 components:

Creating and configuring a Customer Gateway


Customer Gateway is a resource that we create in AWS that represents the (customer) gateway device in our on-premises network.

When we create a customer gateway, we provide information about our device to AWS. We or our network administrator must configure the device to work with the site-to-site VPN connection.


We first need to create a Customer Gateway in AWS. We can do that via AWS console or Terraform provider. 



If we click on Create customer gateway, we'll see this form:



Details

  • Name tag
    • optional
    • Creates a tag with a key of 'Name' and a value that we specify.
    • Value must be 256 characters or less in length.
  • BGP ASN
    • The ASN of our customer gateway device.
    • e.g. 65000
    • Value must be in 1 - 4294967294 range.
    • The Border Gateway Protocol (BGP) Autonomous System Number (ASN) in the range of 1 – 4,294,967,294 is supported. We can use an existing public ASN assigned to our network, with the exception of the following:
      • 7224 - Reserved in all Regions
      • 9059 - Reserved in the eu-west-1 Region
      • 10124 - Reserved in the ap-northeast-1 Region
      • 17943 - Reserved in the ap-southeast-1 Region
    • If we don't have a public ASN, we can use a private ASN in the range of 64,512–65,534 or 4,200,000,000 - 4,294,967,294. The default ASN is 65000.
    • It is required if we want to set up dynamic routing. If we want to use static routing, we can use an arbitrary (default) value.
    • Where to find BGP ASN for e.g. UDM Pro?
    • If we want to use IPSec and dynamic routing, then our router device needs to support BGP over IPSec
    • When to use static and when to use dynamic routing?
  • IP address
    • Specify the IP address for our customer gateway device's external interface. This is internet-routable IP address for our gateway's external interface.
    • The address must be static and can't be behind a device performing Network Address Translation (NAT)
    • If office router is connected to ISP via e.g. WAN1 connection, this is the IP of that WAN connection 
    • Basically, this is the office's public IP address.
  • Certificate ARN
    • optional
    • The ARN of a private certificate provisioned in AWS Certificate Manager (ACM).
    • We can select certificate ARN from a drop-down list
    • How is this certificate used?
    • When to use this certificate?
  • Device
    • optional
    • A name for the customer gateway device.
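
For reference, the same Customer Gateway can be created from the CLI; a sketch in which the public IP, ASN, and device name are placeholders:

% aws ec2 create-customer-gateway \
    --type ipsec.1 \
    --public-ip 203.0.113.10 \
    --bgp-asn 65000 \
    --device-name office-asa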

Creating and configuring a Virtual private gateway


A virtual private gateway is the VPN concentrator on the Amazon side of the site-to-site VPN connection. We create a virtual private gateway and attach it to the VPC we want to use for the site-to-site VPN connection.


A VPN concentrator is a specialized networking device designed to manage numerous secure connections (VPN tunnels) for remote users or sites accessing a central network. It acts as a central point for establishing, processing, and maintaining these connections, enabling large organizations to securely connect many users simultaneously. 

Key Functions:
  • Multiple VPN Tunnel Management: VPN concentrators handle a large number of encrypted VPN tunnels simultaneously, allowing multiple users to securely connect to the network. 
  • Centralized Security: They provide a central point for managing and enforcing security policies for all remote connections, ensuring consistent protection. 
  • Scalability: VPN concentrators are designed to handle a large number of users and connections, making them suitable for large organizations with many remote workers or sites. 
  • Traffic Encryption: They encrypt all data transmitted between the remote user and the central network, ensuring secure communication and protecting sensitive information. 
  • Enhanced Security Posture: By managing and controlling all VPN connections, they help organizations maintain a strong security posture and minimize risks associated with remote access. 
How it Works:
  • 1. Remote User Connection: Remote users initiate a VPN connection, which is then routed to the VPN concentrator. 
  • 2. Authentication and Authorization: The concentrator authenticates and authorizes the user, verifying their identity and permissions. 
  • 3. Tunnel Establishment: If the user is authorized, the concentrator establishes an encrypted VPN tunnel between the user's device and the central network. 
  • 4. Secure Communication: All data transmitted through the tunnel is encrypted, protecting it from eavesdropping or interception. 
  • 5. Traffic Management: The concentrator manages and prioritizes traffic within the network, ensuring efficient and secure communication. 
Use Cases:
  • Large Enterprises: Companies with numerous remote employees often use VPN concentrators to provide secure access to their internal network. 
  • Extranet VPNs: VPN concentrators are also used in extranet setups, where multiple organizations need to securely share resources and information. 
  • Large Scale Remote Access: They are ideal for organizations that need to provide secure remote access to a large number of users from various locations. 
In essence, a VPN concentrator is a robust and scalable solution for managing secure remote access in larger organizations, providing the necessary infrastructure for secure and efficient communication across the network




If we click on Create button we'll get this form to fill:


If we select Custom ASN:



Upon creation, VGW will be in detached state. We want to attach it to VPC.
We can select to which VPC we want to attach it to.
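
From the CLI, creating and attaching the VGW corresponds roughly to the following sketch (the ASN and IDs are placeholders):

% aws ec2 create-vpn-gateway \
    --type ipsec.1 \
    --amazon-side-asn 64512

% aws ec2 attach-vpn-gateway \
    --vpn-gateway-id vgw-0123456789abcdef0 \
    --vpc-id vpc-0123456789abcdef0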