Monday, 13 October 2025
AWS VPC Endpoint
Policy types in AWS
Main AWS Policy Types
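Identity-based policies are attached to IAM identities (users, groups, or roles) and specify which actions those identities can perform on which resources.
Resource-based policies are attached directly to AWS resources (such as S3 buckets or SNS topics), specifying which principals (identities or accounts) can access those resources and what actions are permitted.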
Other AWS Policy Types
- Managed policies (AWS managed and customer managed)
- Inline policies (directly embedded on a single identity)
- Permissions boundaries (set maximum permissions for identities)
- Service Control Policies (SCPs, used in AWS Organizations)
- Access Control Lists (ACLs, primarily for resources like S3 buckets)
- Session policies (passed when creating a session with temporary credentials, e.g. when assuming a role, to further restrict that session's permissions)
How do resource-based and identity-based policies differ?
Key Differences
Attachment and Usage
Policy Evaluation
If resource based policy allows access to some user, do we need a separate identity-based policy which allows access to that resource to be attached to that user?
Details on Policy Evaluation Logic
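As a rough illustration of the question above, here is a minimal Python sketch of a simplified evaluation model. It assumes a single request and deliberately ignores permissions boundaries, session policies and service-specific exceptions (such as KMS key policies and IAM role trust policies); the `Request` class and `is_allowed` function are hypothetical names, not an AWS API. In this model, within one account a resource-based policy that names the user is enough on its own, while a cross-account request also needs an identity-based Allow.

```python
from dataclasses import dataclass


@dataclass
class Request:
    principal_account: str
    resource_account: str
    explicit_deny: bool           # any applicable policy contains a matching Deny
    scp_allows: bool              # SCPs (if any) permit the action
    identity_policy_allows: bool  # an identity-based policy Allows the action
    resource_policy_allows: bool  # the resource-based policy Allows this principal


def is_allowed(req: Request) -> bool:
    """Simplified IAM decision; ignores boundaries, session policies and service exceptions."""
    # 1. An explicit Deny in any policy always wins.
    if req.explicit_deny:
        return False
    # 2. SCPs only set a maximum; they never grant access by themselves.
    if not req.scp_allows:
        return False
    if req.principal_account == req.resource_account:
        # 3a. Same account: an Allow in EITHER policy type is sufficient.
        return req.identity_policy_allows or req.resource_policy_allows
    # 3b. Cross-account: BOTH the identity side and the resource side must Allow.
    return req.identity_policy_allows and req.resource_policy_allows
```

For example, `is_allowed(Request("111122223333", "111122223333", False, True, False, True))` returns `True`, while the same flags with a different resource account return `False`.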
Are policies listed in IAM in AWS Console, only identity-based policies?
Tuesday, 7 October 2025
Amazon RDS (Relational Database Service)
Amazon Relational Database Service (RDS)
- Distributed relational database service
- Simplifies the setup, operation, and scaling of a relational database
- Automates admin tasks like patching the database software, backing up databases and enabling point-in-time recovery
- Scaling storage and compute resources can be done with a single API call
Supported database engines:
- Oracle (proprietary)
- Microsoft SQL Server (proprietary)
- IBM Db2 (proprietary)
- Amazon Aurora (proprietary; MySQL- and PostgreSQL-compatible)
- MySQL (open-source)
- MariaDB (open-source)
- PostgreSQL (open-source)
Networking
We can launch Amazon RDS databases in either public or private subnets of a VPC.
If the DB instance is in a public subnet and we want it to be accessible from the internet:
- The Publicly Accessible property of the DB instance needs to be set to Yes
- Inbound rules of the RDS instance's security group need to allow connections from the source IP
- An Internet Gateway needs to be attached to the VPC
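The checklist above can be expressed as a small Python sketch. It is a simplified model, not an AWS API: the dataclasses and function are hypothetical, security group rules are reduced to a single CIDR and port, and the default port 3306 assumes a MySQL engine.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class InboundRule:
    cidr: str  # source CIDR allowed by the security group rule
    port: int  # destination port allowed by the rule


@dataclass
class RdsNetworkConfig:
    publicly_accessible: bool
    igw_attached: bool
    inbound_rules: list = field(default_factory=list)
    port: int = 3306  # assumption: MySQL default port


def internet_access_problems(cfg: RdsNetworkConfig, client_ip: str) -> list:
    """Return the checklist items that would block access from client_ip."""
    problems = []
    if not cfg.publicly_accessible:
        problems.append("Publicly Accessible is set to No")
    ip = ipaddress.ip_address(client_ip)
    if not any(
        ip in ipaddress.ip_network(rule.cidr) and rule.port == cfg.port
        for rule in cfg.inbound_rules
    ):
        problems.append(f"no inbound rule allows {client_ip} on port {cfg.port}")
    if not cfg.igw_attached:
        problems.append("no Internet Gateway attached to the VPC")
    return problems
```

For example, `internet_access_problems(RdsNetworkConfig(True, True, [InboundRule("198.51.100.0/24", 3306)]), "198.51.100.7")` returns an empty list, meaning none of the three conditions is violated.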
Troubleshooting
MySQL
Terraform
Resources:
Friday, 5 September 2025
Introduction to Amazon Kinesis Data Streams
Amazon Kinesis Data Streams is one of the four Amazon Kinesis services. It makes it easy to stream data at any scale.
- There are no servers to manage
- Two capacity modes:
  - on-demand: eliminates the need to provision or manage the capacity required for running applications; provisioning and scaling are automatic
  - provisioned
- Pay only for what you use
- Built-in integrations with other AWS services to create analytics, serverless, and application integration solutions
Use cases:
- To ingest and collect terabytes of data per day from application and service logs, clickstream data, sensor data, and in-app user events to power live dashboards, generate metrics, and deliver data into data lakes
- To build applications for high-frequency event data such as clickstream data, and gain access to insights in seconds, not days, using AWS Lambda or Amazon Managed Service for Apache Flink
- To pair with AWS Lambda to respond to or adjust immediate occurrences within the event-driven applications in your environment, at any scale.
Capacity Modes
- Provisioned
  - data stream capacity is fixed
  - 1 shard has fixed capacities:
    - Write: maximum 1 MiB/second or 1,000 records/second
    - Read: maximum 2 MiB/second
  - N shards multiply R/W capacity by N
- On-demand
  - data stream capacity scales automatically
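Because each shard's limits are fixed in provisioned mode, the number of shards can be estimated from the expected traffic. A minimal Python sketch, assuming steady rates and standard consumers; the function name is hypothetical. Note that the 2 MiB/second read limit is shared by all consumers of a shard unless they use enhanced fan-out.

```python
import math

# Per-shard limits in provisioned mode
WRITE_MIB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1_000
READ_MIB_PER_SHARD = 2.0  # shared by all standard consumers of the shard


def shards_needed(write_mib_s: float, write_records_s: float, read_mib_s: float) -> int:
    """Smallest shard count that satisfies every per-shard limit (at least 1)."""
    return max(
        1,
        math.ceil(write_mib_s / WRITE_MIB_PER_SHARD),
        math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mib_s / READ_MIB_PER_SHARD),
    )
```

For example, 3.5 MiB/s of writes at 2,500 records/s with 8 MiB/s of reads gives `shards_needed(3.5, 2500, 8) == 4`: both the write bandwidth and the read bandwidth require 4 shards, while the record rate alone would need only 3.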
Data Retention Period
Data Record
Each data record in a stream is made up of a sequence number, a partition key and a data blob.
- Sequence number
  - unique per partition key within its shard
  - assigned by Kinesis Data Streams
- Partition key
  - used to isolate and route records to different shards of a data stream
  - specified by your data producer while adding data to a Kinesis data stream. For example, let’s say you have a data stream with two shards (shard 1 and shard 2). You can configure your data producer to use two partition keys (key A and key B) so that all records with key A are added to shard 1 and all records with key B are added to shard 2.
- Data blob
  - an immutable sequence of bytes
  - data of interest your data producer adds to a data stream
  - Kinesis Data Streams does not inspect, interpret, or change the data in the blob in any way
  - A data blob (the data payload before Base64-encoding) can be up to 1 MB.
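To make partition key routing concrete: Kinesis Data Streams hashes each partition key with MD5 into a 128-bit integer and stores the record in the shard whose hash key range contains that value. A minimal Python sketch, assuming a two-shard stream with evenly split ranges; `SHARDS` is sample data that mimics the shape of `aws kinesis list-shards` output, the helper names are hypothetical, and the optional `ExplicitHashKey` override is ignored.

```python
import hashlib

# Sample data shaped like `aws kinesis list-shards` output for a two-shard stream
SHARDS = [
    {"ShardId": "shardId-000000000000",
     "HashKeyRange": {"StartingHashKey": "0",
                      "EndingHashKey": str(2**127 - 1)}},
    {"ShardId": "shardId-000000000001",
     "HashKeyRange": {"StartingHashKey": str(2**127),
                      "EndingHashKey": str(2**128 - 1)}},
]


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit big-endian integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shards=SHARDS) -> str:
    """Return the ShardId whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for shard in shards:
        r = shard["HashKeyRange"]
        if int(r["StartingHashKey"]) <= h <= int(r["EndingHashKey"]):
            return shard["ShardId"]
    raise ValueError("hash key not covered by any shard")
```

Every record with the same partition key lands in the same shard, but many different keys hash into each shard's range, which is why a single shard usually holds records with many different partition keys.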
How to find all shards in a stream?
Shard Iterator
- AT_SEQUENCE_NUMBER: Starts reading exactly at the specified sequence number.
- AFTER_SEQUENCE_NUMBER: Starts reading just after the specified sequence number.
- TRIM_HORIZON: Starts from the oldest available (untrimmed) record in the shard.
- LATEST: Starts just after the most recent record in the shard, so only records added after the iterator is created are returned.
- AT_TIMESTAMP: Starts from the position denoted by a specific timestamp in the shard.
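The iterator types above differ only in where reading starts. A simplified Python sketch, assuming one shard's retained records are held in a list ordered by sequence number (real sequence numbers are large numeric strings; plain integers are used here); `Record` and `start_index` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Record:
    sequence_number: int
    arrival_ts: float  # approximate arrival time, seconds since epoch


def start_index(records, iterator_type, sequence_number=None, timestamp=None):
    """Index of the first record a new iterator would return (len(records) = nothing yet)."""
    if iterator_type == "TRIM_HORIZON":
        return 0  # oldest retained record
    if iterator_type == "LATEST":
        return len(records)  # only records written after the iterator is created
    if iterator_type in ("AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"):
        for i, r in enumerate(records):
            if r.sequence_number == sequence_number:
                return i if iterator_type == "AT_SEQUENCE_NUMBER" else i + 1
        raise ValueError("sequence number not found in shard")
    if iterator_type == "AT_TIMESTAMP":
        for i, r in enumerate(records):
            if r.arrival_ts >= timestamp:
                return i
        return len(records)
    raise ValueError(f"unknown iterator type: {iterator_type}")
```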
How to get an iterator in a shard?
AAAAAAAAAAHf65JkbV8ZJQ...Exsy8WerU5Z8LKI8wtXm95+blpXljd0UWgDs7Seo9QlpikJI/6U=
% aws kinesis get-records \
    --shard-iterator AAAAAAAAAAHf65JkbV8ZJQ...Exsy8WerU5Z8LKI8wtXm95+blpXljd0UWgDs7Seo9QlpikJI/6U=
{
    "Records": [
        {
            "SequenceNumber": "49666716197061357389751170868210642185623031110770360322",
            "ApproximateArrivalTimestamp": "2025-09-03T17:37:00.343000+01:00",
            "Data": "H4sIAAAAAAAA/+3YTW/TMBgH8K8S...KaevxIQAA",
            "PartitionKey": "54dc991cd70ae6242c35f01972968478"
        },
        {
            "SequenceNumber": "49666716197061357389751170868211851111442645739945066498",
            "ApproximateArrivalTimestamp": "2025-09-03T17:37:00.343000+01:00",
            "Data": "H4sIAAAAAAAA/+3YS2vcM...889uPBV8BVXceAAA=",
            "PartitionKey": "e4e9ad254c154281a67d05a33fa0ea31"
        },
        ...
    ],
    ...
}
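The `Data` field in the CLI output is Base64-encoded. The `H4sI` prefix of both values is what the gzip magic bytes (`1f 8b 08`) look like after Base64 encoding, which suggests the producer gzip-compressed each payload. A minimal decoding sketch, assuming the decompressed payload is JSON; the function names are hypothetical, and the elided values above (containing `...`) cannot be decoded as shown.

```python
import base64
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"


def decode_record_data(data_b64: str) -> bytes:
    """Base64-decode a record's Data field and gunzip it if it is gzip-compressed."""
    raw = base64.b64decode(data_b64)
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw


def decode_json_record(record: dict):
    """Assumption: the (decompressed) payload is a JSON document."""
    return json.loads(decode_record_data(record["Data"]))
```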
Why does each record read from the shard have a different partition key?
Each record in a Kinesis stream can have a different PartitionKey because the PartitionKey is chosen by the data producer for each record; it is not tied to the way records are read from the stream or to the iterator type used. Even when reading from a single shard using TRIM_HORIZON, records within that shard may have different PartitionKeys: the PartitionKey is used to route records to shards at write time (many different keys hash into the same shard's hash key range), not to group records within a shard during reading.
How Partition Keys and Shards Work
Reading and PartitionKeys
Distribution Across Shards
If the producer uses the same PartitionKey for all records, then all its records will go to the same shard, preserving strict ordering for that key within the shard.
In summary, unless a producer always uses the same PartitionKey, its records may spread across shards, and any batch read from a shard iterator will simply reflect the ordering of records within that shard, including records from multiple producers and PartitionKeys.
Metrics to Observe
Basic and enhanced CloudWatch Metrics
- Basic (stream-level) metrics – Stream-level data is sent automatically every minute at no charge.
- Enhanced (shard-level) metrics – Shard-level data is sent every minute for an additional cost.
To catch a consumer falling behind before records expire from the stream, we can monitor some stream metrics and set alarms when they reach critical thresholds.
GetRecords.IteratorAgeMilliseconds
It measures the age of the last record returned by GetRecords, i.e. how far our consumer lags behind the tip of the stream. A very large value → our consumer(s) aren’t keeping up with the incoming write rate.
The age of the last record in all GetRecords calls made against a Kinesis stream, measured over the specified time period. Age is the difference between the current time and when the last record of the GetRecords call was written to the stream. The Minimum and Maximum statistics can be used to track the progress of Kinesis consumer applications. A value of zero indicates that the records being read are completely caught up with the stream. Shard-level metric name: IteratorAgeMilliseconds.
Meaningful Statistics: Minimum, Maximum, Average, Samples
Unit info: Milliseconds
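With the default 24-hour retention period, a record expires 86,400,000 milliseconds (one day) after it is written, so if this metric reaches that value, records are being deleted from the stream before the consumer reads them, i.e. data is lost.
A high and growing iterator age is a classic “consumer is falling behind” symptom.
IteratorAgeMilliseconds = 86.4M → the consumer is a full default retention period (one day) behind; the backlog has grown to the point where records expire unread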
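The arithmetic behind the 86,400,000 figure, as a small Python sketch. The retention period is configurable, so it is a parameter here (24 hours by default); the 50% alarm fraction is an assumed example value, not an AWS recommendation, and the function names are hypothetical.

```python
MS_PER_HOUR = 60 * 60 * 1000  # 3,600,000 ms


def retention_ms(retention_hours: int = 24) -> int:
    """Retention period in milliseconds (24 hours is the default retention)."""
    return retention_hours * MS_PER_HOUR


def retention_used(iterator_age_ms: float, retention_hours: int = 24) -> float:
    """How much of the retention period the consumer is behind (1.0 = records expiring unread)."""
    return iterator_age_ms / retention_ms(retention_hours)


def alarm_threshold_ms(retention_hours: int = 24, fraction: float = 0.5) -> int:
    """Assumed alerting policy: alarm once the consumer is `fraction` of the retention behind."""
    return int(retention_ms(retention_hours) * fraction)
```

With the defaults, `retention_ms()` is 86,400,000 and `alarm_threshold_ms()` is 43,200,000, i.e. the alarm would fire when the consumer is 12 hours behind.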
GetRecords.Bytes
The number of bytes retrieved from the Kinesis stream, measured over the specified time period. Minimum, Maximum, and Average statistics represent the bytes in a single GetRecords operation for the stream in the specified time period.
Shard-level metric name: OutgoingBytes
Meaningful Statistics: Minimum, Maximum, Average, Sum, Samples
Unit info: Bytes
- GetRecords.Records decreasing → Lambda is lagging
- GetRecords.Success decreasing → Lambda is not keeping up
Addressing Bottlenecks
% aws kinesis update-shard-count \
    --stream-name your-stream-name \
    --target-shard-count 4 \
    --scaling-type UNIFORM_SCALING
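With UNIFORM_SCALING the resulting shards cover roughly equal slices of the 128-bit hash key space. A Python sketch of that even split, plus a pre-check for the UpdateShardCount rule that a single call cannot more than double the shard count or reduce it below half; the function names are hypothetical and the exact boundaries Kinesis chooses may differ from this illustration.

```python
HASH_SPACE = 2**128  # MD5 hash keys fall in [0, 2**128 - 1]


def uniform_ranges(shard_count: int) -> list:
    """Split the hash key space into shard_count contiguous, near-equal ranges."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # the last shard absorbs any remainder so the whole space is covered
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def target_is_allowed(current: int, target: int) -> bool:
    """One UpdateShardCount call may not exceed 2x, or go below 0.5x, the current count."""
    return current / 2 <= target <= current * 2
```

For example, if the stream currently has 2 shards (an assumption), a target of 4 is the largest allowed in one call.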
- batch size: 100 --> 400
- concurrent batches per shard: 1 --> 5
- Activate trigger: Yes
- Batch size: 400
- Batch window: None
- Concurrent batches per shard: 5
- Event source mapping ARN: arn:aws:lambda:us-east-2:123456789012:event-source-mapping:80bd81a9-c175-4af5-9aa9-8926b0587f40
- Last processing result: OK
- Maximum age of record: -1
- Metrics: None
- On-failure destination: None
- Report batch item failures: No
- Retry attempts: -1
- Split batch on error: No
- Starting position: TRIM_HORIZON
- Tags: View
- Tumbling window duration: None
- UUID: 80bd81a9-c175-4af5-9aa9-8926b0587f40
- Concurrent batches per shard (the event source mapping's parallelization factor): each shard can have multiple batches being processed at the same time.
- The default is 1, meaning the next batch from a shard won’t be sent to Lambda until the previous batch finishes.
- If your Lambda is slow or variable in duration, this can create a backlog because only one batch per shard is in flight at a time.
- If IteratorAgeMilliseconds is very high → Lambda cannot keep up with the stream.
- If Lambda execution duration is variable → a single batch in flight per shard limits throughput.
- If you have sufficient Lambda concurrency (in our case, up to 15) → you can safely allow multiple batches per shard.
- Start with 2–5 concurrent batches per shard
- This allows Lambda to process multiple batches from the same shard simultaneously.
- Observe whether IteratorAgeMilliseconds decreases.
- Monitor Lambda throttles and duration
- Ensure your Lambda’s memory/CPU and timeout can handle multiple concurrent batches.
- If there is currently no throttling → there is room to increase concurrency.
- Adjust batch size if needed
- Larger batch sizes may help throughput, but smaller batch sizes + higher concurrency often reduce latency.
- Increasing concurrency per shard does not increase shard limits; it just allows more parallelism per shard.
- If Lambda fails batches, retries are per batch → more concurrency increases the number of batches being retried simultaneously.
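A rough way to reason about these trade-offs, as a Python sketch: each shard can have up to `batches_per_shard` batches in flight, so the maximum number of concurrent invocations is the shard count multiplied by that setting, and per-shard throughput is bounded by batch size times batches in flight divided by function duration. This is a back-of-the-envelope model with hypothetical function names; it ignores polling overhead, retries and partial batches.

```python
def max_concurrent_invocations(shards: int, batches_per_shard: int) -> int:
    """Upper bound on simultaneous invocations for one Kinesis event source mapping."""
    return shards * batches_per_shard


def fits_concurrency(shards: int, batches_per_shard: int, concurrency_limit: int) -> bool:
    """True if the mapping cannot exceed the function's concurrency limit."""
    return max_concurrent_invocations(shards, batches_per_shard) <= concurrency_limit


def max_records_per_second_per_shard(batch_size: int, batches_per_shard: int,
                                     avg_duration_s: float) -> float:
    """Rough ceiling: full batches, all in flight, each taking avg_duration_s."""
    return batch_size * batches_per_shard / avg_duration_s
```

For example, if the stream is scaled to 4 shards (the `--target-shard-count 4` above), 5 concurrent batches per shard allows up to 20 concurrent invocations, more than the concurrency of 15 mentioned above, so some batches could be throttled; with 3 shards (15 invocations) it would fit.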
- Set Concurrent batches per shard = 5
- Keep batch size = 400
- Monitor:
- IteratorAgeMilliseconds
- GetRecords.Records
- Lambda duration / concurrency
- Adjust up/down based on observed backlog.
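The console settings above map to parameters of Lambda's UpdateEventSourceMapping API: Batch size is `BatchSize` and Concurrent batches per shard is `ParallelizationFactor`. A minimal Python sketch that builds the request for the UUID shown in the trigger configuration; the helper names are hypothetical, the validation bounds (1 to 10,000 records per Kinesis batch, 1 to 10 batches per shard) are the documented limits for Kinesis sources at the time of writing and worth double-checking, and `apply_esm_update` needs boto3 and AWS credentials.

```python
def build_esm_update(uuid: str, batch_size: int, concurrent_batches_per_shard: int) -> dict:
    """Build keyword arguments for lambda.update_event_source_mapping()."""
    if not 1 <= batch_size <= 10_000:
        raise ValueError("Kinesis batch size must be between 1 and 10,000")
    if not 1 <= concurrent_batches_per_shard <= 10:
        raise ValueError("concurrent batches per shard must be between 1 and 10")
    return {
        "UUID": uuid,
        "BatchSize": batch_size,                                # console: "Batch size"
        "ParallelizationFactor": concurrent_batches_per_shard,  # console: "Concurrent batches per shard"
    }


def apply_esm_update(params: dict) -> dict:
    """Send the update; requires boto3 and credentials allowing lambda:UpdateEventSourceMapping."""
    import boto3  # imported here so build_esm_update() works without boto3 installed

    return boto3.client("lambda").update_event_source_mapping(**params)


params = build_esm_update("80bd81a9-c175-4af5-9aa9-8926b0587f40", 400, 5)
```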
Amazon Kinesis Data Streams Terminology and concepts - Amazon Kinesis Data Streams
Introduction to Amazon Kinesis
Amazon Kinesis is a serverless streaming data service family made up of four services:
- Amazon Kinesis Video Streams - to securely stream video from connected devices to AWS for analytics, machine learning (ML), playback, and other processing
- Amazon Kinesis Data Streams - to easily stream data at any scale
- Amazon Data Firehose - to reliably load real-time streams into data lakes, warehouses, and analytics services
- Amazon Managed Service for Apache Flink - to transform and analyze streaming data in real time
Friday, 8 August 2025
AWS EKS Cluster Networking
- VPC
- Cluster IP address family
- Service IPv4 range
- Subnets
- Cluster security group
- Additional security groups
- API server endpoint access
- VPC Resources (Network environment)
- Subnets
- Additional security groups - optional
- Endpoint access (API server endpoint access)
- Remote networks
VPC
Cluster IP address family
Service IPv4 range
Subnets
Cluster security group & Additional security groups
- Inbound: allow all traffic (all protocols and ports) from itself (see https://stackoverflow.com/questions/66917854/aws-security-group-source-of-inbound-rule-same-as-security-group-name)
- Outbound: allow all IPv4 and IPv6 traffic
- Description: EKS cluster security group
- Inbound rules:
- IP version: IPv4
- Type: HTTPS
- Protocol: TCP
- Port range: 443
- Source: 192.168.1.0/24
- Description: Office LAN CIDR (for access via Site-to-Site VPN)
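A minimal Python sketch of how the inbound rule above is matched, using the `IpPermissions` shape returned by EC2's DescribeSecurityGroups as sample data. It is simplified: security groups contain only allow rules, so anything unmatched is denied; rules that reference another security group (like the self-referencing cluster rule) and IPv6 ranges are not modelled; `sg_allows` is a hypothetical helper.

```python
import ipaddress

# Sample data in the DescribeSecurityGroups IpPermissions shape
INBOUND = [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "192.168.1.0/24",
                   "Description": "Office LAN CIDR (for access via Site-to-Site VPN)"}]},
]


def sg_allows(src_ip: str, protocol: str, port: int, permissions=INBOUND) -> bool:
    """True if any IPv4 inbound permission matches the source IP, protocol and port."""
    ip = ipaddress.ip_address(src_ip)
    for perm in permissions:
        if perm["IpProtocol"] not in (protocol, "-1"):  # "-1" means all protocols
            continue
        if perm["IpProtocol"] != "-1" and not perm["FromPort"] <= port <= perm["ToPort"]:
            continue
        if any(ip in ipaddress.ip_network(r["CidrIp"]) for r in perm.get("IpRanges", [])):
            return True
    return False
```

For example, HTTPS from an office host such as 192.168.1.25 matches, while SSH (port 22) from the same host does not.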
API server endpoint access
- Public - The cluster endpoint is accessible from outside of your VPC. Worker node traffic will leave your VPC to connect to the endpoint.
- Public and private - The cluster endpoint is accessible from outside of your VPC. Worker node traffic to the endpoint will stay within your VPC.
- Private - The cluster endpoint is only accessible through your VPC. Worker node traffic to the endpoint will stay within your VPC.
Friday, 1 August 2025
Introduction to AWS IAM Identity Center
IAM Identity Center setup
(1) Confirm your identity source
(2) Manage permissions for multiple AWS accounts
(3) Set up application user and group assignments
(4) Register a delegated administrator
AWS SSO Authentication
IAM IC Management via Terraform
(1) Define SSO Permission Set (SSO Group's Permissions)
(2) Define SSO Group
(3) Define SSO User
Monday, 21 July 2025
AWS Site-to-Site VPN
Each VPN connection has two tunnels:
- active (up and running)
- passive (down); if the first one goes down, this one takes over
- Customer Gateway
- Customer side of the connection e.g. Cisco ASA
- aws_customer_gateway | Resources | hashicorp/aws | Terraform | Terraform Registry
- Virtual Private Gateway
- Router on the AWS side of the VPN tunnel
- aws_vpn_gateway | Resources | hashicorp/aws | Terraform | Terraform Registry
- VPN connection itself
- Bundles together information about the two above
- aws_vpn_connection | Resources | hashicorp/aws | Terraform | Terraform Registry
Creating and configuring a Customer Gateway
Details
- Name tag
- optional
- Creates a tag with a key of 'Name' and a value that we specify.
- Value must be 256 characters or less in length.
- BGP ASN
- The ASN of our customer gateway device.
- e.g. 65000
- Value must be in the 1–4,294,967,294 range.
- The Border Gateway Protocol (BGP) Autonomous System Number (ASN) in the range of 1 – 4,294,967,294 is supported. We can use an existing public ASN assigned to our network, with the exception of the following:
- 7224 - Reserved in all Regions
- 9059 - Reserved in the eu-west-1 Region
- 10124 - Reserved in the ap-northeast-1 Region
- 17943 - Reserved in the ap-southeast-1 Region
- If we don't have a public ASN, we can use a private ASN in the range of 64,512–65,534 or 4,200,000,000 - 4,294,967,294. The default ASN is 65000.
- It is required if we want to set up dynamic routing. If we want to use static routing, we can use an arbitrary (default) value.
- Where to find BGP ASN for e.g. UDM Pro?
- If we want to use IPSec and dynamic routing, then our router device needs to support BGP over IPSec
- When to use static and when to use dynamic routing?
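The BGP ASN rules above, as a small Python validation sketch. The reserved ASNs and private ranges are copied from the list above; the function names are hypothetical.

```python
# Reserved ASNs: None means reserved in all Regions
RESERVED_ASNS = {
    7224: None,
    9059: "eu-west-1",
    10124: "ap-northeast-1",
    17943: "ap-southeast-1",
}
PRIVATE_ASN_RANGES = [(64_512, 65_534), (4_200_000_000, 4_294_967_294)]


def is_private_asn(asn: int) -> bool:
    """True if the ASN lies in one of the private ranges."""
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)


def is_valid_cgw_asn(asn: int, region: str) -> bool:
    """Check an ASN against the supported range and the reserved values for a Region."""
    if not 1 <= asn <= 4_294_967_294:
        return False
    if asn in RESERVED_ASNS:
        reserved_in = RESERVED_ASNS[asn]
        if reserved_in is None or reserved_in == region:
            return False
    return True
```

The default ASN 65000 passes both checks, while 9059 is rejected only in eu-west-1.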
- IP address
- Specify the IP address for our customer gateway device's external interface. This is the internet-routable IP address of our gateway's external interface.
- The address must be static and can't be behind a device performing Network Address Translation (NAT)
- If the office router is connected to the ISP via e.g. a WAN1 connection, this is the IP address of that WAN connection
- Basically, this is the office's public IP address.
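A quick Python sanity check for the customer gateway address, using the standard `ipaddress` module. It is a heuristic with a hypothetical function name: a private or carrier-grade NAT (100.64.0.0/10) address on the router's WAN interface means the router sits behind another NAT device, but an address that passes can still be NATed somewhere upstream.

```python
import ipaddress


def cgw_address_check(address: str) -> str:
    """Classify a candidate customer gateway IP address (heuristic only)."""
    ip = ipaddress.ip_address(address)
    if ip.is_private:
        return "private address: the router is likely behind NAT"
    if not ip.is_global:
        # e.g. 100.64.0.0/10, the shared address space used for carrier-grade NAT
        return "not globally routable: likely carrier-grade NAT"
    return "ok"
```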
- Certificate ARN
- optional
- The ARN of a private certificate provisioned in AWS Certificate Manager (ACM).
- We can select certificate ARN from a drop-down list
- How is this certificate used?
- When to use this certificate?
- Device
- optional
- A name for the customer gateway device.
Creating and configuring a Virtual Private Gateway
A VPN concentrator is a specialized networking device designed to manage numerous secure connections (VPN tunnels) for remote users or sites accessing a central network. It acts as a central point for establishing, processing, and maintaining these connections, enabling large organizations to securely connect many users simultaneously.
Key Functions:
- Multiple VPN Tunnel Management: VPN concentrators handle a large number of encrypted VPN tunnels simultaneously, allowing multiple users to securely connect to the network.
- Centralized Security: They provide a central point for managing and enforcing security policies for all remote connections, ensuring consistent protection.
- Scalability: VPN concentrators are designed to handle a large number of users and connections, making them suitable for large organizations with many remote workers or sites.
- Traffic Encryption: They encrypt all data transmitted between the remote user and the central network, ensuring secure communication and protecting sensitive information.
- Enhanced Security Posture: By managing and controlling all VPN connections, they help organizations maintain a strong security posture and minimize risks associated with remote access.
How it Works:
1. Remote User Connection: Remote users initiate a VPN connection, which is then routed to the VPN concentrator.
2. Authentication and Authorization: The concentrator authenticates and authorizes the user, verifying their identity and permissions.
3. Tunnel Establishment: If the user is authorized, the concentrator establishes an encrypted VPN tunnel between the user's device and the central network.
4. Secure Communication: All data transmitted through the tunnel is encrypted, protecting it from eavesdropping or interception.
5. Traffic Management: The concentrator manages and prioritizes traffic within the network, ensuring efficient and secure communication.
Use Cases:
- Large Enterprises: Companies with numerous remote employees often use VPN concentrators to provide secure access to their internal network.
- Extranet VPNs: VPN concentrators are also used in extranet setups, where multiple organizations need to securely share resources and information.
- Large Scale Remote Access: They are ideal for organizations that need to provide secure remote access to a large number of users from various locations.
In essence, a VPN concentrator is a robust and scalable solution for managing secure remote access in larger organizations, providing the necessary infrastructure for secure and efficient communication across the network.