Thursday 30 May 2024

How to monitor EC2 instance metrics in Amazon CloudWatch



Amazon CloudWatch offers 2 types of monitoring:
  • Basic
  • Detailed
    • all metrics available in 1 minute periods
    • must be explicitly enabled for the instance; it can be enabled at launch or later, while the instance is running or stopped; enabling it does not affect the monitoring of the EBS volumes attached to the instance
    • charge per metric

Amazon CloudWatch can monitor two types of EC2 instance metrics:
  • Basic (Default) Metrics
    • AWS/EC2 namespace includes the following metrics:
      • Instance metrics:
        • CPUUtilization
        • DiskReadOps
        • DiskWriteOps
        • DiskReadBytes
        • DiskWriteBytes
        • MetadataNoToken
        • MetadataNoTokenRejected
        • NetworkIn
        • NetworkOut
        • NetworkPacketsIn
        • NetworkPacketsOut
      • CPU credit metrics
        • CPUCreditUsage
        • CPUCreditBalance
        • CPUSurplusCreditBalance
        • CPUSurplusCreditsCharged
      • Dedicated Host metrics
        • DedicatedHostCPUUtilization
      • EBS metrics for Nitro-based instances
        • EBSReadOps
        • EBSWriteOps
        • EBSReadBytes
        • EBSWriteBytes
        • EBSIOBalance%
        • EBSByteBalance%
      • Status check metrics
        • StatusCheckFailed
        • StatusCheckFailed_Instance
        • StatusCheckFailed_System
        • StatusCheckFailed_AttachedEBS
    • AWS/EBS namespace includes the following status check metric
      • VolumeStalledIOCheck
    • By default, Amazon EC2 sends metric data to CloudWatch in 5-minute periods
  • Additional (Custom) Metrics 
    • internal system-level metrics
    • e.g. RAM, Instance Swap details, EBS disks utilization etc...
    • require use of one of these technologies:
      • CloudWatch Agents (recommended way). It needs to be installed on our EC2 instances, and then configured to emit selected metrics.
      • CloudWatch Monitoring Scripts (legacy way)
    • Metrics collected by the CloudWatch agent are billed as custom metrics.

CloudWatch monitoring scripts are deprecated; the recommended way to collect logs and metrics is the CloudWatch agent.


How to use CloudWatch Agent for EC2 instance monitoring?


First, CloudWatch Agent needs to be installed on the EC2 instance. CloudWatch agent is available as a package in Amazon Linux 2023 and Amazon Linux 2.


# yum install amazon-cloudwatch-agent

Amazon Linux 2023 repository                                   31 kB/s | 3.6 kB     00:00
Amazon Linux 2023 Kernel Livepatch repository                  38 kB/s | 2.9 kB     00:00
Dependencies resolved.
===============================================================================================================
 Package                   Architecture    Version                      Repository                    Size
===============================================================================================================
Installing:
 amazon-cloudwatch-agent   x86_64          1.300033.0-1.amzn2023        amazonlinux                   95 M

Transaction Summary
===============================================================================================================
Install  1 Package

Total download size: 95 M
Installed size: 360 M
Is this ok [y/N]: y
Downloading Packages:
amazon-cloudwatch-agent-1.300033.0-1.amzn2023.x86_64.rpm       71 MB/s |  95 MB     00:01
---------------------------------------------------------------------------------------------------------------
Total                                                          67 MB/s |  95 MB     00:01
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                      1/1
  Running scriptlet: amazon-cloudwatch-agent-1.300033.0-1.amzn2023.x86_64                 1/1
create group cwagent, result: 0
create user cwagent, result: 0

  Installing       : amazon-cloudwatch-agent-1.300033.0-1.amzn2023.x86_64                 1/1
  Running scriptlet: amazon-cloudwatch-agent-1.300033.0-1.amzn2023.x86_64                 1/1
  Verifying        : amazon-cloudwatch-agent-1.300033.0-1.amzn2023.x86_64                 1/1

Installed:
  amazon-cloudwatch-agent-1.300033.0-1.amzn2023.x86_64
Complete!

This was done on the EC2 instance with this OS:

# cat /etc/os-release

NAME="Amazon Linux"
VERSION="2023"
ID="amzn"
ID_LIKE="fedora"
VERSION_ID="2023"
PLATFORM_ID="platform:al2023"
PRETTY_NAME="Amazon Linux 2023.4.20240429"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023"
HOME_URL="https://aws.amazon.com/linux/amazon-linux-2023/"
DOCUMENTATION_URL="https://docs.aws.amazon.com/linux/"
SUPPORT_URL="https://aws.amazon.com/premiumsupport/"
BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023"
VENDOR_NAME="AWS"
VENDOR_URL="https://aws.amazon.com/"
SUPPORT_END="2028-03-15"

We then need to make sure that the IAM role attached to the EC2 instance has the AWS managed policy CloudWatchAgentServerPolicy attached to it.



How to check if CloudWatch agent is installed on EC2 instance?



We need to SSH into the EC2 instance and then run the following as root:

# /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status

{
  "status": "stopped",
  "starttime": "",
  "configstatus": "not configured",
  "version": "1.300033.0"
}

If the service is configured and running, the output looks like this:

{
  "status": "running",
  "starttime": "2024-05-29T11:27:25+00:00",
  "configstatus": "configured",
  "version": "1.300033.0"
}
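
The two status outputs above can also be checked programmatically. The following is a minimal Python sketch, not part of the agent tooling: agent_is_healthy is a hypothetical helper name, and it assumes the status JSON has exactly the fields shown above.

```python
import json


def agent_is_healthy(status_output: str) -> bool:
    """Return True only when the agent reports 'running' and 'configured'."""
    status = json.loads(status_output)
    return (
        status.get("status") == "running"
        and status.get("configstatus") == "configured"
    )


# The two outputs shown above, copied as JSON strings.
stopped = (
    '{"status": "stopped", "starttime": "", '
    '"configstatus": "not configured", "version": "1.300033.0"}'
)
running = (
    '{"status": "running", "starttime": "2024-05-29T11:27:25+00:00", '
    '"configstatus": "configured", "version": "1.300033.0"}'
)

print(agent_is_healthy(stopped))  # False
print(agent_is_healthy(running))  # True
```

In practice the string would come from capturing the output of the amazon-cloudwatch-agent-ctl status command shown above.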


How to configure CloudWatch Agent?


Before running the CloudWatch agent on any server, we must create one or more CloudWatch agent configuration files on the server. A configuration file defines which metrics we want to collect, on which resources, and which descriptors (dimensions) we want to group these metrics by for visual representation.

Dimensions:
  • attributes that provide context for metrics by categorizing them according to specific criteria
  • describe and categorize
  • descriptive characteristic or attribute of data
Metrics:
  • they quantify and provide numerical details
  • assign numeric values to dimensions of our choice

Configuration file can have an arbitrary name e.g. amazon-cloudwatch-agent.json.


Example: Collecting metrics of EBS disks mounted on EC2 instance

We first need to find the mount points of the root and data disks:

# lsblk

NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
nvme0n1       259:0    0   50G  0 disk 
├─nvme0n1p1   259:1    0   50G  0 part /
├─nvme0n1p127 259:2    0    1M  0 part 
└─nvme0n1p128 259:3    0   10M  0 part /boot/efi
nvme1n1       259:4    0  300G  0 disk /home/my-user/ebs-volume-1-mount-point-dir
nvme2n1       259:5    0  130G  0 disk /home/my-user/ebs-volume-2-mount-point-dir
nvme3n1       259:6    0   35G  0 disk /home/my-user/ebs-volume-3-mount-point-dir

We can then specify which metrics we want to collect on which disks (identified via their mount points) and then how to group them - in this case by InstanceId:

# vi /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json

{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "disk": {
        "measurement": [
          "free",
          "total",
          "used",
          "used_percent"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "/",
          "/home/my-user/ebs-volume-1-mount-point-dir",
          "/home/my-user/ebs-volume-2-mount-point-dir",
          "/home/my-user/ebs-volume-3-mount-point-dir"
        ]
      }
    }
  }
}
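
When the list of mount points differs between instances, a configuration like the one above can be generated instead of typed by hand. Below is a minimal Python sketch under that assumption; build_disk_config is a hypothetical helper, and it only reproduces the keys used in the example above.

```python
import json


def build_disk_config(mount_points, interval=60, run_as_user="cwagent"):
    """Build a CloudWatch agent config dict that collects disk metrics
    for the given mount points, grouped by InstanceId."""
    return {
        "agent": {
            "metrics_collection_interval": interval,
            "run_as_user": run_as_user,
        },
        "metrics": {
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "disk": {
                    "measurement": ["free", "total", "used", "used_percent"],
                    "metrics_collection_interval": interval,
                    "resources": list(mount_points),
                }
            },
        },
    }


config = build_disk_config(
    [
        "/",
        "/home/my-user/ebs-volume-1-mount-point-dir",
        "/home/my-user/ebs-volume-2-mount-point-dir",
        "/home/my-user/ebs-volume-3-mount-point-dir",
    ]
)

# Print the JSON that would be written to the agent configuration file.
print(json.dumps(config, indent=2))
```

The printed JSON can be saved to the configuration file path used above; writing the file is left out of the sketch on purpose.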


We can create a dedicated user (e.g. cwagent) or use one of the existing users (e.g. my-user) on the instance.

We can also specify a namespace but if it's not specified, a default namespace (CWAgent) is used. That namespace will appear in AWS Console in CloudWatch >> Metrics >> All Metrics >> Custom namespaces.

If we use InstanceId as a dimension and don't specify any aggregation dimensions, CloudWatch will automatically use InstanceId, path, fstype and device. For the example above there will be 4 graphs: the dimensions are [InstanceId, path, fstype, device], InstanceId is always the same, and there are 4 unique combinations of [path, fstype, device].

If our instances are managed by an Auto Scaling group (ASG) and we want to monitor disks within it, we need to specify aggregation dimensions, as otherwise CloudWatch can't use AutoScalingGroupName alone to uniquely identify disks (it needs to draw one graph per disk). If we have at most 1 instance in the ASG, we can additionally specify e.g. path:

    "metrics": {
      "append_dimensions": {
        "AutoScalingGroupName":"${aws:AutoScalingGroupName}"
      },
      "aggregation_dimensions": [
        [ "AutoScalingGroupName", "path" ]
      ],
      ...
    }


If we have more than 1 instance per ASG, we need to include both InstanceId and path, as only the combination of InstanceId + path uniquely identifies the resource:

    "metrics": {
      "append_dimensions": {
        "AutoScalingGroupName":"${aws:AutoScalingGroupName}",
        "InstanceId": "${aws:InstanceId}"
      },
      "aggregation_dimensions": [
        [ "AutoScalingGroupName", "InstanceId", "path" ]
      ],
      ...
    }
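
To see why the dimension set matters, here is a small Python sketch. It only illustrates the uniqueness argument and is not how CloudWatch works internally; the ASG name and instance IDs are made up for the example.

```python
# Hypothetical samples: two instances in the same ASG, each reporting its root disk.
samples = [
    {"AutoScalingGroupName": "my-asg", "InstanceId": "i-aaa", "path": "/"},
    {"AutoScalingGroupName": "my-asg", "InstanceId": "i-bbb", "path": "/"},
]


def distinct_series(samples, dimensions):
    """Return the set of distinct dimension-value tuples, i.e. the number
    of separate graphs we could draw for the chosen dimensions."""
    return {tuple(sample[d] for d in dimensions) for sample in samples}


by_asg_and_path = distinct_series(samples, ["AutoScalingGroupName", "path"])
by_asg_instance_and_path = distinct_series(
    samples, ["AutoScalingGroupName", "InstanceId", "path"]
)

print(len(by_asg_and_path))           # 1: both root disks collapse into one series
print(len(by_asg_instance_and_path))  # 2: one series per disk
```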



After we change the agent configuration file, we must restart the agent to have the changes take effect. To restart the agent service:

# systemctl restart amazon-cloudwatch-agent

Upon service restart we can check the local CloudWatch Agent log (on that instance):

# tail -f /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log

2024-05-13T16:26:13Z I! {"caller":"service@v0.89.0/telemetry.go:77","msg":"Skipping telemetry setup.","address":"","level":"None"}
2024-05-13T16:26:13Z I! {"caller":"service@v0.89.0/service.go:143","msg":"Starting CWAgent...","Version":"1.300033.0","NumCPU":2}
2024-05-13T16:26:13Z I! {"caller":"extensions/extensions.go:34","msg":"Starting extensions..."}
2024-05-13T16:26:13Z I! {"caller":"extensions/extensions.go:37","msg":"Extension is starting...","kind":"extension","name":"agenthealth/metrics"}
2024-05-13T16:26:13Z I! {"caller":"extensions/extensions.go:45","msg":"Extension started.","kind":"extension","name":"agenthealth/metrics"}
2024-05-13T16:26:13Z I! cloudwatch: get unique roll up list []
2024-05-13T16:26:13Z I! {"caller":"ec2tagger/ec2tagger.go:435","msg":"ec2tagger: Check EC2 Metadata.","kind":"processor","name":"ec2tagger","pipeline":"metrics/host"}
2024-05-13T16:26:13Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 15.443407438s
2024-05-13T16:26:13Z I! {"caller":"ec2tagger/ec2tagger.go:411","msg":"ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes","kind":"processor","name":"ec2tagger","pipeline":"metrics/host"}
2024-05-13T16:26:13Z I! {"caller":"service@v0.89.0/service.go:169","msg":"Everything is ready. Begin running and processing data."}


For the case where I used [ "AutoScalingGroupName", "path" ] as aggregation dimensions, the CloudWatch graphs look like this:



And if we click on the aggregate AutoScalingGroupName, path:


Monday 27 May 2024

AWS VPC and Terraform

 



Terraform provides two useful functions for calculating CIDR ranges for subnets: cidrsubnet and cidrsubnets.

 

cidrsubnet Function

cidrsubnet calculates the CIDR of a single subnet within a given network: it extends the network mask of prefix (given in CIDR notation) by newbits bits and returns the subnet with index netnum (counting from 0). Here is the function's signature:

cidrsubnet(prefix, newbits, netnum)

Here are some examples:

$ terraform console

> cidrsubnet("192.168.0.0/16", 8, 0)
"192.168.0.0/24"
> cidrsubnet("192.168.0.0/16", 8, 1)
"192.168.1.0/24"
> cidrsubnet("192.168.0.0/16", 8, 2)
"192.168.2.0/24"
> cidrsubnet("192.168.0.0/16", 8, 3)
"192.168.3.0/24"

> cidrsubnet("192.168.0.0/16", 5, 0)
"192.168.0.0/21"
> cidrsubnet("192.168.0.0/16", 5, 1)
"192.168.8.0/21"
> cidrsubnet("192.168.0.0/16", 5, 2)
"192.168.16.0/21"
> cidrsubnet("192.168.0.0/16", 5, 3)
"192.168.24.0/21"

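The same arithmetic can be reproduced without Terraform, using Python's standard ipaddress module. This is only a sketch that mirrors the examples above; it is not Terraform's implementation, and the function simply borrows the name cidrsubnet for readability.

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Return the netnum-th subnet obtained by extending prefix by newbits bits."""
    network = ipaddress.ip_network(prefix)
    new_prefix = network.prefixlen + newbits
    if new_prefix > network.max_prefixlen:
        raise ValueError("newbits extends the prefix beyond the address length")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError("netnum does not fit into newbits bits")
    # Each subnet holds this many addresses; subnet number netnum starts
    # netnum * size addresses after the start of the parent network.
    size = 2 ** (network.max_prefixlen - new_prefix)
    start = network.network_address + netnum * size
    return str(ipaddress.ip_network(f"{start}/{new_prefix}"))


print(cidrsubnet("192.168.0.0/16", 8, 2))  # 192.168.2.0/24
print(cidrsubnet("192.168.0.0/16", 5, 3))  # 192.168.24.0/21
```
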
 

cidrsubnets Function

From official documentation:
 
cidrsubnets calculates a sequence of consecutive IP address ranges within a particular CIDR prefix.
 
So, the output of the function is a list of IP address ranges (CIDRs) which all lie within the given CIDR. This makes it a first-choice tool when we need to determine CIDRs for the subnets of a VPC defined by its CIDR.
 
Here is the function's signature:
 
cidrsubnets(prefix, newbits...)

...which means that this function can take any number of newbits arguments. In fact, this number will match the number of IP ranges (subnets) we want to have. If we want to have S subnets, we'll have:

cidrsubnets(prefix, newbits1, newbits2, newbits3, ..., newbitsS)
 
newbitsN is the newbits number to be used for each IP range.
 
 
Example:
 
> cidrsubnets("192.168.0.0/16", 8, 9, 1)
tolist([
  "192.168.0.0/24",
  "192.168.1.0/25",
  "192.168.128.0/17",
])

 
So, how did we get this output?
 
We take first newbits number and add it to the original network bits (16): 16 + 8 = 24. So, the first IP range (subnet) is: 192.168.0.0/24. ipcalc shows that the maximum IP address in this range is 192.168.0.255.

$ ipcalc 192.168.0.0/24
Address:   192.168.0.0          11000000.10101000.00000000. 00000000
Netmask:   255.255.255.0 = 24   11111111.11111111.11111111. 00000000
Wildcard:  0.0.0.255            00000000.00000000.00000000. 11111111
=>
Network:   192.168.0.0/24       11000000.10101000.00000000. 00000000
HostMin:   192.168.0.1          11000000.10101000.00000000. 00000001
HostMax:   192.168.0.254        11000000.10101000.00000000. 11111110
Broadcast: 192.168.0.255        11000000.10101000.00000000. 11111111
Hosts/Net: 254                   Class C, Private Internet


So, the minimum IP address in the next IP range must be greater than or equal to 192.168.1.0.

The next IP range uses network mask 16 + 9 = 25. So this range will be 192.168.1.0/25.

 
$ ipcalc 192.168.1.0/25
Address:   192.168.1.0          11000000.10101000.00000001.0 0000000
Netmask:   255.255.255.128 = 25 11111111.11111111.11111111.1 0000000
Wildcard:  0.0.0.127            00000000.00000000.00000000.0 1111111
=>
Network:   192.168.1.0/25       11000000.10101000.00000001.0 0000000
HostMin:   192.168.1.1          11000000.10101000.00000001.0 0000001
HostMax:   192.168.1.126        11000000.10101000.00000001.0 1111110
Broadcast: 192.168.1.127        11000000.10101000.00000001.0 1111111
Hosts/Net: 126                   Class C, Private Internet


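ipcalc shows that the maximum IP address in this range is 192.168.1.127, which means that the next IP range must start at an IP address greater than or equal to 192.168.1.128.
 
The subnet mask is 16 + 1 = 17, which means the range must be aligned to a /17 boundary. A /16 network contains only two /17 blocks: 192.168.0.0/17 and 192.168.128.0/17. The first one already contains the two ranges allocated above, so the third range is 192.168.128.0/17.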
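
The walk-through above can be sketched in Python with the standard ipaddress module. The sketch assumes IPv4 and the rule derived above: each range starts at the first address that is both free and aligned to its own prefix length. It reproduces the example, but it is an assumption that it matches Terraform in every edge case.

```python
import ipaddress


def cidrsubnets(prefix: str, *newbits: int) -> list:
    """Allocate consecutive IPv4 ranges inside prefix, one per newbits value."""
    base = ipaddress.IPv4Network(prefix)
    next_free = int(base.network_address)
    allocated = []
    for bits in newbits:
        new_prefix = base.prefixlen + bits
        if new_prefix > 32:
            raise ValueError("newbits extends the prefix beyond 32 bits")
        size = 1 << (32 - new_prefix)
        # Round the next free address up to a multiple of the range size.
        start = (next_free + size - 1) // size * size
        subnet = ipaddress.IPv4Network(f"{ipaddress.IPv4Address(start)}/{new_prefix}")
        if not subnet.subnet_of(base):
            raise ValueError("not enough address space left in the prefix")
        allocated.append(str(subnet))
        next_free = start + size
    return allocated


print(cidrsubnets("192.168.0.0/16", 8, 9, 1))
# ['192.168.0.0/24', '192.168.1.0/25', '192.168.128.0/17']
```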


How to get Default VPC details?



data "aws_vpc" "default" {
  default = true
}

output "default_vpc" {
  value = data.aws_vpc.default
}


terraform plan prints output vars:


Changes to Outputs:
  + default_vpc     = {
      + arn                                  = "arn:aws:ec2:eu-west-2:4xxxxxx18:vpc/vpc-06ecxxxxxxba9"
      + cidr_block                           = "172.31.0.0/16"
      + cidr_block_associations              = [
          + {
              + association_id = "vpc-cidr-assoc-0797xxxxxxxx87d"
              + cidr_block     = "172.31.0.0/16"
              + state          = "associated"
            },
        ]
      + default                              = true
      + dhcp_options_id                      = "dopt-0c2xxxxxx2c19"
      + enable_dns_hostnames                 = true
      + enable_dns_support                   = true
      + enable_network_address_usage_metrics = false
      + filter                               = null
      + id                                   = "vpc-06exxxxxxxba9"
      + instance_tenancy                     = "default"
      + ipv6_association_id                  = ""
      + ipv6_cidr_block                      = ""
      + main_route_table_id                  = "rtb-0axxxxx87"
      + owner_id                             = "471xxxx8"
      + state                                = null
      + tags                                 = {}
      + timeouts                             = null
    }


How to get a list of IDs of Default VPC subnets?



We can use data source defined above.

data "aws_subnets" "default" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
}

output "default_subnets" {
    value = data.aws_subnets.default
}

terraform plan output:

  + default_subnets = {
      + filter   = [
          + {
              + name   = "vpc-id"
              + values = [
                  + "vpc-06exxxxx9",
                ]
            },
        ]
      + id       = "eu-west-2"
      + ids      = [
          + "subnet-066xxxxxxxx241afc",
          + "subnet-023xxxxxx9e00c87",
          + "subnet-0571xxxxxa7fb176",
        ]
      + tags     = null
      + timeouts = null
    }

List of subnet IDs is: data.aws_subnets.default.ids

---

AWS Virtual Private Cloud (VPC)


 
 
From AWS Console:

A VPC is an isolated portion of the AWS Cloud populated by AWS objects, such as Amazon EC2 instances.

VPCs are logically isolated networks; they cannot communicate with the Internet or with each other unless that capability is explicitly granted => security-first principle.

Each account in each region comes with default VPC. A default VPC comes with:
  • public subnet in each Availability Zone
  • internet gateway
  • settings to enable DNS resolution.
source: https://docs.aws.amazon.com/vpc/latest/userguide/default-vpc.html




VPCs are created per account, per region. VPC spans a single region.

Within that region it can use all availability zones (AZ) => high availability, fault tolerance, resilience
 
For applications running on EC2, the architecture starts with the network.

We can connect one VPC to another, which is called peering. We can enable communication between them as if they are part of the same network.

VPC can connect to Internet or remote office, via VPN.

AWS soft limits:
  • 5 VPCs per region
  • 200 subnets per VPC
VPCs within the same region can have overlapping ranges. E.g. in eu-west-1 (Ireland) we can have two VPCs with ranges 10.3.0.0/16 and 10.3.0.0/21. The drawback of overlapping ranges is that such VPCs can't be peered.
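
Whether two ranges overlap can be checked quickly with Python's standard ipaddress module. A minimal sketch; ranges_overlap is a hypothetical helper name, and 10.4.0.0/16 is an extra range added only for contrast.

```python
import ipaddress


def ranges_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True when the two CIDR blocks share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


print(ranges_overlap("10.3.0.0/16", "10.3.0.0/21"))  # True: cannot be peered
print(ranges_overlap("10.3.0.0/16", "10.4.0.0/16"))  # False: peering is possible
```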

AWS best practice recommends leveraging multiple VPCs. For example, by environment: one VPC for Development resources, another for Testing resources, and Production in its own VPC. Or by function: shared services in their own VPC, App1 in its VPC and App2 in its VPC. Or by department: Finance VPC, Engineering VPC, Business Unit X's VPC.

If you're already using multiple accounts across multiple regions, then each account in each region has its own default VPC.

Default VPC


Finding a VPC service in AWS Console



VPC Service Dashboard



Default VPC is listed



Default VPC details

In AWS, each default VPC has CIDR 172.31.0.0/16.
 


Each VPC, including the default one, has a main route table associated with it.



Default VPC can be deleted.

Network ACLs can be set so Default VPC is isolated (cut out).

Users can be denied access to default VPC by removing the access to it in their roles.

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Effect": "Deny",
         "Action": "ec2:RunInstances",
         "Resource": [
            "arn:aws:ec2:*:*:subnet/*"
         ],
         "Condition": {
            "StringEquals": {
               "ec2:Vpc": [
                  "arn:aws:ec2:us-east-1:123456789012:vpc/vpc-xxxxxx"
               ]
            }
         }
      }
   ]
}



Creating a VPC


From AWS console documentation:
 
You must specify an IPv4 address range when you create a VPC. Specify the IPv4 address range as a Classless Inter-Domain Routing (CIDR) block; for example, 10.0.0.0/16. You cannot specify an IPv4 CIDR block larger than /16. Optionally, you can also associate an IPv6 CIDR block with the VPC.
Example scenario: microservices, EC2 container service, clustered services and load balancers need to communicate with NoSQL data stores like Cassandra or MongoDB. To get extra layers of security through routing, we want to put these services in their own VPCs. NoSQL DBs should not be accessible from the Internet => they should be put in their own VPC without an Internet gateway. Container services and load balancers should be put in their own VPC with an Internet gateway.



The name can be e.g. microservices. The IPv4 CIDR block requires some thought about:
  • ranges that we might use in other regions
  • ranges that we might use in the future with on-premises or co-location networks
  • how big our network needs to be - we don't want to create a VPC with thousands of allocated IP addresses if we'll only have several hundred machines in it. Equally, we don't want to allocate too few IP addresses: if we want to scale e.g. EC2 instances and load balancers in the VPC, we won't be able to launch new EC2 instances once the IP addresses run out.

If we want these networks to be able to communicate with each other, either via peering or VPN, their ranges must not overlap.

Online CIDR calculators come as a handy tool for quick CIDR calculations. Example: https://www.subnet-calculator.com/cidr.php

Microservices - multiple clusters of microservices. Say we want 3 subnets of a couple of hundred IP addresses each, with none of these subnets having access to the Internet.
Multiple load balancers. Load balancers need access to the Internet. Say we want 3 subnets of a couple of hundred IP addresses each, with each of these subnets having access to the Internet.
We want these clusters and load balancers to scale.

We need ~2000 IP addresses for entire VPC.




10.0.0.0/21 

00001010.00000000.00000000.00000000 = 10.0.0.0 (min value in the range)
00001010.00000000.00000 | 000.00000000 <-- left of separator are frozen bits
00001010.00000000.00000 | 111.11111111 = 10.0.7.255 (max value in the range)
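
The step from "about 2000 addresses" to a /21 block can be verified with a few lines of Python. This is a minimal sketch using the standard ipaddress module; smallest_prefix is a hypothetical helper name.

```python
import ipaddress


def smallest_prefix(required_addresses: int) -> int:
    """Return the longest IPv4 prefix whose block holds the required addresses."""
    # Number of host bits needed so that 2**host_bits >= required_addresses.
    host_bits = max(required_addresses - 1, 0).bit_length()
    return 32 - host_bits


prefix = smallest_prefix(2000)
vpc = ipaddress.ip_network(f"10.0.0.0/{prefix}")

print(prefix)                 # 21
print(vpc.num_addresses)      # 2048
print(vpc.network_address)    # 10.0.0.0
print(vpc.broadcast_address)  # 10.0.7.255
```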

The load balancers are publicly available.
Enable Amazon-provided IPv6 CIDR blocks.

Ideally, the process of provisioning a VPC should be automated (using Terraform, for example).


VPC for NoSQL datastores: Cassandra, MongoDB, relational DBs, ElastiCache-hosted Memcached or Redis clusters.
The datastores VPC IP range and the range we chose for the microservices VPC do not need to be contiguous.

How many nodes will be in this VPC?
3 or 4 clusters of Cassandra nodes, each with 6 nodes => we don't need a large number of IP addresses

If we want to be contiguous:
10.0.8.0/23

Cassandra uses IPv4 so we don't need IPv6 => check "No IPv6 CIDR block"


Subnets



EC2 instances can't be launched directly into a VPC; they are launched into subnets.

Subnets logically divide a VPC into different ranges, for different purposes.

A VPC spans the entire region, e.g. VPC 10.2.0.0/16 in us-west-2.
A subnet is created in a particular availability zone.

us-west-2a: subnet 10.2.0.0/24
us-west-2b: subnet 10.2.1.0/24
us-west-2c: subnet 10.2.2.0/28 <-- smaller subnet, for relational databases 

When we create a subnet, Amazon VPC reserves 5 IP addresses in it: the first four and the last one.
 
Example: in 10.2.0.0/24 subnet, these reserved IP addresses will be:
10.2.0.0 - network address
10.2.0.1 - reserved for VPC router
10.2.0.2 - reserved by AWS (DNS)
10.2.0.3 - reserved by AWS for future use
10.2.0.255 - broadcast (unused as Amazon VPC does not support broadcast)

So /24 gives 251 usable IP addresses.
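
The reserved addresses and the usable count for any subnet can be derived with Python's standard ipaddress module. A minimal sketch based only on the rule above (first four addresses plus the last one); the helper names are hypothetical.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one


def reserved_addresses(cidr: str) -> list:
    """Return the five addresses Amazon VPC reserves in the given subnet."""
    subnet = ipaddress.ip_network(cidr)
    first_four = [subnet.network_address + offset for offset in range(4)]
    return [str(address) for address in first_four + [subnet.broadcast_address]]


def usable_addresses(cidr: str) -> int:
    """Return how many addresses remain for our own resources."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


print(reserved_addresses("10.2.0.0/24"))
# ['10.2.0.0', '10.2.0.1', '10.2.0.2', '10.2.0.3', '10.2.0.255']
print(usable_addresses("10.2.0.0/24"))  # 251
print(usable_addresses("10.2.2.0/28"))  # 11
```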

Each subnet has the following properties:
  • ID
  • ARN
  • State e.g. Available
  • CIDR e.g. 172.31.16.0/20
  • Number of available IPv4 addresses (this depends on the CIDR mask, e.g. for /20 the block contains 2^(32-20) = 2^12 = 4096 addresses, of which 4096 - 5 = 4091 are available once the 5 reserved addresses are subtracted)
  • IPv6 CIDR
  • Availability Zone e.g. eu-west-2a (Subnets are AZ-specific!)
  • VPC that it belongs to
  • Route table
  • Network ACL
  • Default subnet (Yes/No)
  • Auto-assign public IPv4 address (Yes/No). Enable AWS to automatically assign a public IPv4 or IPv6 address to a new primary network interface for an (EC2) instance in this subnet.
    • By default, nondefault subnets have the IPv4 public addressing attribute set to false, and default subnets have this attribute set to true. An exception is a nondefault subnet created by the Amazon EC2 launch instance wizard — the wizard sets the attribute to true.
  • Auto-assign IPv6 address
  • Auto-assign customer-owned IPv4 address
  • IPv6-only (Yes/No)
  • Hostname type e.g. IP name
  • Resource name DNS A record (Enabled/Disabled)
  • Resource name DNS AAAA record (Enabled/Disabled)
  • DNS64 (Enabled/Disabled)
  • Owner (account ID)


Creating Subnets


We need to have at least one subnet in order to be able to launch EC2 instances or create elastic load balancers. 
 
VPC should be divided into subnets in order to achieve high availability, fault tolerance and resilience to the loss of a data center.
 
VPC is divided into subnets per tier (e.g. front-end, business logic, DB), per availability zone (AZ). 

Example: microservices VPC will have two tiers:
  • public for load balancers
  • private for container clusters

Default view of the Create subnet dialog


After we select the parent VPC, we get more options:





For the AZ we want to choose the region's "a" zone, in our case eu-west-1a.
 
Name subnets according to their functional purpose and the availability zone they reside in. As we're going to create a public subnet, we'll name this one public-a.

The CIDR block of the subnet must be a subset of the CIDR block of the parent VPC. In our case the parent VPC has CIDR 10.10.0.0/16 so the subnets could be e.g. 10.10.0.0/24, 10.10.1.0/24, 10.10.2.0/24 etc. If we choose 10.10.0.0/24, this gives 251 usable IP addresses, which leaves enough room for load balancers to scale without running out of IP addresses.





For the next subnet we can choose:
  • Name: public-b
  • AZ: eu-west-1b
  • CIDR block: 10.10.1.0/24 (if we want to choose the block that is following the previously chosen 10.10.0.0/24)
And for the subnet for the last AZ:
  • Name: public-c
  • AZ: eu-west-1c
  • CIDR block: 10.10.2.0/24 (if we want to choose the block that is following the previously chosen 10.10.1.0/24)
These public subnets need to be routed to the Internet. 

For our application servers, for container clusters, we want to create private subnets:
  • Name: private-cluster-[a|b|c]
  • AZ: eu-west-1[a|b|c]
  • CIDR block: 10.10.[3|4|5].0/24 (if we want to choose the blocks that follow the previously chosen ones; we use /24 assuming that 251 IP addresses will be enough for cluster scaling)
We created a subnet for each tier, in each AZ so in total we have 6 subnets.
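
The resulting layout of 6 subnets can be written down as a small Python sketch with the standard ipaddress module. It only enumerates the names, AZs and CIDR blocks chosen above; it does not create anything in AWS.

```python
import ipaddress

vpc = ipaddress.ip_network("10.10.0.0/16")
tiers = ["public", "private-cluster"]
zones = ["a", "b", "c"]

# Hand out consecutive /24 blocks of the VPC range, tier by tier, zone by zone.
blocks = vpc.subnets(new_prefix=24)
plan = [
    {"name": f"{tier}-{zone}", "az": f"eu-west-1{zone}", "cidr": str(next(blocks))}
    for tier in tiers
    for zone in zones
]

for subnet in plan:
    print(subnet["name"], subnet["az"], subnet["cidr"])
# public-a eu-west-1a 10.10.0.0/24
# public-b eu-west-1b 10.10.1.0/24
# public-c eu-west-1c 10.10.2.0/24
# private-cluster-a eu-west-1a 10.10.3.0/24
# private-cluster-b eu-west-1b 10.10.4.0/24
# private-cluster-c eu-west-1c 10.10.5.0/24
```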

To change the name of the subnet in AWS console: go to the list of subnets and click on edit icon in the Name column as in the image below:



It is not possible to change or modify the IP address range (CIDR) of an existing virtual private cloud (VPC) or subnet.

Routing


Routing is a way of allowing traffic to flow from one network to another, or even within the same network. Only once traffic is routed from one place to another can we filter the traffic that passes through.
 
Routing (IP routing) works at the network layer of the OSI model and therefore works with IP addresses only. It is not concerned with protocols (e.g. TCP, UDP, ICMP...) or ports (e.g. 80, 22, 443...) but with whether or not traffic can flow from one place to another.
VPC has an implicit router, and you use route tables to control where network traffic is directed. Each subnet in your VPC must be associated with a route table, which controls the routing for the subnet (subnet route table). You can explicitly associate a subnet with a particular route table. Otherwise, the subnet is implicitly associated with the main route table. A subnet can only be associated with one route table at a time, but you can associate multiple subnets with the same subnet route table.

        (source: Configure route tables - Amazon Virtual Private Cloud)


Route table

A route table contains a set of rules, called routes, that are used to determine where network traffic from your subnet or gateway is directed.

Every VPC has to have at least one route table. Route table contains routes, sources of traffic and destinations where the traffic gets routed to. When we create VPC, it has a built-in router and a default route table (which has a default route).

Example: VPC has a range 10.10.0.0/16 and 3 subnets: 10.10.0.0/24, 10.10.1.0/24 and 10.10.2.0/28.

In this example, every route table has a default route, which says that all traffic destined for the VPC's own range, 10.10.0.0/16, will remain local.

So, when a route's destination equals the range of the VPC, its target is local: traffic destined for another address in the same range remains local to that VPC. The default route table is set as what we call the main route table.

Every subnet has to be associated with a route table in some way. In this case, our main route table is implicitly associated with all of our subnets, and that is how the VPC knows how to route traffic between these subnets, from the VPC to the Internet, from the VPC to a VPN connection or to another VPC. If we don't explicitly associate a subnet with a route table, that subnet is implicitly associated with the main route table.

This makes the main route table very important to keep in mind. I like to keep the main route table for private routing only: a new subnet is implicitly associated with the main route table until it's explicitly associated with another one, so new subnets always fall back to remaining private rather than being accessible to and from the Internet.

Again, routing is the mechanism that controls the flow of traffic. It's not concerned with protocols and ports, only with whether traffic can flow at all: either it can flow from this network to that network, or it can only flow internally. Routing can therefore be considered our first line of defense: if there is no pipe from our VPC to the Internet, we don't have to worry about our machines being reachable from the Internet.


Enabling Internet Access in VPC


By default, a custom VPC has no access to the Internet. When we create a VPC, its main route table is created and associated with it. The only route in it will be:
 
Destination: 10.0.0.0/24       <-- VPC CIDR range
Target: local
 
In the above example, "local" means the VPC router will send traffic in that CIDR range to the local VPC. Specifically, it will send the traffic to the network interface that has the destination IP address, and drop the packet if nothing in your VPC has that IP address.

Also worth noting: the local route can't be deleted. By default, the VPC router routes local VPC traffic within the VPC, directly to the correct interface, without giving anything else in the VPC the ability to sniff it. That rule is shown mostly for your awareness.
 

Internet Gateway

An internet gateway is a horizontally scaled, redundant, and highly available VPC component that enables communication between your VPC and the internet.

To use an internet gateway, attach it to your VPC and specify it as a target in your subnet route table for internet-routable IPv4 or IPv6 traffic. An internet gateway performs network address translation (NAT) for instances that have been assigned public IPv4 addresses.


An internet gateway enables resources in your public subnets (such as EC2 instances) to connect to the internet if the resource has a public IPv4 address or an IPv6 address. Similarly, resources on the internet can initiate a connection to resources in your subnet using the public IPv4 address or IPv6 address. For example, an internet gateway enables you to connect to an EC2 instance in AWS using your local computer.

An internet gateway provides a target in your VPC route tables for internet-routable traffic. For communication using IPv4, the internet gateway also performs network address translation (NAT).  



To enable Internet access, we need to create an Internet Gateway and attach it to the VPC. Once this is done, we need to modify the main route table so it contains the following routes:


Destination: 10.0.0.0/24              <-- VPC CIDR range
Target: local 
 
Destination: 0.0.0.0/0                   <-- any IP address
Target: igw-0b8425abd94c8322f   <-- Internet Gateway id

 
If we have public subnets in our VPC, we can now associate them with this (main) route table (its details page has a Subnet associations tab). After this, any resource (e.g. an EC2 instance) that runs in a public subnet and has a public IP address can reach the Internet.
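When several routes match a destination, the most specific one (longest prefix) wins, which is why the local route and the 0.0.0.0/0 route can coexist. Below is a minimal sketch of that selection using the two routes listed above. The function name pick_target is made up for illustration; this is a model of the rule, not AWS code.

```python
import ipaddress
from typing import List, Optional, Tuple

# The two routes from the table above: (destination CIDR, target).
routes: List[Tuple[str, str]] = [
    ("10.0.0.0/24", "local"),
    ("0.0.0.0/0", "igw-0b8425abd94c8322f"),
]

def pick_target(route_list: List[Tuple[str, str]], destination: str) -> Optional[str]:
    """Return the target of the most specific matching route, or None."""
    address = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in route_list:
        network = ipaddress.ip_network(cidr)
        if address in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no route: the traffic cannot leave
    # Longest prefix (largest prefixlen) is the most specific route.
    return max(matches, key=lambda match: match[0])[1]

print(pick_target(routes, "10.0.0.17"))           # local
print(pick_target(routes, "93.184.216.34"))       # igw-0b8425abd94c8322f
print(pick_target(routes[:1], "93.184.216.34"))   # None: only the local route exists
```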


NAT Gateway

You can use a network address translation (NAT) gateway to enable instances in a private subnet to connect to services outside your VPC but prevent such external services from initiating a connection with those instances. There are two types of NAT gateways: public and private.

A public NAT gateway enables instances in private subnets to connect to the internet but prevents them from receiving unsolicited inbound connections from the internet. You should associate an elastic IP address with a public NAT gateway and attach an internet gateway to the VPC containing it.

A private NAT gateway enables instances in private subnets to connect to other VPCs or your on-premises network but prevents any unsolicited inbound connections from outside your VPC. You can route traffic from the NAT gateway through a transit gateway or a virtual private gateway. Private NAT gateway traffic can't reach the internet.
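The "outbound yes, unsolicited inbound no" behaviour can be illustrated with a deliberately simplified toy model. The class and method names are invented for this sketch and the IP addresses are documentation examples; a real NAT gateway tracks ports and protocols as well, so treat this only as an illustration of the idea.

```python
class ToyNatGateway:
    """Toy model: remembers outbound flows and admits only matching replies."""

    def __init__(self, elastic_ip: str) -> None:
        self.elastic_ip = elastic_ip
        self._flows = set()  # (private_ip, remote_ip) pairs opened from inside

    def outbound(self, private_ip: str, remote_ip: str) -> str:
        """An instance opens a connection; the remote side sees the Elastic IP."""
        self._flows.add((private_ip, remote_ip))
        return self.elastic_ip

    def inbound_allowed(self, remote_ip: str, private_ip: str) -> bool:
        """Inbound traffic passes only as a reply to a flow opened from inside."""
        return (private_ip, remote_ip) in self._flows


nat = ToyNatGateway("203.0.113.10")
seen_source = nat.outbound("10.0.0.5", "198.51.100.7")

print(seen_source)                                      # 203.0.113.10
print(nat.inbound_allowed("198.51.100.7", "10.0.0.5"))  # True: reply to our own flow
print(nat.inbound_allowed("192.0.2.99", "10.0.0.5"))    # False: unsolicited
```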

Each NAT gateway has the following properties:
  • Name 
  • Subnet in which it will be created (NAT gateway is subnet-specific)
  • Connectivity type:
    • Public
      • Elastic IP allocation ID - we need to assign an Elastic IP address to the NAT gateway. BK: An Elastic IP (public IP) must be attached to the public NAT gateway explicitly. The subnet's "Auto-assign public IPv4 address" property applies only to EC2 instances in that subnet, so even if it is set to true, the NAT gateway won't be assigned a public IP address.
    • Private
  • Tags

Elastic IP allocation ID

Choose an Elastic IP address to assign to your NAT gateway. Only Elastic IP addresses that are not associated with any resources are listed. By default, you can have up to 2 Elastic IP addresses per public NAT gateway. You can increase the limit by requesting a quota adjustment.

To use an Elastic IP address that is currently associated with another resource, you must first disassociate the address from the resource. Otherwise, if you do not have any Elastic IP addresses you can use, allocate one to your account.

When you assign an EIP to a public NAT gateway, the network border group of the EIP must match the network border group of the Availability Zone (AZ) that you're launching the public NAT gateway into. If it's not the same, the NAT gateway will fail to launch. You can see the network border group for the subnet's AZ by viewing the details of the subnet. Similarly, you can view the network border group of an EIP by viewing the details of the EIP address.

If we want EC2 instances running in a private subnet to be able to reach the internet, we need to:

1) create a NAT gateway in the public subnet, with Public connectivity type

2) create another route table for this VPC, which will be associated with the private subnets and will contain the following routes:

Destination: 10.0.0.0/24               <-- VPC CIDR range
Target: local 
 
Destination: 0.0.0.0/0                   <-- any IP address
Target: nat-1a7625abd94c42aaa6  <-- NAT Gateway id

 

 
Only the microservices VPC needs Internet access.

Internet access will be enabled for public subnets that we created (public-a, public-b and public-c).

1. Create Internet Gateway

Name: microservices-internet


2. Attach gateway to VPC

A newly created Internet Gateway is initially in the detached state. We need to attach it to the VPC that needs Internet access.

 
3. Create Route Table and Routes
 
Every VPC has to have a route table, and each VPC automatically gets a default one, the main route table. Routes in it keep traffic local, only within the VPC:
 
<image here>
 
We could add the route for Internet access to the main route table. But this is not a good idea from a security point of view: each new subnet implicitly falls back to the main route table, so each new subnet would by default have Internet access (and be accessible from the Internet), which is not desirable.

We want to create a new route table.
Name: public-traffic
VPC: microservices (the one that needs Internet access)

Once created, it is not the main route table, it is not associated with any subnets, and it comes with the default routes for local traffic.

In order to enable bidirectional Internet traffic, we need to add routes to the Internet Gateway (one for IPv4 and one for IPv6).
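Two separate routes are needed because an IPv4 default route never matches an IPv6 destination, and vice versa. A minimal sketch with Python's ipaddress module shows this. The gateway ID is reused from the earlier example purely as a placeholder, and default_route_covers is a made-up helper name.

```python
import ipaddress

IGW_ID = "igw-0b8425abd94c8322f"  # placeholder ID reused from the earlier example

# Default routes for the public-traffic route table: one per IP family.
public_routes = {
    "0.0.0.0/0": IGW_ID,  # all IPv4 destinations
    "::/0": IGW_ID,       # all IPv6 destinations
}

def default_route_covers(cidr: str, address: str) -> bool:
    """Hypothetical helper: does the route's CIDR contain the address?"""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)

print(default_route_covers("0.0.0.0/0", "203.0.113.7"))  # True
print(default_route_covers("0.0.0.0/0", "2001:db8::1"))  # False: IPv6 needs ::/0
print(default_route_covers("::/0", "2001:db8::1"))       # True
```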

<images>
 
4. Associate new Route Table with public subnets

Associate it with all public subnets
<images>

5. EC2 Instance to get public IP address

<images>
 
 

References: