Friday 28 June 2024

How to fix Docker container not resolving domain names




I had a case where I ran a Terraform Docker container, but it failed to reach the provider package registry.

docker-compose.yaml:

---
name: terraform

services:
  terraform:
    image:  hashicorp/terraform:latest
    volumes:
      - .:/infra
    working_dir: /infra

I wanted to run the init command (the terraform argument here is the name of the Compose service, not the terraform executable itself):

$ docker compose run --rm terraform init
[+] Creating 1/1
 ✔ Network import_demo_default  Created                                                                                                                                          0.1s 
[+] Running 5/5
 ✔ terraform Pulled                                                                                                                                                             11.2s 
   ✔ ec99f8b99825 Pull complete                                                                                                                                                  4.2s 
   ✔ 47bfda048af5 Pull complete                                                                                                                                                  8.4s 
   ✔ 755b9030e6bd Pull complete                                                                                                                                                  8.4s 
   ✔ db586b81a2dc Pull complete                                                                                                                                                  9.4s 
Initializing the backend...
Initializing provider plugins...
- Finding kreuzwerker/docker versions matching "3.0.2"...
│ Error: Failed to query available provider packages
│ 
│ Could not retrieve the list of available versions for provider kreuzwerker/docker: could not connect to registry.terraform.io: failed to request discovery document: Get
│ "https://registry.terraform.io/.well-known/terraform.json": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I changed the entrypoint so that I could run debugging tools instead. With either

entrypoint: ["wget", "https://registry.terraform.io/.well-known/terraform.json"]

or

entrypoint: ["ping", "registry.terraform.io"]


$ docker compose run --rm terraform

returned:

bad address 'registry.terraform.io'

...and

entrypoint: ["nslookup", "registry.terraform.io"]

returned:

 ;; connection timed out; no servers could be reached


To check which DNS servers were used, I set the entrypoint to print the resolv.conf file:

entrypoint: ["cat", "/etc/resolv.conf"]

This returned:

# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 127.0.0.11
search bigcorp.com
options edns0 trust-ad ndots:0

# Based on host file: '/etc/resolv.conf' (internal resolver)
# ExtServers: [10.10.1.255 10.11.5.183]
# Overrides: [nameservers search]
# Option ndots from: internal


By default, Docker provides a DNS server (the daemon's embedded DNS resolver) at 127.0.0.11, so DNS requests from containers go to it. The daemon then forwards these requests to the upstream DNS servers defined via --dns arguments, /etc/docker/daemon.json or the host's /etc/resolv.conf.

Containers use the same DNS servers as the host by default, but you can override this with --dns.

By default, containers inherit the DNS settings defined in the host's /etc/resolv.conf configuration file. Containers that attach to the default bridge network receive a copy of this file. Containers that attach to a custom network use Docker's embedded DNS server, which forwards external DNS lookups to the DNS servers configured on the host. Docker Compose created a custom network for my project (import_demo_default in the output above), which is why the container's resolv.conf points to 127.0.0.11.

Passing --dns to the Docker daemon is the same as setting the dns attribute in /etc/docker/daemon.json; the same applies to --dns-search. DNS settings in /etc/docker/daemon.json override those in the host's /etc/resolv.conf file.
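To compare what the container sees with what the daemon forwards to, the generated resolv.conf can be read programmatically. The following is only a minimal sketch: the function name parse_resolv_conf is my own, and it assumes the "# ExtServers: [...]" comment format shown in the output above, which Docker may change between versions.

```python
import re


def parse_resolv_conf(text):
    """Return nameservers, search domains and Docker's ExtServers entries."""
    result = {"nameservers": [], "search": [], "ext_servers": []}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("nameserver "):
            result["nameservers"].append(line.split()[1])
        elif line.startswith("search "):
            result["search"].extend(line.split()[1:])
        elif line.startswith("# ExtServers:"):
            # Assumption: Docker lists the upstream servers in a comment
            # such as "# ExtServers: [10.10.1.255 10.11.5.183]".
            match = re.search(r"\[(.*)\]", line)
            if match:
                result["ext_servers"] = match.group(1).split()
    return result


# Trimmed copy of the container's resolv.conf shown above.
SAMPLE = """\
nameserver 127.0.0.11
search bigcorp.com
options edns0 trust-ad ndots:0

# ExtServers: [10.10.1.255 10.11.5.183]
"""

info = parse_resolv_conf(SAMPLE)
print(info["nameservers"])  # the embedded resolver the container talks to
print(info["ext_servers"])  # where the daemon forwards external lookups
```

The useful part is the second list: if it contains servers that are not reachable from the current network, lookups through 127.0.0.11 will time out even though the embedded resolver itself is up.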

My local /etc/resolv.conf file:

$ cat /etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search Home

In my case, the uplink DNS server is my local router:

$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (enp0s31f6)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 3 (wlp2s0)
    Current Scopes: DNS
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: xxxx:a25c:xxxx:0:xxxx:7aff:fe4d:3700
       DNS Servers: 192.168.0.1 xxxx:a25c:xxxx:0:xxxx:7aff:fe4d:3700
        DNS Domain: Home

Link 4 (br-a7ba833104f5)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 5 (br-d39e3c16b90f)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 6 (docker0)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 7 (br-1d4f7fd2e5cc)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 8 (br-3c8c9487a095)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 9 (br-7bfedc7c4369)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 26 (veth846e490)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 27 (br-c06da6a5a65a)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 30 (br-c1e0d2aed078)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 71 (enxa44cc8e41d0f)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported


I discovered that I had DNS settings set in /etc/docker/daemon.json:

$ cat /etc/docker/daemon.json
{
  "dns": ["10.10.1.255", "10.11.5.183"],
  "dns-search": ["bigcorp.com"]
}

As there is no need to use these custom (corporate) DNS servers, I removed these settings, which left /etc/docker/daemon.json basically empty.
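If daemon.json also holds settings you want to keep, only the DNS keys need to go. This is a minimal sketch of that cleanup, not something I ran as part of the fix: the helper name strip_dns_overrides and the log-driver entry in the example are assumptions for illustration, and the result still has to be written back to /etc/docker/daemon.json with root privileges.

```python
import json

# daemon.json keys that override the DNS settings inherited from the host.
DNS_KEYS = ("dns", "dns-search", "dns-opts")


def strip_dns_overrides(config_text):
    """Return daemon.json content with the DNS-related keys removed."""
    config = json.loads(config_text) if config_text.strip() else {}
    cleaned = {key: value for key, value in config.items() if key not in DNS_KEYS}
    return json.dumps(cleaned, indent=2)


# Hypothetical config: the DNS entries from above plus one unrelated setting.
before = """
{
  "dns": ["10.10.1.255", "10.11.5.183"],
  "dns-search": ["bigcorp.com"],
  "log-driver": "json-file"
}
"""
print(strip_dns_overrides(before))
```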

To load the new (empty) config, I reloaded systemd and restarted Docker:

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker


Let's check how the container's /etc/resolv.conf changed (with the entrypoint set to cat /etc/resolv.conf again):

$ docker compose run --rm terraform
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 127.0.0.11
search Home
options edns0 trust-ad ndots:0

# Based on host file: '/etc/resolv.conf' (internal resolver)
# ExtServers: [host(127.0.0.53)]
# Overrides: []
# Option ndots from: internal

Switching entrypoint to nslookup:

entrypoint: ["nslookup", "registry.terraform.io"]

...now gives the expected result:

$ docker compose run --rm terraform
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:
registry.terraform.io   canonical name = d3rdzqodp6w8cx.cloudfront.net
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 2600:9000:225d:ee00:16:1aa3:1440:93a1
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 2600:9000:225d:7c00:16:1aa3:1440:93a1
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 2600:9000:225d:4a00:16:1aa3:1440:93a1
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 2600:9000:225d:4800:16:1aa3:1440:93a1
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 2600:9000:225d:8200:16:1aa3:1440:93a1
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 2600:9000:225d:7200:16:1aa3:1440:93a1
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 2600:9000:225d:2000:16:1aa3:1440:93a1
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 2600:9000:225d:e000:16:1aa3:1440:93a1

Non-authoritative answer:
registry.terraform.io   canonical name = d3rdzqodp6w8cx.cloudfront.net
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 143.204.68.98
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 143.204.68.95
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 143.204.68.128
Name:   d3rdzqodp6w8cx.cloudfront.net
Address: 143.204.68.94

Finally, after removing the entrypoint altogether:

$ docker compose run --rm terraform init
Initializing the backend...
Initializing provider plugins...
- Finding kreuzwerker/docker versions matching "3.0.2"...
- Installing kreuzwerker/docker v3.0.2...
- Installed kreuzwerker/docker v3.0.2 (self-signed, key ID BD080C4571C6104C)
Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

If there were issues with any of the upstream DNS resolvers (the DNS settings on my local router, DNS issues at my Internet provider, etc.), I would try using the Google DNS servers directly:

$ cat /etc/docker/daemon.json
{
  "dns": ["8.8.4.4", "8.8.8.8"]
}


But for now I can keep that config file empty.



Monday 17 June 2024

How to organize Terraform project




.
├── data.tf
├── files
│   ├── xyz.yaml
│   ├── xyz.json
│   ├── ...
│   └── templates
│       ├── xyz.json.tftpl
│       ├── ...
│       └── xyz.yaml.tftpl
├── ...
├── ...
├── locals.tf
├── main.tf
├── outputs.tf
├── providers.tf
├── README.md
├── variables.tf
└── versions.tf




A relatively common convention:

versions.tf contains a terraform block that contains a required_providers block that specifies the providers that the module uses and the earliest version of each provider the module is known to be compatible with. This information is scoped to one module at a time, so when writing it you should only think about what the current module needs and not consider what any other modules are compatible with.

providers.tf contains one or more provider blocks that actually instantiate the providers, specifying the configuration to use for each.

Provider configurations have global scope, so the root module configures these on behalf of all downstream modules and so must consider the needs of the entire effective configuration.

If you are intending to follow that convention, then every one of your modules would have a versions.tf file, and each one should describe only what that one module needs. For example, if one of your modules uses a provider feature that is relatively new then it would probably specify a different minimum version than another module that uses provider features that have been available for a long time.

You should have providers.tf only in modules that you intend to use as root modules. The others should inherit or be passed configurations from their callers.

Some exceptions to these guidelines:
  • If you have a module that you know has been broken by a later major version of a provider and you aren’t yet ready to upgrade it, you would typically specify an upper limit on the version constraint in required_providers for that particular module, so Terraform won’t try to select a newer version that’s incompatible. 
  • Some legacy modules include provider blocks even though they aren’t root modules. I would not recommend writing any new modules like that, but it is technically still allowed – with various caveats – for backward compatibility.

When you run terraform init in a root module you will get one more file generated automatically: .terraform.lock.hcl. This file tracks Terraform’s decisions about which version of each provider to use to satisfy all of the modules’ version constraints, and so you should also add this to your version control to ensure that later provider releases with breaking changes can’t break your setup. Terraform will select a new version only if you explicitly ask it to by running terraform init -upgrade.
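To make the idea of "satisfying all of the modules' version constraints" concrete, here is a rough sketch of picking the newest version that every module accepts. It is not Terraform's actual resolver: the function names are my own, only the plain comparison operators are handled (no ~> and no pre-release versions), and the list of available versions is hypothetical.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}


def parse_version(text):
    """Turn "3.0.2" into (3, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, constraint):
    """Check a version against a constraint such as ">= 2.25.0, < 4.0.0"."""
    for clause in constraint.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones.
        for symbol in (">=", "<=", "!=", ">", "<", "="):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not OPS[symbol](parse_version(version), bound):
                    return False
                break
        else:
            # A bare version means an exact match.
            if parse_version(version) != parse_version(clause):
                return False
    return True


def select_version(available, constraints):
    """Newest available version accepted by every module's constraint."""
    candidates = [v for v in available
                  if all(satisfies(v, c) for c in constraints)]
    return max(candidates, key=parse_version) if candidates else None


# Hypothetical provider releases and two modules' required_providers constraints.
available = ["2.25.0", "3.0.1", "3.0.2", "3.1.0", "4.0.0"]
constraints = [">= 3.0.2", ">= 2.25.0, < 4.0.0"]
print(select_version(available, constraints))
```

In this example the second module's upper limit keeps the selection below 4.0.0, which is the situation described in the first exception above.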

Here is another strong reason NOT to put provider configuration in a non-root module: Terraform - refactoring modules: Error: Provider configuration not present - Stack Overflow




Saturday 15 June 2024

Scaling in AWS EKS


 

Scaling allows the EKS cluster to dynamically adjust to varying workloads, ensuring efficient resource utilization and cost management.

Kubernetes scaling types:
  • manual
  • automatic

Manual scaling can be performed with the kubectl scale command, which sets a new size for a deployment, replica set, replication controller, or stateful set.

Auto-scaling options provided in Amazon EKS:
  • native to Kubernetes
    • Cluster Autoscaler
    • Horizontal Pod Autoscaler
  • through AWS-specific features
    • Auto Scaling Groups
    • Fargate
  • 3rd party AWS-specific solutions
    • Karpenter

Kubernetes Cluster Autoscaler


The Kubernetes Cluster Autoscaler is designed to automatically adjust the number of nodes in your cluster based on the resource requests of the workloads running in the cluster.

Key Features:

  • Node Scaling: It adds or removes nodes based on the pending pods that cannot be scheduled due to insufficient resources.
  • Pod Scheduling: Ensures that all pending pods are scheduled by scaling the cluster up.

Installation and Setup:


To use the Cluster Autoscaler in an EKS cluster, we need to deploy it using a Helm chart or a pre-configured YAML manifest, for example:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

Configuration:

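  • If you list node groups explicitly, ensure the --nodes flag in the deployment specifies the min and max nodes for each node group.
  • If you rely on auto-discovery (as the manifest above does), tag your node groups' Auto Scaling groups with the k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name> tags so that the autoscaler can find and manage them.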

Kubernetes Horizontal Pod Autoscaler (HPA)


The Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replication controller, or replica set based on observed CPU utilization or other select metrics.

Key Features:

  • Pod Scaling: Adjusts the number of pod replicas to match the demand.

Installation and Setup:


To use HPA, ensure the Metrics Server is installed in your cluster to provide resource metrics.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Configuration:


Create an HPA resource for your deployment.

kubectl autoscale deployment your-deployment --cpu-percent=50 --min=1 --max=10
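The scaling decision behind such an HPA can be sketched with the formula from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), limited to the configured minimum and maximum. The function below is a simplified illustration under that assumption; the name desired_replicas is my own, and the real controller additionally applies a tolerance and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Simplified HPA calculation: scale proportionally, then clamp."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Values matching the command above: target 50% CPU, between 1 and 10 pods.
print(desired_replicas(2, 90, 50, 1, 10))   # 2 pods at 90% CPU
print(desired_replicas(8, 100, 50, 1, 10))  # would need 16, capped by --max
```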


AWS Auto Scaling Groups


AWS Auto Scaling Groups (ASGs) can also be used to scale the worker nodes in your EKS cluster.

Key Features:

  • EC2 Instance Scaling: Automatically adjusts the number of EC2 instances in the group.

Installation and Setup:


When you create EKS managed node groups, they are automatically backed by ASGs. This is the default auto scaling provider in an EKS cluster: it does not require installing any additional tools and is provided out of the box when we create a node group.

eksctl create nodegroup --cluster your-cluster-name --name your-nodegroup-name --nodes-min 1 --nodes-max 10


AWS Fargate


AWS Fargate allows you to run Kubernetes pods without managing the underlying nodes. It provides serverless compute for containers, eliminating the need to provision and scale EC2 instances.

Key Features:

  • Serverless: No need to manage EC2 instances.
  • Automatic Scaling: Automatically scales pods based on the specified compute resources.

Installation and Setup:


Create a Fargate profile for your EKS cluster, specifying which pods should run on Fargate.

eksctl create fargateprofile --cluster your-cluster-name --name your-fargate-profile --namespace your-namespace



Saturday 8 June 2024

Access types in Amazon EKS

 



Types of access in EKS


Grant access to Kubernetes APIs



Our cluster has a Kubernetes API endpoint, which kubectl uses. We can authenticate to this API using two types of identities:
  • An AWS Identity and Access Management (IAM) principal (role or user)
  • A user in our own OpenID Connect (OIDC) provider
    • Requires authentication to our OIDC provider
    • setup: Authenticate users for your cluster from an OpenID Connect identity provider - Amazon EKS
    • We can associate one OIDC identity provider to our cluster.
    • Kubernetes doesn't provide an OIDC identity provider. We can use an existing public OIDC identity provider, or we can run our own identity provider.
    • The issuer URL of the OIDC identity provider must be publicly accessible, so that Amazon EKS can discover the signing keys. Amazon EKS doesn't support OIDC identity providers with self-signed certificates.
    • Before we can associate an OIDC identity provider with our cluster, we need the following information from our provider:
      • Issuer URL - The URL of the OIDC identity provider that allows the API server to discover public signing keys for verifying tokens. The URL must begin with https:// and should correspond to the iss claim in the provider's OIDC ID tokens. In accordance with the OIDC standard, path components are allowed but query parameters are not. Typically the URL consists of only a host name, like https://server.example.org or https://example.com. This URL should point to the level below .well-known/openid-configuration and must be publicly accessible over the internet.
      • Client ID (also known as audience) - The ID for the client application that makes authentication requests to the OIDC identity provider.

We can't disable IAM authentication to our cluster, because it's still required for joining nodes to the cluster. OIDC authentication is optional. Both can be enabled on a cluster at the same time.


IAM OIDC identity providers are entities in IAM that describe an external identity provider (IdP) service that supports the OpenID Connect (OIDC) standard, such as Google or Salesforce. You use an IAM OIDC identity provider when you want to establish trust between an OIDC-compatible IdP and your AWS account. This is useful when creating a mobile app or web application that requires access to AWS resources, but you don't want to create custom sign-in code or manage your own user identities. 

You can create and manage an IAM OIDC identity provider using the AWS Management Console, the AWS Command Line Interface, the Tools for Windows PowerShell, or the IAM API.

After you create an IAM OIDC identity provider, you must create one or more IAM roles. A role is an identity in AWS that doesn't have its own credentials (as a user does). But in this context, a role is dynamically assigned to a federated user that is authenticated by your organization's IdP. The role permits your organization's IdP to request temporary security credentials for access to AWS. The policies assigned to the role determine what the federated users are allowed to do in AWS. 

 


Grant Kubernetes workloads access to AWS


 A workload is an application running in one or more Kubernetes pods.

A Kubernetes service account provides an identity for processes that run in a Pod.

If your Pod needs access to AWS services, you can map the service account to an IAM identity to grant that access.

Granting IAM permissions to workloads on Amazon Elastic Kubernetes Service clusters


Amazon EKS provides two ways to grant IAM permissions to workloads that run in Amazon EKS clusters:
  • IAM roles for service accounts (IRSA)
    • Allows pods to use IAM roles directly (no need to inject IAM user access credentials into pods anymore)
    • We define the trust relationship between an IAM role and Kubernetes service account (that's a type of account in Kubernetes) in the role's trust policy.
    • Each EKS cluster has an OpenID Connect (OIDC) issuer URL associated with it. 
    • To use/enable IRSA a unique OpenID Connect provider needs to be created for each EKS cluster in IAM. 
  • EKS Pod Identities


In 2014, AWS Identity and Access Management added support for federated identities using OpenID Connect (OIDC). This feature allows you to authenticate AWS API calls with supported identity providers and receive a valid OIDC JSON web token (JWT). You can pass this token to the AWS STS AssumeRoleWithWebIdentity API operation and receive IAM temporary role credentials. You can use these credentials to interact with any AWS service, including Amazon S3 and DynamoDB.

Each JWT token is signed by a signing key pair. The keys are served on the OIDC provider managed by Amazon EKS and the private key rotates every 7 days. Amazon EKS keeps the public keys until they expire. If you connect external OIDC clients, be aware that you need to refresh the signing keys before the public key expires. 

Kubernetes has long used service accounts as its own internal identity system. Pods can authenticate with the Kubernetes API server using an auto-mounted token (which was a non-OIDC JWT) that only the Kubernetes API server could validate. These legacy service account tokens don't expire, and rotating the signing key is a difficult process. In Kubernetes version 1.12, support was added for a new ProjectedServiceAccountToken feature. This feature is an OIDC JSON web token that also contains the service account identity and supports a configurable audience.

Amazon EKS hosts a public OIDC discovery endpoint for each cluster that contains the signing keys for the ProjectedServiceAccountToken JSON web tokens so external systems, such as IAM, can validate and accept the OIDC tokens that are issued by Kubernetes.

OIDC federation access allows you to assume IAM roles via the Secure Token Service (STS), enabling authentication with an OIDC provider, receiving a JSON Web Token (JWT), which in turn can be used to assume an IAM role. Kubernetes, on the other hand, can issue so-called projected service account tokens, which happen to be valid OIDC JWTs for pods. Our setup equips each pod with a cryptographically-signed token that can be verified by STS against the OIDC provider of your choice to establish the pod’s identity.

The AWS SDKs include a credential provider that calls sts:AssumeRoleWithWebIdentity to exchange the Kubernetes-issued token for IAM role credentials.


IRSA authentication:
  • A JWT signed by the EKS OIDC IdP gets auto-mounted into each pod that uses the service account.
  • The AWS SDK sends an AssumeRoleWithWebIdentity request containing the desired role and the JWT.
  • STS uses the IAM IdP associated with the EKS OIDC IdP to verify the identity of the pod.


To use/enable IRSA:
  • 1) a unique OpenID Connect provider needs to be created for each EKS cluster in IAM. [Create an IAM OIDC provider for your cluster - Amazon EKS]
    • To use IAM roles for service accounts, an IAM OIDC provider must exist for your cluster's OIDC issuer URL.
    • If your cluster supports IAM roles for service accounts, it has an OpenID Connect (OIDC) issuer URL associated with it. 
    • You can view this URL in the Amazon EKS console, or you can use the following AWS CLI command to retrieve it.
$ aws eks describe-cluster --name cluster_name --query "cluster.identity.oidc.issuer" --output text

The expected output is as follows:

https://oidc.eks.<region-code>.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E

Any OIDC provider implementation needs to have a public OIDC issuer URL (see Issuer URL in OpenID Connect Discovery should be a working URL? - Stack Overflow). So for each cluster we'll have one implementation of OIDC provider.


Your Identity Provider’s Discovery Endpoint contains important configuration information. The OIDC discovery endpoint will always end with /.well-known/openid-configuration as described in the OpenID Provider Configuration Request documentation.

You can confirm that the discovery endpoint is correct by entering it in a browser window. If there is a JSON object with metadata about the connection returned, the endpoint is correct.


Obtain the thumbprint for an OpenID Connect identity provider - AWS Identity and Access Management
  • 2) Configure a Kubernetes service account to assume an IAM role
  • 3) Configure Pods to use a Kubernetes service account 
  • 4) Use a supported AWS SDK 
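The discovery endpoint mentioned in step 1 can be derived from the issuer URL by appending the well-known path. A minimal sketch follows; the helper name discovery_url is my own, and the example issuer is the placeholder value from the documentation, not a real cluster.

```python
def discovery_url(issuer_url):
    """Build the OIDC discovery endpoint for a given issuer URL."""
    if not issuer_url.startswith("https://"):
        raise ValueError("issuer URL must begin with https://")
    # The discovery document lives one level below the issuer URL.
    return issuer_url.rstrip("/") + "/.well-known/openid-configuration"


# Placeholder issuer; substitute the output of the describe-cluster command.
issuer = "https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
print(discovery_url(issuer))
```

Opening the printed URL in a browser (with a real issuer) should return the JSON metadata described above.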

Just as we can create an OIDC identity provider in IAM to represent an external, third-party OIDC provider (so that a user authenticated by that provider can be given access to AWS), we can also create an OIDC identity provider in IAM to represent the internal EKS OIDC provider that exists for each cluster (each cluster has its own provider). When an EKS cluster is created, its OIDC provider is created as well, with two pieces of data available:
  • OIDC provider issuer
    • has a URL which is used for discovery
  • OIDC provider server TLS certificate
    • This certificate protects the URL above (the OIDC provider issuer URL) and lets clients verify the identity of the OIDC provider server
    • The TLS certificate is necessary for establishing secure communication with the OIDC provider.

In Terraform, this certificate can be obtained like this:

data "tls_certificate" "example" {
  url = aws_eks_cluster.example.identity[0].oidc[0].issuer
}

To create an OIDC identity provider (IdP) in IAM for this cluster-specific OIDC provider, we can use aws_iam_openid_connect_provider:


resource "aws_iam_openid_connect_provider" "example" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.example.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.example.identity[0].oidc[0].issuer
}

This resource requires a few pieces of information:
  • url  - Describes which OIDC IdP this resource represents. 
    • The URL of the identity provider. Corresponds to the iss claim.
  • thumbprint_list - Describes how clients will communicate with the OIDC IdP servers. HTTP communication goes through a secure TLS channel, so we need to know the identity of the certificates to expect.
    •  A list of server certificate thumbprints for the OpenID Connect (OIDC) identity provider's server certificate(s).
    • When we create an OpenID Connect (OIDC) identity provider in IAM, IAM requires the thumbprint for the top intermediate certificate authority (CA) that signed the certificate used by the external identity provider (IdP). The thumbprint is a signature for the CA's certificate that was used to issue the certificate for the OIDC-compatible IdP. When we create an IAM OIDC identity provider, we are trusting identities authenticated by that IdP to have access to our AWS account. By using the CA's certificate thumbprint, we trust any certificate issued by that CA with the same DNS name as the one registered. This eliminates the need to update trusts in each account when we renew the IdP's signing certificate.
  • client_id_list - Describes which clients can use this OIDC IdP
    • A list of client IDs (also known as audiences). When a mobile or web app registers with an OpenID Connect provider, they establish a value that identifies the application. (This is the value that's sent as the client_id parameter on OAuth requests.)

AWS Security Token Service (STS)

  • Web service that enables you to request temporary, limited-privilege credentials for users
  • Available as a global service by default
  • By default, all AWS STS requests go to a single endpoint at https://sts.amazonaws.com
  • Supports the following actions (requests):
    • AssumeRole
      • Returns a set of temporary security credentials that you can use to access AWS resources. These temporary credentials consist of an access key ID, a secret access key, and a security token. For example, a user can authenticate via the company's SSO and get these credentials on the AWS sign-on page; they can be copied to ~/.aws/credentials under a profile, and that profile is then used when accessing AWS.
    • AssumeRoleWithSAML
    • AssumeRoleWithWebIdentity
      • Issues a role session (temporary session)
      • Returns a set of temporary security credentials for users who have been authenticated in a mobile or web application with a web identity provider. Example providers include the OAuth 2.0 providers Login with Amazon and Facebook, or any OpenID Connect-compatible identity provider such as Google or Amazon Cognito federated identities.
      • Calling AssumeRoleWithWebIdentity does not require the use of AWS security credentials. Therefore, you can distribute an application (for example, on mobile devices) that requests temporary security credentials without including long-term AWS credentials in the application. You also don't need to deploy server-based proxy services that use long-term AWS credentials. Instead, the identity of the caller is validated by using a token from the web identity provider. 
      • The temporary security credentials returned by this API consist of an access key ID, a secret access key, and a security token. Applications can use these temporary security credentials to sign calls to AWS service API operations.
      • By default, the temporary security credentials created by AssumeRoleWithWebIdentity last for one hour. However, you can use the optional DurationSeconds parameter to specify the duration of your session. You can provide a value from 900 seconds (15 minutes) up to the maximum session duration setting for the role. This setting can have a value from 1 hour to 12 hours.
      • Required parameters: 
        • RoleArn - The Amazon Resource Name (ARN) of the role that the caller is assuming.
        • RoleSessionName - An identifier for the assumed role session. Typically, you pass the name or identifier that is associated with the user who is using your application. That way, the temporary security credentials that your application will use are associated with that user. This session name is included as part of the ARN and assumed role ID in the AssumedRoleUser response element.
        • WebIdentityToken - The OAuth 2.0 access token or OpenID Connect ID token that is provided by the identity provider. Your application must get this token by authenticating the user who is using your application with a web identity provider before the application makes an AssumeRoleWithWebIdentity call. Timestamps in the token must be formatted as either an integer or a long integer. Only tokens with RSA algorithms (RS256) are supported. 
    • DecodeAuthorizationMessage
    • GetAccessKeyInfo
    • GetCallerIdentity
    • GetFederationToken
    • GetSessionToken
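The DurationSeconds limits described for AssumeRoleWithWebIdentity can be expressed as a small pre-flight check. This is only an illustrative sketch: the function name and the role_max_session_seconds parameter are my own, and STS itself remains the authority on whether a request is accepted.

```python
MIN_DURATION = 900          # 15 minutes, the lowest value STS accepts
DEFAULT_DURATION = 3600     # one hour, used when DurationSeconds is omitted
MAX_ROLE_SESSION = 43200    # 12 hours, the upper bound for a role's setting


def validate_duration(duration_seconds=DEFAULT_DURATION,
                      role_max_session_seconds=3600):
    """Raise ValueError if the requested session duration is out of range."""
    if not 3600 <= role_max_session_seconds <= MAX_ROLE_SESSION:
        raise ValueError("role maximum session duration must be 1 to 12 hours")
    if not MIN_DURATION <= duration_seconds <= role_max_session_seconds:
        raise ValueError("DurationSeconds must be between 900 and the role maximum")
    return duration_seconds


print(validate_duration(1800))                                   # within the default role maximum
print(validate_duration(7200, role_max_session_seconds=14400))   # role allows up to 4 hours
```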


Example: Create an IAM role and associate it with a Kubernetes service account


Our custom service account in the cluster, my-service-account, requires permission to, for example, launch EC2 instances. We need to assign a suitable IAM role to this service account (IRSA, IAM Roles for Service Accounts).

We've created an OIDC IdP in IAM for the OIDC issuer associated with our cluster:

arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE

Prepare the policies for the role that the IdP-authenticated user will assume. As with any role, a role for a service account includes two policies:
  • trust policy that specifies who can assume the role
  • permissions policy that specifies the AWS actions and resources that the role owner is allowed or denied access to
Trust Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:default:my-service-account",
                    "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:aud": "sts.amazonaws.com"
                }
            }
        }
    ]
}

The Principal here is an OIDC session principal, which is a kind of role session principal; see https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html#sts-session-principals.

This allows anyone who has authenticated with this OIDC IdP, and whose WebIdentityToken (sent as a parameter of the request) has system:serviceaccount:default:my-service-account as the subject (user) and sts.amazonaws.com as the audience (client), to assume the role that this policy is attached to.
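
To make the effect of the Condition block more concrete, here is a small Python sketch that compares the StringEquals keys of the trust policy above with the claims of a token. It is a simplified illustration, not how AWS actually evaluates policies; the function name and the second set of claims are invented for the example.

```python
def failed_conditions(trust_statement, token_claims):
    """Return the StringEquals keys whose expected value differs from the token claim."""
    expected = trust_statement.get("Condition", {}).get("StringEquals", {})
    failures = []
    for key, wanted in expected.items():
        # Keys look like "<provider>:<claim>"; the claim name follows the last colon.
        claim = key.rsplit(":", 1)[1]
        if token_claims.get(claim) != wanted:
            failures.append(key)
    return failures


PROVIDER = "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
statement = {
    "Effect": "Allow",
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {"StringEquals": {
        f"{PROVIDER}:sub": "system:serviceaccount:default:my-service-account",
        f"{PROVIDER}:aud": "sts.amazonaws.com",
    }},
}

# Claims as they would appear for the service account named in the policy.
good = {"sub": "system:serviceaccount:default:my-service-account",
        "aud": "sts.amazonaws.com"}
# Hypothetical claims for a same-named service account in another namespace.
other_namespace = {"sub": "system:serviceaccount:kube-system:my-service-account",
                   "aud": "sts.amazonaws.com"}

print(failed_conditions(statement, good))             # nothing fails
print(failed_conditions(statement, other_namespace))  # the ":sub" key fails
```

A mismatch of this kind (wrong namespace, wrong service account name, wrong audience) is one possible reason for STS to refuse an sts:AssumeRoleWithWebIdentity request.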

Our service account authenticates with the cluster's OIDC provider (IdP) and gets a token from it. This is the Identity Token described in OpenID Connect (OIDC) | My Public Notepad; it contains the sub (subject identity, the service account in our case) and aud (audience, i.e. the client that will use this token; STS in our case) claims.

It then sends an AssumeRoleWithWebIdentity request to STS with this token and the role it requires (e.g. a role for creating EC2 instances). STS (the Client) then validates this token against the EKS cluster IdP to identify the user (service account) and finally grants it the role by returning temporary security credentials.

Troubleshooting



"error": "fetching instance types using ec2.DescribeInstanceTypes,
WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403



Friday 7 June 2024

OpenID Connect (OIDC)


  • Authentication protocol
    • Authentication is a secure process of establishing and communicating that the person operating an application or browser is who they claim to be.
  • Provides a secure and verifiable answer to the question “What is the identity of the person currently using the browser or mobile app that is connected?”
  • Based on OAuth 2.0
    • OAuth 2.0 is a framework, specified by the IETF in RFCs 6749 and 6750 (published in 2012), designed to support the development of authentication and authorization protocols. It provides a variety of standardized message flows based on JSON and HTTP; OpenID Connect uses these to provide Identity services.
  • Simplifies:
    • user identity verification
      • based on the authentication performed by an Authorization Server
    • obtaining user profile information 
      • in an interoperable and REST-like manner
  • Specification extendable to support optional features like:
    • encryption of identity data
    • discovery of OpenID Providers
    • session logout
  • Benefits for developers:
    • Easy, reliable, secure
    • Removes the responsibility of setting, storing, and managing passwords - they are stored with OpenID providers
    • There are already system-level APIs built into the Android operating system to provide OIDC services
    • OIDC can also be accessed by interacting with the built-in system browser on mobile and desktop platforms; a variety of libraries are under construction to simplify this process.
    • OIDC uses standard JSON Web Token (JWT) data structures when signatures are required. This makes OpenID Connect dramatically easier to implement, and in practice has resulted in much better interoperability.

Entities in the OIDC system

  • OpenID Provider (OP)
    • Entity that has implemented the OpenID Connect and OAuth 2.0 protocols
    • Sometimes can be referred to by the role it plays, such as:
      • Identity provider (IDP, IdP) - IdentityServer
      • Security token service
      • Authorization server
    • Leading IdPs are currently large cloud services providers, such as Auth0, GitHub, GitLab, Google and Microsoft
  • Identity Token
    • The outcome of an authentication process
    • After successful authentication, OP returns it to the Client
    • It can contain additional identity data but at a bare minimum it contains the following claims:
      • iss - Issuer Identifier for the Issuer of the response. The iss value is a case-sensitive URL using the https scheme that contains scheme, host, and optionally, port number and path components and no query or fragment components.
      • sub - Subject Identifier. Identifier for the user at the issuer. A locally unique and never reassigned identifier within the Issuer for the End-User, which is intended to be consumed by the Client, e.g., 24400320 or AItOawmwtWwcT0k51BayewNvutrJUqsvl6qs7A4.
      • aud - Audience(s) that this ID Token is intended for. It MUST contain the OAuth 2.0 client_id of the Relying Party as an audience value. It MAY also contain identifiers for other audiences. In the general case, the aud value is an array of case-sensitive strings. In the common special case when there is one audience, the aud value MAY be a single case-sensitive string.
      • exp - Expiration time on or after which the ID Token MUST NOT be accepted by the RP when performing authentication with the OP. The processing of this parameter requires that the current date/time MUST be before the expiration date/time listed in the value. Implementers MAY provide for some small leeway, usually no more than a few minutes, to account for clock skew. Its value is a JSON [RFC8259] number representing the number of seconds from 1970-01-01T00:00:00Z as measured in UTC until the date/time.
      • iat - Time at which the JWT was issued. Its value is a JSON number representing the number of seconds from 1970-01-01T00:00:00Z as measured in UTC until the date/time.
      • The full list of claims returned within the token: https://openid.net/specs/openid-connect-core-1_0.html#IDToken
  • Access Token
    • After successful authentication, OP usually returns it to the Client
  • User
    • person that is using a registered client to access resources
  • Client
    • also known as audiences
    • software (also often called a relying party or RP) that requests tokens for:
      • authenticating a user
      • accessing a resource
    • must be registered with the OP
      • ClientID is used to identify a client app to IdP servers e.g. for Google OAuth this is in form 1234567890-abcdefghijklmnopqrstuvwxyz.apps.googleusercontent.com
      • When a mobile or web app registers with an OpenID Connect provider, they establish a value that identifies the application. (This is the value that's sent as the client_id parameter on OAuth requests.)
    • can be web, mobile, desktop application
  • Relying Party (RP)
    • resource that user wants to access
    • an application or website that outsources its user authentication function to an IDP
    • application (software) that requires end-user authentication or wants to get access to the user's account. 
    • It needs to get permission from the user before it can get access to the user's account 
      • OpenID Connect identifies a set of personal attributes that can be exchanged between Identity Providers and the apps that use them and includes an approval step (aka authorization) so that users can consent (or deny) the sharing of this information.
    • OIDC Relying Party is also called just a 'client' in OAuth terminology.  (source: OIDC Relying Party)
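
The rules for the Identity Token claims listed above (iss, sub, aud, exp, iat) can be sketched as a few checks in Python. This is a minimal sketch of the claim checks only: it assumes the token's signature has already been verified elsewhere, and the function name, issuer URL and client ID are hypothetical.

```python
import time


def validate_id_token_claims(claims, expected_issuer, client_id, leeway=60, now=None):
    """Return a list of problems; an empty list means the basic claim checks passed."""
    now = time.time() if now is None else now
    problems = [f"missing claim: {name}"
                for name in ("iss", "sub", "aud", "exp", "iat")
                if name not in claims]
    if problems:
        return problems
    if claims["iss"] != expected_issuer:
        problems.append("iss does not match the expected issuer")
    # aud may be a single string or an array of strings.
    aud = claims["aud"]
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if client_id not in audiences:
        problems.append("aud does not contain our client_id")
    # The current time must be before exp; allow a small leeway for clock skew.
    if now >= claims["exp"] + leeway:
        problems.append("token has expired")
    return problems


if __name__ == "__main__":
    # Hypothetical claims, for illustration only.
    example = {"iss": "https://op.example.com", "sub": "24400320",
               "aud": "my-client-id", "exp": 1_700_000_600, "iat": 1_700_000_000}
    print(validate_id_token_claims(example, "https://op.example.com",
                                   "my-client-id", now=1_700_000_100))
```

Returning a list of problems rather than a boolean makes it easier to see which claim caused a rejection.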

Approval step (scope authorization) dialog looks like this:

Source: Uploading to Dropbox from Google Drive - Stack Overflow


Source: OpenID Connect  |  Authentication  |  Google for Developers


OpenID Connect protocol steps

  • User navigates to a website or web application (RP) via a browser
  • User clicks sign-in and types their username and password
  • RP (Client) sends an authorization request to the OpenID Provider (OP)
  • OP authenticates the User and obtains authorization
  • OP responds with an Identity Token and usually an Access Token
  • RP can send a request with the Access Token to the UserInfo Endpoint
  • UserInfo Endpoint returns Claims about the User

source: OpenID Connect 1.0 - Orange Developer
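
Two of the requests in the steps above can be sketched in Python: the authorization request the RP sends to the OP, and the later UserInfo request carrying the Access Token. The notes do not say which OAuth 2.0 flow is used, so the sketch assumes the authorization code flow (response_type=code). The endpoint URLs, client ID and function names are hypothetical, and no request is actually sent.

```python
import secrets
from urllib.parse import urlencode
from urllib.request import Request


def build_authorization_url(authorization_endpoint, client_id, redirect_uri,
                            scope="openid profile"):
    """Build the URL the browser is redirected to; also return state and nonce."""
    state = secrets.token_urlsafe(16)  # echoed back by the OP, guards against CSRF
    nonce = secrets.token_urlsafe(16)  # expected back inside the ID Token
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,                # must include "openid" for OIDC
        "state": state,
        "nonce": nonce,
    })
    return f"{authorization_endpoint}?{query}", state, nonce


def build_userinfo_request(userinfo_endpoint, access_token):
    """Prepare (but do not send) the UserInfo request with a Bearer token."""
    return Request(userinfo_endpoint,
                   headers={"Authorization": f"Bearer {access_token}"})


if __name__ == "__main__":
    url, state, nonce = build_authorization_url(
        "https://op.example.com/authorize",        # hypothetical OP endpoint
        "my-client-id",
        "https://rp.example.com/callback",
    )
    print(url)
    request = build_userinfo_request("https://op.example.com/userinfo", "example-token")
    print(request.get_header("Authorization"))
```

The RP keeps state and nonce so that it can compare them with the values that come back from the OP.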



Source: OpenID Connect Overview: OIDC Flow | OneLogin Developers



Here is a more detailed flow diagram:

Source: OpenID Connect (OIDC) | Cloud Sundial





Source: Plan a single sign-on deployment - Microsoft Entra ID | Microsoft Learn



Resources: