Saturday 8 June 2024

Access types in Amazon EKS

 



Types of access in EKS


Grant access to Kubernetes APIs



Our cluster has a Kubernetes API endpoint, which kubectl uses. We can authenticate to this API using two types of identities:
  • An AWS Identity and Access Management (IAM) principal (role or user)
  • A user in our own OpenID Connect (OIDC) provider
    • Requires authentication to our OIDC provider
    • setup: Authenticate users for your cluster from an OpenID Connect identity provider - Amazon EKS
    • We can associate one OIDC identity provider to our cluster.
    • Kubernetes doesn't provide an OIDC identity provider. We can use an existing public OIDC identity provider, or we can run our own identity provider.
    • The issuer URL of the OIDC identity provider must be publicly accessible, so that Amazon EKS can discover the signing keys. Amazon EKS doesn't support OIDC identity providers with self-signed certificates.
    • Before we can associate an OIDC identity provider with our cluster, we need the following information from our provider:
      • Issuer URL - The URL of the OIDC identity provider that allows the API server to discover public signing keys for verifying tokens. The URL must begin with https:// and should correspond to the iss claim in the provider's OIDC ID tokens. In accordance with the OIDC standard, path components are allowed but query parameters are not. Typically the URL consists of only a host name, like https://server.example.org or https://example.com. This URL should point to the level below .well-known/openid-configuration and must be publicly accessible over the internet.
      • Client ID (also known as audience) - The ID for the client application that makes authentication requests to the OIDC identity provider.
We can't disable IAM authentication to our cluster, because it's still required for joining nodes to a cluster. OIDC authentication is optional. Both can be enabled on a cluster at the same time.
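The issuer URL rules listed above (https scheme, path allowed, no query parameters) can be checked locally before we try to associate a provider. This is only a minimal sketch: the function name check_issuer_url is our own, it rejects fragments as well (in line with the iss claim definition), and it does not verify that the URL is publicly reachable or that its certificate is trusted.

```python
from urllib.parse import urlsplit


def check_issuer_url(url: str) -> list[str]:
    """Return a list of problems found; an empty list means the URL looks acceptable."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("issuer URL must begin with https://")
    if not parts.hostname:
        problems.append("issuer URL must contain a host name")
    if parts.query:
        problems.append("query parameters are not allowed")
    if parts.fragment:
        problems.append("fragments are not allowed")
    return problems


# A host-only URL and a URL with path components are both fine.
print(check_issuer_url("https://server.example.org"))
print(check_issuer_url("https://example.com/path/realm"))
# Plain http plus a query string is reported as two separate problems.
print(check_issuer_url("http://example.com/?tenant=a"))
```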


IAM OIDC identity providers are entities in IAM that describe an external identity provider (IdP) service that supports the OpenID Connect (OIDC) standard, such as Google or Salesforce. You use an IAM OIDC identity provider when you want to establish trust between an OIDC-compatible IdP and your AWS account. This is useful when creating a mobile app or web application that requires access to AWS resources, but you don't want to create custom sign-in code or manage your own user identities. 

You can create and manage an IAM OIDC identity provider using the AWS Management Console, the AWS Command Line Interface, the Tools for Windows PowerShell, or the IAM API.

After you create an IAM OIDC identity provider, you must create one or more IAM roles. A role is an identity in AWS that doesn't have its own credentials (as a user does). But in this context, a role is dynamically assigned to a federated user that is authenticated by your organization's IdP. The role permits your organization's IdP to request temporary security credentials for access to AWS. The policies assigned to the role determine what the federated users are allowed to do in AWS. 

 


Grant Kubernetes workloads access to AWS


 A workload is an application running in one or more Kubernetes pods.

A Kubernetes service account provides an identity for processes that run in a Pod.

If your Pod needs access to AWS services, you can map the service account to an IAM identity to grant that access.

Granting IAM permissions to workloads on Amazon Elastic Kubernetes Service clusters


Amazon EKS provides two ways to grant IAM permissions to workloads that run in Amazon EKS clusters:
  • IAM roles for service accounts (IRSA)
    • Allows pods to use IAM roles directly (no need to inject IAM user access credentials into pods anymore)
    • We define the trust relationship between an IAM role and Kubernetes service account (that's a type of account in Kubernetes) in the role's trust policy.
    • Each EKS cluster has an OpenID Connect (OIDC) issuer URL associated with it. 
    • To use/enable IRSA a unique OpenID Connect provider needs to be created for each EKS cluster in IAM. 
  • EKS Pod Identities


In 2014, AWS Identity and Access Management added support for federated identities using OpenID Connect (OIDC). This feature allows you to authenticate AWS API calls with supported identity providers and receive a valid OIDC JSON web token (JWT). You can pass this token to the AWS STS AssumeRoleWithWebIdentity API operation and receive IAM temporary role credentials. You can use these credentials to interact with any AWS service, including Amazon S3 and DynamoDB.

Each JWT token is signed by a signing key pair. The keys are served on the OIDC provider managed by Amazon EKS and the private key rotates every 7 days. Amazon EKS keeps the public keys until they expire. If you connect external OIDC clients, be aware that you need to refresh the signing keys before the public key expires. 

Kubernetes has long used service accounts as its own internal identity system. Pods can authenticate with the Kubernetes API server using an auto-mounted token (which was a non-OIDC JWT) that only the Kubernetes API server could validate. These legacy service account tokens don't expire, and rotating the signing key is a difficult process. In Kubernetes version 1.12, support was added for a new ProjectedServiceAccountToken feature. This feature is an OIDC JSON web token that also contains the service account identity and supports a configurable audience.

Amazon EKS hosts a public OIDC discovery endpoint for each cluster that contains the signing keys for the ProjectedServiceAccountToken JSON web tokens so external systems, such as IAM, can validate and accept the OIDC tokens that are issued by Kubernetes.
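To see what such a token carries, we can decode its payload. The sketch below only decodes the claims and does not verify the signature, so it is useful for inspection but not for authentication. The sample token is built in place from the example values used in this post; a real projected service account token contains more claims and a real signature.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url-encoded without padding; restore it first.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


def _b64url(data: dict) -> str:
    # Helper used only to build the illustrative sample token below.
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Illustrative token: the claim values are placeholders, the signature is fake.
sample_claims = {
    "iss": "https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
    "sub": "system:serviceaccount:default:my-service-account",
    "aud": ["sts.amazonaws.com"],
}
sample_token = ".".join(
    [_b64url({"alg": "RS256"}), _b64url(sample_claims), "fake-signature"]
)

claims = decode_claims(sample_token)
print(claims["sub"])
print(claims["aud"])
```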

OIDC federation allows you to assume IAM roles via the AWS Security Token Service (STS): you authenticate with an OIDC provider and receive a JSON Web Token (JWT), which in turn can be used to assume an IAM role. Kubernetes, on the other hand, can issue so-called projected service account tokens for pods, which happen to be valid OIDC JWTs. This setup equips each pod with a cryptographically signed token that STS can verify against the OIDC provider of your choice to establish the pod's identity.

New credential provider: "sts:AssumeRoleWithWebIdentity"


IRSA authentication:
  • 1) A JWT signed by the EKS OIDC IdP gets auto-mounted into each pod that uses the service account.
  • 2) The AWS SDK sends an AssumeRoleWithWebIdentity request containing the desired role and the JWT.
  • 3) STS uses the IAM IdP associated with the EKS OIDC IdP to verify the identity of the pod.
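The SDK side of this flow can be sketched as follows. On EKS, pods that use an IRSA-enabled service account get the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables; the function below reads them and assembles the three required AssumeRoleWithWebIdentity parameters. The function name, the session name and the role ARN in the demo are hypothetical, and the demo uses a throwaway file instead of a real mounted token. A supported AWS SDK does all of this for us; with boto3 the resulting dictionary could be passed to the STS client's assume_role_with_web_identity call.

```python
import tempfile
from pathlib import Path


def web_identity_request_params(env, session_name="example-session"):
    """Build AssumeRoleWithWebIdentity parameters from IRSA environment variables."""
    role_arn = env["AWS_ROLE_ARN"]
    token_file = env["AWS_WEB_IDENTITY_TOKEN_FILE"]
    # Read the token on every call: the mounted token is rotated,
    # so a cached copy may already be expired.
    token = Path(token_file).read_text().strip()
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": token,
    }


# Demo with placeholder values; inside a pod we would pass os.environ instead.
with tempfile.TemporaryDirectory() as tmp:
    token_path = Path(tmp) / "token"
    token_path.write_text("header.payload.signature\n")
    demo_env = {
        "AWS_ROLE_ARN": "arn:aws:iam::111122223333:role/my-role",  # hypothetical role
        "AWS_WEB_IDENTITY_TOKEN_FILE": str(token_path),
    }
    params = web_identity_request_params(demo_env)

print(sorted(params))
```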


To use/enable IRSA:
  • 1) a unique OpenID Connect provider needs to be created for each EKS cluster in IAM. [Create an IAM OIDC provider for your cluster - Amazon EKS]
    • To use IAM roles for service accounts, an IAM OIDC provider must exist for your cluster's OIDC issuer URL.
    • If your cluster supports IAM roles for service accounts, it has an OpenID Connect (OIDC) issuer URL associated with it. 
    • You can view this URL in the Amazon EKS console, or you can use the following AWS CLI command to retrieve it.
$ aws eks describe-cluster --name cluster_name --query "cluster.identity.oidc.issuer" --output text

The expected output is as follows:

https://oidc.eks.<region-code>.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E

Any OIDC provider implementation needs to have a public OIDC issuer URL (see Issuer URL in OpenID Connect Discovery should be a working URL? - Stack Overflow). So for each cluster we'll have one implementation of OIDC provider.


Your identity provider's discovery endpoint contains important configuration information. The OIDC discovery endpoint will always end with /.well-known/openid-configuration as described in the OpenID Provider Configuration Request documentation.

You can confirm that the discovery endpoint is correct by entering it in a browser window. If there is a JSON object with metadata about the connection returned, the endpoint is correct.
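The same check can be scripted. The sketch below derives the discovery URL from an issuer URL and, optionally, fetches the metadata document. The function names are our own, the issuer in the demo is the placeholder from the example output above, and the fetch function is defined but not called here because it needs network access to a real issuer.

```python
import json
from urllib.request import urlopen


def discovery_url(issuer: str) -> str:
    """Return the OIDC discovery endpoint for the given issuer URL."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def fetch_discovery_document(issuer: str) -> dict:
    """Download and parse the discovery document (requires a reachable issuer)."""
    with urlopen(discovery_url(issuer), timeout=10) as response:
        return json.load(response)


# Placeholder issuer; substitute the value returned by describe-cluster.
issuer = "https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
print(discovery_url(issuer))
```

If the fetched document is valid, its issuer field should equal the issuer URL we started from, and its jwks_uri field points at the public signing keys.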


Obtain the thumbprint for an OpenID Connect identity provider - AWS Identity and Access Management
  • 2) Configure a Kubernetes service account to assume an IAM role
  • 3) Configure Pods to use a Kubernetes service account 
  • 4) Use a supported AWS SDK 

Just as we can create an OIDC identity provider in IAM to represent an external, third-party OIDC provider (so that users authenticated with that provider can access AWS), we can also create an OIDC identity provider in IAM to represent the internal EKS OIDC provider that each cluster has (each cluster has its own provider). When an EKS cluster is created, its OIDC provider is created as well, with two pieces of data available:
  • OIDC Provider issuer
    • has its url which is used for discovery - see the screenshot above
  • OIDC Provider server TLS certificate
    • This certificate protects the url above (OIDC Provider issuer url) and is used for clients to verify the identity of OIDC Provider server
    • TLS certificate is necessary for establishing secure communication with the OIDC provider.

In Terraform, this certificate can be obtained like this:

data "tls_certificate" "example" {
  url = aws_eks_cluster.example.identity[0].oidc[0].issuer
}

To create an OIDC identity provider (IdP) in IAM for this cluster-specific OIDC provider, we can use aws_iam_openid_connect_provider:


resource "aws_iam_openid_connect_provider" "example" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.example.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.example.identity[0].oidc[0].issuer
}

This resource requires a few pieces of information:
  • url - Describes which OIDC IdP this resource represents.
    • The URL of the identity provider. Corresponds to the iss claim.
  • thumbprint_list - Describes how clients will communicate with the OIDC IdP servers. HTTP communication goes through a secure TLS channel, so we need to know which certificates to trust.
    • A list of server certificate thumbprints for the OpenID Connect (OIDC) identity provider's server certificate(s).
    • When we create an OpenID Connect (OIDC) identity provider in IAM, IAM requires the thumbprint for the top intermediate certificate authority (CA) that signed the certificate used by the external identity provider (IdP). The thumbprint is a signature for the CA's certificate that was used to issue the certificate for the OIDC-compatible IdP. When we create an IAM OIDC identity provider, we are trusting identities authenticated by that IdP to have access to our AWS account. By using the CA's certificate thumbprint, we trust any certificate issued by that CA with the same DNS name as the one registered. This eliminates the need to update trusts in each account when we renew the IdP's signing certificate.
  • client_id_list - Describes which clients can use this OIDC IdP
    • A list of client IDs (also known as audiences). When a mobile or web app registers with an OpenID Connect provider, they establish a value that identifies the application. (This is the value that's sent as the client_id parameter on OAuth requests.)
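For reference, a certificate thumbprint of this kind is the hex-encoded SHA-1 hash of the certificate in its DER encoding. The sketch below shows the computation with the Python standard library. The function names are our own, and the demo hashes placeholder bytes rather than a real certificate, so the printed value is not a usable thumbprint.

```python
import hashlib
import ssl


def thumbprint_from_der(der_bytes: bytes) -> str:
    """Hex-encoded SHA-1 hash of a DER-encoded certificate."""
    return hashlib.sha1(der_bytes).hexdigest()


def thumbprint_from_pem(pem_text: str) -> str:
    """Same as above, starting from a PEM-encoded certificate string."""
    return thumbprint_from_der(ssl.PEM_cert_to_DER_cert(pem_text))


# Placeholder bytes stand in for a real DER certificate in this demo.
demo_thumbprint = thumbprint_from_der(b"not a real certificate")
print(len(demo_thumbprint), "hex characters")
```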

AWS Security Token Service (STS)

  • Web service that enables you to request temporary, limited-privilege credentials for users
  • By default, available as a global service
  • By default, all AWS STS requests go to a single endpoint at https://sts.amazonaws.com (Regional endpoints are also available)
  • Supports the following actions (requests):
    • AssumeRole
      • Returns a set of temporary security credentials that you can use to access AWS resources. These temporary credentials consist of an access key ID, a secret access key, and a security token. For example, a user can authenticate via the company's SSO and, on the AWS sign-on page, get these credentials, which can be copied to ~/.aws/credentials under a profile; this profile is then used when accessing AWS.
    • AssumeRoleWithSAML
    • AssumeRoleWithWebIdentity
      • Issues a role session (temporary session)
      • Returns a set of temporary security credentials for users who have been authenticated in a mobile or web application with a web identity provider. Example providers include the OAuth 2.0 providers Login with Amazon and Facebook, or any OpenID Connect-compatible identity provider such as Google or Amazon Cognito federated identities.
      • Calling AssumeRoleWithWebIdentity does not require the use of AWS security credentials. Therefore, you can distribute an application (for example, on mobile devices) that requests temporary security credentials without including long-term AWS credentials in the application. You also don't need to deploy server-based proxy services that use long-term AWS credentials. Instead, the identity of the caller is validated by using a token from the web identity provider. 
      • The temporary security credentials returned by this API consist of an access key ID, a secret access key, and a security token. Applications can use these temporary security credentials to sign calls to AWS service API operations.
      • By default, the temporary security credentials created by AssumeRoleWithWebIdentity last for one hour. However, you can use the optional DurationSeconds parameter to specify the duration of your session. You can provide a value from 900 seconds (15 minutes) up to the maximum session duration setting for the role. This setting can have a value from 1 hour to 12 hours.
      • Required parameters: 
        • RoleArn - The Amazon Resource Name (ARN) of the role that the caller is assuming.
        • RoleSessionName - An identifier for the assumed role session. Typically, you pass the name or identifier that is associated with the user who is using your application. That way, the temporary security credentials that your application will use are associated with that user. This session name is included as part of the ARN and assumed role ID in the AssumedRoleUser response element.
        • WebIdentityToken - The OAuth 2.0 access token or OpenID Connect ID token that is provided by the identity provider. Your application must get this token by authenticating the user who is using your application with a web identity provider before the application makes an AssumeRoleWithWebIdentity call. Timestamps in the token must be formatted as either an integer or a long integer. Only tokens with RSA algorithms (RS256) are supported. 
    • DecodeAuthorizationMessage
    • GetAccessKeyInfo
    • GetCallerIdentity
    • GetFederationToken
    • GetSessionToken
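The DurationSeconds rules described for AssumeRoleWithWebIdentity can be expressed as a small check: the default is one hour, the minimum is 900 seconds, and the upper bound is the role's maximum session duration, which itself lies between 1 and 12 hours. This is a local sketch of those rules only; the function and constant names are our own and STS remains the authority on what it accepts.

```python
MIN_DURATION_SECONDS = 900        # 15 minutes
DEFAULT_DURATION_SECONDS = 3600   # 1 hour
MAX_ROLE_SESSION_SECONDS = 43200  # 12 hours


def resolve_duration(requested=None, role_max_session_duration=3600):
    """Return the session duration to request, or raise ValueError if out of range."""
    if not 3600 <= role_max_session_duration <= MAX_ROLE_SESSION_SECONDS:
        raise ValueError("role maximum session duration must be between 1 and 12 hours")
    if requested is None:
        # No DurationSeconds given: credentials last one hour by default.
        return DEFAULT_DURATION_SECONDS
    if not MIN_DURATION_SECONDS <= requested <= role_max_session_duration:
        raise ValueError(
            f"DurationSeconds must be between {MIN_DURATION_SECONDS} "
            f"and {role_max_session_duration}"
        )
    return requested


print(resolve_duration())
print(resolve_duration(7200, role_max_session_duration=43200))
```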


Example: Create an IAM role and associate it with a Kubernetes service account


Our custom service account in the cluster, my-service-account, requires permission to, for example, launch EC2 instances. We need to assign an IAM role to this service account (IRSA).

We've created an OIDC IdP in IAM for the OIDC IdP associated with our cluster:

arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE

Prepare the policies for the role that the IdP-authenticated user will assume. As with any role, a role for a service account includes two policies:
  • trust policy that specifies who can assume the role
  • permissions policy that specifies the AWS actions and resources that the role owner is allowed or denied access to
Trust Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:default:my-service-account",
                    "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:aud": "sts.amazonaws.com"
                }
            }
        }
    ]
}

Principal here is OIDC session principal which is a role session principal, see https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html#sts-session-principals.

This allows anyone who is authenticated with this OIDC IdP, and whose WebIdentityToken (sent as a parameter of the request) has system:serviceaccount:default:my-service-account as the subject (user) and sts.amazonaws.com as the audience (client), to assume the role that has this policy attached to it.
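Since the policy differs between service accounts only in a few values, it can be generated. The sketch below rebuilds the trust policy shown above from the account ID, the cluster's issuer URL, the namespace and the service account name. The function name irsa_trust_policy is our own and the demo values are the placeholders used in this post.

```python
import json


def irsa_trust_policy(account_id, issuer_url, namespace, service_account):
    """Build an IRSA trust policy document as a Python dictionary."""
    # IAM identifies the provider by the issuer URL without the scheme.
    provider = issuer_url.removeprefix("https://")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


policy = irsa_trust_policy(
    account_id="111122223333",
    issuer_url="https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
    namespace="default",
    service_account="my-service-account",
)
print(json.dumps(policy, indent=4))
```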

Our service account authenticates with the cluster's OIDC provider (IdP) and gets a token from it. This is the Identity Token mentioned in OpenID Connect (OIDC) | My Public Notepad, and it contains the sub (subject identity, the service account in our case) and aud (audience, i.e. the client that will be using this token; STS in our case) claims.

It then sends an AssumeRoleWithWebIdentity request to STS with this token and the role it requires (e.g. a role for creating EC2 instances). STS, as the client (audience) of the token, then validates the token against the EKS cluster's IdP to identify the user (service account) and finally grants it the role.

Troubleshooting



"error": "fetching instance types using ec2.DescribeInstanceTypes,
WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403
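One possible cause of this error is that the claims in the pod's token do not match the StringEquals conditions of the role's trust policy, for example when the service account lives in a different namespace than the one named in the policy. The sketch below compares the two locally. It is a simplified comparison under our own assumptions, not IAM's actual evaluation logic, and the function name and demo values are hypothetical.

```python
def mismatched_claims(string_equals: dict, claims: dict) -> dict:
    """Return the claims that do not satisfy the trust policy's StringEquals block."""
    mismatches = {}
    for key, expected in string_equals.items():
        # Condition keys look like "<provider>:sub"; the claim name follows the last colon.
        claim_name = key.rsplit(":", 1)[1]
        actual = claims.get(claim_name)
        # aud may be a single string or a list of strings.
        actual_values = actual if isinstance(actual, list) else [actual]
        if expected not in actual_values:
            mismatches[claim_name] = {"expected": expected, "actual": actual}
    return mismatches


provider = "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
conditions = {
    f"{provider}:sub": "system:serviceaccount:default:my-service-account",
    f"{provider}:aud": "sts.amazonaws.com",
}
# Hypothetical token claims: the pod runs in kube-system, not in default.
token_claims = {
    "sub": "system:serviceaccount:kube-system:my-service-account",
    "aud": ["sts.amazonaws.com"],
}
result = mismatched_claims(conditions, token_claims)
print(result)
```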



Friday 7 June 2024

OpenID Connect (OIDC)


OpenID Connect (OIDC)

  • Authentication protocol
    • Authentication is a secure process of establishing and communicating that the person operating an application or browser is who they claim to be.
  • Provides a secure and verifiable answer to the question “What is the identity of the person currently using the browser or mobile app that is connected?”
  • Based on OAuth 2.0
    • OAuth 2.0 is a framework, specified by the IETF in RFCs 6749 and 6750 (published in 2012), designed to support the development of authentication and authorization protocols. It provides a variety of standardized message flows based on JSON and HTTP; OpenID Connect uses these to provide identity services.
  • Simplifies:
    • user identity verification
      • based on the authentication performed by an Authorization Server
    • obtaining user profile information 
      • in an interoperable and REST-like manner
  • Specification extendable to support optional features like:
    • encryption of identity data
    • discovery of OpenID Providers
    • session logout
  • Benefits for developers:
    • Easy, reliable, secure
    • Removes the responsibility of setting, storing, and managing passwords - they are stored with OpenID providers
    • There are already system-level APIs built into the Android operating system to provide OIDC services
    • OIDC can also be accessed by interacting with the built-in system browser on mobile and desktop platforms; a variety of libraries are under construction to simplify this process.
    • OIDC uses standard JSON Web Token (JWT) data structures when signatures are required. This makes OpenID Connect dramatically easier to implement, and in practice has resulted in much better interoperability.

Entities in the OIDC system

  • OpenID Provider (OP)
    • Entity that has implemented the OpenID Connect and OAuth 2.0 protocols
    • Sometimes can be referred to by the role it plays, such as:
      • Identity provider (IDP, IdP) - IdentityServer
      • Security token service
      • Authorization server
    • Leading IdPs are currently large cloud services providers, such as Auth0, GitHub, GitLab, Google and Microsoft
  • Identity Token
    • The outcome of an authentication process
    • After successful authentication, OP returns it to the Client
    • It can contain additional identity data but at a bare minimum it contains the following claims:
      • iss - Issuer Identifier for the Issuer of the response. The iss value is a case-sensitive URL using the https scheme that contains scheme, host, and optionally, port number and path components and no query or fragment components.
      • sub - Subject Identifier. Identifier for the user at the issuer. A locally unique and never reassigned identifier within the Issuer for the End-User, which is intended to be consumed by the Client, e.g., 24400320 or AItOawmwtWwcT0k51BayewNvutrJUqsvl6qs7A4.
      • aud - Audience(s) that this ID Token is intended for. It MUST contain the OAuth 2.0 client_id of the Relying Party as an audience value. It MAY also contain identifiers for other audiences. In the general case, the aud value is an array of case-sensitive strings. In the common special case when there is one audience, the aud value MAY be a single case-sensitive string.
      • exp - Expiration time on or after which the ID Token MUST NOT be accepted by the RP when performing authentication with the OP. The processing of this parameter requires that the current date/time MUST be before the expiration date/time listed in the value. Implementers MAY provide for some small leeway, usually no more than a few minutes, to account for clock skew. Its value is a JSON [RFC8259] number representing the number of seconds from 1970-01-01T00:00:00Z as measured in UTC until the date/time.
      • iat - Time at which the JWT was issued. Its value is a JSON number representing the number of seconds from 1970-01-01T00:00:00Z as measured in UTC until the date/time
      • The full list of claims returned within the token: https://openid.net/specs/openid-connect-core-1_0.html#IDToken
  • Access Token
    • After successful authentication, OP usually returns it to the Client
  • User
    • person that is using a registered client to access resources
  • Client
    • also known as audiences
    • software that requests tokens for:
      • authenticating a user
      • accessing a resource (also often called a relying party or RP)
    • must be registered with the OP
      • ClientID is used to identify a client app to IdP servers e.g. for Google OAuth this is in form 1234567890-abcdefghijklmnopqrstuvwxyz.apps.googleusercontent.com
      • When a mobile or web app registers with an OpenID Connect provider, they establish a value that identifies the application. (This is the value that's sent as the client_id parameter on OAuth requests.)
    • can be web, mobile, desktop application
  • Relying Party (RP)
    • resource that user wants to access
    • an application or website that outsources its user authentication function to an IDP
    • application (software) that requires end-user authentication or wants to get access to the user's account. 
    • It needs to get permission from the user before it can get access to the user's account 
      • OpenID Connect identifies a set of personal attributes that can be exchanged between Identity Providers and the apps that use them and includes an approval step (aka authorization) so that users can consent (or deny) the sharing of this information.
    • OIDC Relying Party is also called just a 'client' in OAuth terminology.  (source: OIDC Relying Party)
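The rules for the iss, aud and exp claims listed above translate into a few checks that a relying party performs after the token's signature has been verified. The sketch below covers only these claim checks, with no signature verification; the function name, the 60-second leeway and all demo values are our own assumptions.

```python
import time


def id_token_problems(claims, expected_issuer, client_id, now=None, leeway_seconds=60):
    """Check the iss, aud and exp claims of an already signature-verified ID Token."""
    now = time.time() if now is None else now
    problems = []
    if claims.get("iss") != expected_issuer:
        problems.append("unexpected issuer")
    # aud is either a single string or an array of strings.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if client_id not in audiences:
        problems.append("client_id is not among the audiences")
    # The current time must be before exp; a small leeway absorbs clock skew.
    exp = claims.get("exp")
    if exp is None or now >= exp + leeway_seconds:
        problems.append("token has expired")
    return problems


# Hypothetical claims for illustration.
demo_claims = {
    "iss": "https://server.example.org",
    "sub": "24400320",
    "aud": "my-client-id",
    "iat": 1717800000,
    "exp": 1717803600,
}
print(id_token_problems(demo_claims, "https://server.example.org", "my-client-id", now=1717801000))
print(id_token_problems(demo_claims, "https://server.example.org", "my-client-id", now=1717803700))
```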

Approval step (scope authorization) dialog looks like this:

Source: Uploading to Dropbox from Google Drive - Stack Overflow


Source: OpenID Connect  |  Authentication  |  Google for Developers


 OpenID Connect protocol steps

  • User navigates to a website or web application (RP) via a browser
  • User clicks sign-in and types their username and password
  • RP (Client) sends an authorization request to the OpenID Provider (OP)
  • OP authenticates the User and obtains authorization
  • OP responds with an Identity Token and usually an Access Token
  • RP can send a request with the Access Token to the UserInfo Endpoint
  • UserInfo Endpoint returns Claims about the User

source: OpenID Connect 1.0 - Orange Developer
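The request that the RP sends to the OP in the steps above is, in the common authorization code flow, a redirect of the browser to the OP's authorization endpoint with a handful of standard query parameters. A minimal sketch follows; the endpoint URL, client ID and redirect URI are hypothetical placeholders, and real clients usually add further parameters such as nonce.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorization_url(authorization_endpoint, client_id, redirect_uri,
                            scope="openid profile", state=None):
    """Build an OIDC authorization request URL for the authorization code flow."""
    # state ties the response to this request and protects against CSRF.
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,  # must include "openid" for an OIDC request
        "state": state,
    })
    return f"{authorization_endpoint}?{query}"


auth_url = build_authorization_url(
    "https://op.example.com/authorize",   # hypothetical OP endpoint
    client_id="my-client-id",             # hypothetical client registration
    redirect_uri="https://rp.example.com/callback",
    state="demo-state",
)
print(auth_url)
print(parse_qs(urlsplit(auth_url).query)["scope"])
```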



Source: OpenID Connect Overview: OIDC Flow | OneLogin Developers



Here is a more detailed flow diagram:

Source: OpenID Connect (OIDC) | Cloud Sundial





Source: Plan a single sign-on deployment - Microsoft Entra ID | Microsoft Learn





Tuesday 4 June 2024

Securing AWS EKS with GuardDuty and Terraform

Amazon GuardDuty is a threat detection service which continuously monitors, profiles and analyses events across AWS accounts and resources. In Introduction to Amazon GuardDuty I wrote about its general features.




Its support for Amazon Elastic Kubernetes Service (Amazon EKS) comes in two ways:
  • GuardDuty EKS Protection (EKS Audit Log Monitoring)
    • GuardDuty feature that monitors cluster control plane activity by analyzing EKS audit logs
    • Helps detect potentially suspicious activities in EKS clusters 
    • Enabled by default when GuardDuty is enabled
  • GuardDuty EKS Runtime Monitoring
    • mechanism to detect runtime threats from over 30 security findings to protect our EKS clusters
    • can identify specific containers within your EKS clusters that are potentially compromised and detect attempts to escalate privileges from an individual container to the underlying Amazon EC2 host and the broader AWS environment
    • fully managed EKS add-on 
    • managed as a part of GuardDuty Runtime Monitoring
    • visibility into individual container runtime activities (on-host operating system-level behavior) such as:
      • file access
      • process execution
      • network connections
    • a lightweight security agent is deployed as a Kubernetes DaemonSet, which runs a pod on every node
    • Detection of a potential threat triggers creation of a security finding that pinpoints the specific container, and includes details such as:
      • pod ID
      • image ID
      • EKS cluster tags
      • executable path
      • process lineage
    • can be enabled both on existing and new EKS clusters 
    • available from March 2023 [Amazon GuardDuty now monitors runtime activity from containers running on Amazon EKS]
    • Needs to be enabled explicitly, after GuardDuty is enabled

In Introduction to Amazon GuardDuty we saw that to enable EKS Audit Log Monitoring we only need to provision and enable GuardDuty detector (aws_guardduty_detector | Resources | hashicorp/aws | Terraform | Terraform Registry):


resource "aws_guardduty_detector" "this" {
  enable = true
}

To enable GuardDuty EKS Runtime Monitoring (which is a prerequisite for activating GuardDuty EKS add-on) we need to use resource aws_guardduty_detector_feature.

To add and activate GuardDuty EKS add-on we have two choices:


We can enable EKS Runtime Monitoring AND automatic GuardDuty add-on management (automatic deployment and updates of the security agent in accounts where Runtime Monitoring is enabled) with the following:

resource "aws_guardduty_detector_feature" "eks_runtime_monitoring" {
  detector_id = aws_guardduty_detector.this.id
  name        = "EKS_RUNTIME_MONITORING"
  status      = "ENABLED"

  additional_configuration {
    name   = "EKS_ADDON_MANAGEMENT"
    status = "ENABLED"
  }
}


In the AWS Console we can see that EKS Runtime Monitoring is now also enabled:






EKS monitoring is now covered in full (EKS Audit Logs monitoring + EKS Runtime monitoring + automatic GuardDuty EKS addon management).

After provisioning the EKS cluster we can see that GuardDuty picked it up:



We can check out the addons section in the EKS cluster:





Amazon GuardDuty EKS runtime monitoring

Amazon GuardDuty is a security monitoring service that analyzes and processes multiple supported data sources to identify any unexpected and potentially unauthorized suspicious or malicious activity within your AWS environment.

EKS Runtime Monitoring in Amazon GuardDuty provides runtime threat detection to protect EKS clusters in your AWS environment. EKS Runtime Monitoring uses a fully-managed EKS add-on (GuardDuty security agent) that adds visibility into individual Kubernetes container runtime activities, such as file access, process execution, and network connections.

Once you enable EKS Runtime Monitoring within Amazon GuardDuty and install the EKS add-on within your cluster, GuardDuty starts to consume runtime activity events from all EC2 hosts and containers in the cluster. These events are analyzed by GuardDuty for potential threats. As GuardDuty detects a potential threat, it generates a security finding. Navigate to the GuardDuty console to view the generated findings in your account.


Disabling EKS_ADDON_MANAGEMENT will keep the GuardDuty addon active, but its management will switch from Automatic to Manual:

resource "aws_guardduty_detector_feature" "eks_runtime_monitoring" {
  detector_id = aws_guardduty_detector.this.id
  name        = "EKS_RUNTIME_MONITORING"
  status      = "ENABLED"

  additional_configuration {
    name   = "EKS_ADDON_MANAGEMENT"
    status = "DISABLED"
  }
}

terraform plan output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_guardduty_detector_feature.eks_runtime_monitoring will be updated in-place
  ~ resource "aws_guardduty_detector_feature" "eks_runtime_monitoring" {
        id          = "e6c7f038a9682cf6ff6bb514c110a66f/EKS_RUNTIME_MONITORING"
        name        = "EKS_RUNTIME_MONITORING"
        # (2 unchanged attributes hidden)

      ~ additional_configuration {
            name   = "EKS_ADDON_MANAGEMENT"
          ~ status = "ENABLED" -> "DISABLED"
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.

If we then remove the additional_configuration block altogether, Terraform plans a replacement of the resource:







resource "aws_guardduty_detector_feature" "eks_runtime_monitoring" {
  detector_id = aws_guardduty_detector.this.id
  name        = "EKS_RUNTIME_MONITORING"
  status      = "ENABLED"
}

terraform plan output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # aws_guardduty_detector_feature.eks_runtime_monitoring must be replaced
-/+ resource "aws_guardduty_detector_feature" "eks_runtime_monitoring" {
      ~ id          = "e6c7f038a9682cf6ff6bb514c110a66f/EKS_RUNTIME_MONITORING" -> (known after apply)
        name        = "EKS_RUNTIME_MONITORING"
        # (2 unchanged attributes hidden)

      - additional_configuration { # forces replacement
          - name   = "EKS_ADDON_MANAGEMENT" -> null # forces replacement
          - status = "DISABLED" -> null
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Nothing changed in the AWS Console - addon is active but manually managed.

Let's now disable EKS_RUNTIME_MONITORING feature completely:

resource "aws_guardduty_detector_feature" "eks_runtime_monitoring" {
  detector_id = aws_guardduty_detector.this.id
  name        = "EKS_RUNTIME_MONITORING"
  status      = "DISABLED"
}

terraform plan output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # aws_guardduty_detector_feature.eks_runtime_monitoring must be replaced
-/+ resource "aws_guardduty_detector_feature" "eks_runtime_monitoring" {
      ~ id          = "e6c7f038a9682cf6ff6bb514c110a66f/EKS_RUNTIME_MONITORING" -> (known after apply)
        name        = "EKS_RUNTIME_MONITORING"
      ~ status      = "ENABLED" -> "DISABLED"
        # (1 unchanged attribute hidden)

      - additional_configuration { # forces replacement
          - name   = "EKS_ADDON_MANAGEMENT" -> null # forces replacement
          - status = "DISABLED" -> null
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.

The add-on is no longer present.

Let's see what happens if we add the GuardDuty add-on to the EKS cluster via the aws_eks_addon resource.

For that, we need to find out the exact name of the add-on and its version. We can use aws eks describe-addon-versions:

$ aws eks describe-addon-versions --output table --query 'addons[].{Name: addonName, Publisher: publisher}'
------------------------------------------------------------------------
|                         DescribeAddonVersions                        |
+---------------------------------------------+------------------------+
|                    Name                     |       Publisher        |
+---------------------------------------------+------------------------+
|  vpc-cni                                    |  eks                   |
|  upwind-security_upwind-operator            |  Upwind Security       |
|  upbound_universal-crossplane               |  upbound               |
|  tetrate-io_istio-distro                    |  tetrate-io            |
|  teleport_teleport                          |  teleport              |
|  stormforge_optimize-live                   |  StormForge            |
|  splunk_splunk-otel-collector-chart         |  Splunk                |
|  solo-io_istio-distro                       |  Solo.io               |
|  solo-io_gloo-mesh-starter-pack             |  Solo.io               |
|  solarwinds_swo-k8s-collector-addon         |  SolarWinds            |
|  snapshot-controller                        |  eks                   |
|  rafay-systems_rafay-operator               |  rafay-systems         |
|  new-relic_kubernetes-operator              |  New Relic             |
|  netapp_trident-operator                    |  NetApp Inc.           |
|  leaksignal_leakagent                       |  leaksignal            |
|  kubecost_kubecost                          |  kubecost              |
|  kube-proxy                                 |  eks                   |
|  kong_konnect-ri                            |  kong                  |
|  kasten_k10                                 |  Kasten by Veeam       |
|  haproxy-technologies_kubernetes-ingress-ee |  HAProxy Technologies  |
|  groundcover_agent                          |  groundcover           |
|  grafana-labs_kubernetes-monitoring         |  Grafana Labs          |
|  factorhouse_kpow                           |  factorhouse           |
|  eks-pod-identity-agent                     |  eks                   |
|  dynatrace_dynatrace-operator               |  dynatrace             |
|  datree_engine-pro                          |  datree                |
|  datadog_operator                           |  Datadog               |
|  cribl_cribledge                            |  Cribl                 |
|  coredns                                    |  eks                   |
|  cisco_cisco-cloud-observability-operators  |  Cisco Systems, Inc.   |
|  cisco_cisco-cloud-observability-collectors |  Cisco Systems, Inc.   |
|  calyptia_fluent-bit                        |  Calyptia Inc          |
|  aws-mountpoint-s3-csi-driver               |  s3                    |
|  aws-guardduty-agent                        |  eks                   |
|  aws-efs-csi-driver                         |  eks                   |
|  aws-ebs-csi-driver                         |  eks                   |
|  amazon-cloudwatch-observability            |  eks                   |
|  akuity_agent                               |  akuity                |
|  adot                                       |  eks                   |
|  accuknox_kubearmor                         |  AccuKnox              |
+---------------------------------------------+------------------------+

Now that we know the name of the add-on we're interested in (aws-guardduty-agent), we can check its latest version:

$ aws eks describe-addon-versions --output table --addon-name=aws-guardduty-agent --query 'addons[].{Name: addonName, Publisher: publisher, Version: addonVersions[0].addonVersion}'
-----------------------------------------------------------
|                  DescribeAddonVersions                  |
+----------------------+------------+---------------------+
|         Name         | Publisher  |       Version       |
+----------------------+------------+---------------------+
|  aws-guardduty-agent |  eks       |  v1.6.1-eksbuild.1  |
+----------------------+------------+---------------------+
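
As an alternative to copying the version string by hand, the lookup can be sketched in Terraform itself with the aws_eks_addon_version data source. This is a minimal sketch, not what was used in this walkthrough: it assumes the cluster resource is aws_eks_cluster.this (as in the module below), and the data source label and output name are hypothetical. Unlike the CLI query above, which takes the first entry of addonVersions, this lookup is scoped to the cluster's Kubernetes version.

```hcl
# Sketch: resolve the most recent aws-guardduty-agent version that is
# compatible with the cluster's Kubernetes version.
data "aws_eks_addon_version" "guardduty" {
  addon_name         = "aws-guardduty-agent"
  kubernetes_version = aws_eks_cluster.this.version
  most_recent        = true
}

# Hypothetical output, only to make the resolved version visible in plan/apply.
output "guardduty_addon_version" {
  value = data.aws_eks_addon_version.guardduty.version
}
```

The resolved value could then be passed to addon_version instead of a hardcoded string, at the cost of the version changing between applies when a new build is published.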

Let's now add the aws_eks_addon resource to our EKS cluster Terraform module and provision it:

resource "aws_eks_addon" "guardduty" {
  cluster_name                = aws_eks_cluster.this.name
  addon_name                  = "aws-guardduty-agent"
  addon_version               = "v1.6.1-eksbuild.1" 
  resolve_conflicts_on_update = "OVERWRITE"
}


terraform plan output:


Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # module.voting-app-eks-cluster.aws_eks_addon.guardduty will be created
  + resource "aws_eks_addon" "guardduty" {
      + addon_name                  = "aws-guardduty-agent"
      + addon_version               = "v1.6.1-eksbuild.1"
      + arn                         = (known after apply)
      + cluster_name                = "example-voting-app"
      + configuration_values        = (known after apply)
      + created_at                  = (known after apply)
      + id                          = (known after apply)
      + modified_at                 = (known after apply)
      + resolve_conflicts_on_update = "OVERWRITE"
      + tags_all                    = (known after apply)
    }

During provisioning, the AWS Console shows the add-on's current status.

Provisioning the add-on failed after the 20-minute timeout:

module.voting-app-eks-cluster.aws_eks_addon.guardduty: Still creating... [19m30s elapsed]
module.voting-app-eks-cluster.aws_eks_addon.guardduty: Still creating... [19m40s elapsed]
module.voting-app-eks-cluster.aws_eks_addon.guardduty: Still creating... [19m50s elapsed]
module.voting-app-eks-cluster.aws_eks_addon.guardduty: Still creating... [20m0s elapsed]
│ Warning: Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
│ 
│   with module.voting-app-eks-cluster.aws_eks_addon.guardduty,
│   on ../../modules/eks-cluster/main.tf line 174, in resource "aws_eks_addon" "guardduty":
│  174: resource "aws_eks_addon" "guardduty" {
│ 
│ Error: waiting for EKS Add-On (example-voting-app:aws-guardduty-agent) create: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
│ 
│   with module.voting-app-eks-cluster.aws_eks_addon.guardduty,
│   on ../../modules/eks-cluster/main.tf line 174, in resource "aws_eks_addon" "guardduty":
│  174: resource "aws_eks_addon" "guardduty" {
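
The 20m0s in the error matches the default create timeout of aws_eks_addon. While investigating, a shorter timeout makes each failed attempt cheaper. The following is a sketch only: the 10m value is an arbitrary assumption, and changing the timeout does not address whatever keeps the add-on in the CREATING state.

```hcl
resource "aws_eks_addon" "guardduty" {
  cluster_name                = aws_eks_cluster.this.name
  addon_name                  = "aws-guardduty-agent"
  addon_version               = "v1.6.1-eksbuild.1"
  resolve_conflicts_on_update = "OVERWRITE"

  # Assumption: fail after 10 minutes instead of the default 20
  # while debugging why the add-on never becomes ACTIVE.
  timeouts {
    create = "10m"
  }
}
```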

After this failure, I commented out resource "aws_eks_addon" "guardduty" {...} and ran terraform apply, which removed the resource.

The add-on may be failing to install because GuardDuty Runtime Monitoring is not enabled for EKS.

Let's try to add this add-on manually, without enabling EKS Runtime Monitoring:

I was not able to cancel the creation of this add-on in the AWS Console, but deleting it via the AWS CLI worked:

$ aws eks delete-addon --cluster-name example-voting-app --addon-name aws-guardduty-agent --profile terraform --region eu-west-2
{
    "addon": {
        "addonName": "aws-guardduty-agent",
        "clusterName": "example-voting-app",
        "status": "DELETING",
        "addonVersion": "v1.6.1-eksbuild.1",
        "health": {
            "issues": []
        },
        "addonArn": "arn:aws:eks:eu-west-2:471112786618:addon/example-voting-app/aws-guardduty-agent/cec7f182-1052-421b-76db-9fa8e353af80",
        "createdAt": "2024-06-04T12:30:49.852000+01:00",
        "modifiedAt": "2024-06-04T12:42:29.170000+01:00",
        "tags": {}
    }
}

When adding this add-on, I didn't enable EKS Runtime Monitoring, although its activation is required for an optimal operating experience:

Ensure to enable EKS Runtime Monitoring within Amazon GuardDuty.


Let's enable EKS Runtime Monitoring via Terraform:

resource "aws_guardduty_detector_feature" "eks_runtime_monitoring" {
  detector_id = aws_guardduty_detector.this.id
  name        = "EKS_RUNTIME_MONITORING"
  status      = "ENABLED"
}



After enabling EKS Runtime Monitoring, I tried to provision resource "aws_eks_addon" "guardduty" {...} again. Terraform started creating the add-on, only to error out after 20 minutes.

Let's try enabling the RUNTIME_MONITORING feature (which includes EKS_RUNTIME_MONITORING):

resource "aws_guardduty_detector_feature" "eks_runtime_monitoring" {
  detector_id = aws_guardduty_detector.this.id
  name        = "RUNTIME_MONITORING"
  status      = "ENABLED"
}

We can now see in the AWS Console that runtime monitoring is enabled for EKS, ECS and EC2.

Provisioning resource "aws_eks_addon" "guardduty" {...} again errored out after 20 minutes.

I will try to find out what prevents successful provisioning of the aws_eks_addon resource.
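
One avenue worth trying is to let GuardDuty install and manage the agent itself instead of declaring aws_eks_addon. The earlier plan output shows that the feature accepts an additional_configuration block named EKS_ADDON_MANAGEMENT. The sketch below enables it on the RUNTIME_MONITORING feature. It is untested against this cluster; the resource label runtime_monitoring is hypothetical and would replace the eks_runtime_monitoring block above rather than sit next to it.

```hcl
resource "aws_guardduty_detector_feature" "runtime_monitoring" {
  detector_id = aws_guardduty_detector.this.id
  name        = "RUNTIME_MONITORING"
  status      = "ENABLED"

  # Assumption: with add-on management enabled, GuardDuty deploys the
  # aws-guardduty-agent add-on to EKS clusters, so no aws_eks_addon
  # resource is declared for it.
  additional_configuration {
    name   = "EKS_ADDON_MANAGEMENT"
    status = "ENABLED"
  }
}
```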


After Disabling GuardDuty Detector


The aws_guardduty_detector_feature resource can be provisioned only while aws_guardduty_detector is enabled. If we try to provision it while the detector is disabled, we get an error:

Error: updating GuardDuty Detector Feature (RUNTIME_MONITORING): BadRequestException: The request failed because you cannot enable a data source while the detector is disabled.
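
One way to avoid this error is to tie the feature to the same switch that enables the detector. The following is a minimal sketch under stated assumptions: the guardduty_enabled variable and the runtime_monitoring label are hypothetical and not part of the original module.

```hcl
# Hypothetical toggle controlling both the detector and its features.
variable "guardduty_enabled" {
  description = "Whether the GuardDuty detector (and its features) are enabled"
  type        = bool
  default     = true
}

resource "aws_guardduty_detector" "this" {
  enable = var.guardduty_enabled
}

# The feature is declared only while the detector is enabled, so Terraform
# never tries to enable a data source on a disabled detector.
resource "aws_guardduty_detector_feature" "runtime_monitoring" {
  count = var.guardduty_enabled ? 1 : 0

  detector_id = aws_guardduty_detector.this.id
  name        = "RUNTIME_MONITORING"
  status      = "ENABLED"
}
```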