Thursday, 26 February 2026

Introduction to KubePug (Kubernetes tool)



What is KubePug?


KubePug (Kubernetes PreUpGrade Checker) is an open-source kubectl plugin and CLI tool designed to identify deprecated or deleted APIs in your Kubernetes cluster or manifest files before you perform an upgrade. 

Key Features


  • Deprecation Detection: Scans your live cluster or static YAML manifests to find resources using APIs that are slated for removal in future Kubernetes versions.
  • Replacement Guidance: Not only flags outdated APIs but also suggests the recommended replacement API and specifies the exact version where the deprecation or deletion occurs.
  • Version Targeting: Allows you to specify a target Kubernetes version (e.g., v1.31) to validate your current resources against that specific future release.
  • Flexible Data Source: It automatically downloads an API definition file that is updated frequently upstream (every 30 minutes) to stay current with the latest Kubernetes releases. 

Why Use It?


As Kubernetes evolves, APIs are moved from alpha/beta to stable (GA), and older versions are eventually removed. If you upgrade your cluster without updating your manifests, those resources will fail to deploy or operate. KubePug provides a "pre-flight" check to prevent these breaking changes from reaching production.


Installation & Usage


You can install KubePug via Krew (where it is listed under the name deprecations) or as a standalone binary. 

Method                  Command
------------------      -----------------------------------------
Install via Krew        kubectl krew install deprecations
Scan Live Cluster       kubectl deprecations --k8s-version=v1.30
Scan Manifest File      kubepug --input-file=./my-manifest.yaml

Installation via Krew


% kubectl krew install deprecations
Updated the local copy of plugin index.
Installing plugin: deprecations
Installed plugin: deprecations
\
 | Use this plugin:
 | kubectl deprecations
 | Documentation:
 | https://github.com/rikatz/kubepug
 | Caveats:
 | \
 |  | * By default, deprecations finds deprecated object relative to the current kubernetes
 |  | master branch. To target a different kubernetes release, use the --k8s-version
 |  | argument.
 |  | 
 |  | * Deprecations needs permission to GET all objects in the Cluster
 | /
/
WARNING: You installed plugin "deprecations" from the krew-index plugin repository.
   These plugins are not audited for security by the Krew maintainers.
   Run them at your own risk.


Execution



Once installed, the plugin is invoked using kubectl deprecations.

Scan Current Cluster


Check your live cluster for deprecated APIs against a specific target Kubernetes version:

kubectl deprecations --k8s-version=v1.33
                    
Error: failed to get apiservices: apiservices.apiregistration.k8s.io is forbidden: User "sso:user" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
time="2026-02-26T14:20:19Z" level=error msg="An error has occurred: failed to get apiservices: apiservices.apiregistration.k8s.io is forbidden: User \"sso:user\" cannot list resource \"apiservices\" in API group \"apiregistration.k8s.io\" at the cluster scope"


KubePug requires the running user to have "list" permission on the "apiservices" resource in the "apiregistration.k8s.io" API group at the cluster scope; otherwise the error above appears.

If the required permissions are in place:

% kubectl deprecations --k8s-version=v1.33

No deprecated or deleted APIs found

Kubepug validates the APIs using Kubernetes markers. To know what are the deprecated and deleted APIS it checks, please go to https://kubepug.xyz/status/


Scan Local Manifest Files


Validate static YAML files before applying them to a cluster:

kubectl deprecations --input-file=./my-manifests/


View Results in Different Formats


Output the findings in json or yaml for automated processing:

kubectl deprecations --format=json
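
The JSON report can then be parsed in a follow-up step. A minimal sketch, assuming the report's top-level keys are DeprecatedAPIs and DeletedAPIs (verify against a real run first); the here-doc stands in for the actual kubectl deprecations call:

```shell
# Mocked report standing in for:
#   kubectl deprecations --k8s-version=v1.33 --format=json > report.json
# The key names below are assumptions - inspect real output before relying on them.
cat > report.json <<'EOF'
{"DeprecatedAPIs": [], "DeletedAPIs": [{"Group": "policy", "Kind": "PodSecurityPolicy"}]}
EOF

# Count deleted APIs; a non-zero count should block the upgrade.
deleted=$(python3 -c "import json; print(len(json.load(open('report.json'))['DeletedAPIs']))")
echo "deleted APIs: $deleted"
```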


Check for Help and Flags


See all available configuration options, such as using a custom database file or setting error codes:

kubectl deprecations --help


Key Parameters


  • --k8s-version: The Kubernetes release you intend to upgrade to (defaults to the latest stable).
  • --error-on-deprecated: Forces the command to exit with an error code if deprecated APIs are found, which is useful for CI/CD pipelines. 
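
In a CI/CD pipeline, --error-on-deprecated turns findings into a failing step. A minimal sketch of the surrounding control flow; the scan function is a runnable stand-in for the real kubectl deprecations call, which needs cluster access:

```shell
# Stand-in for:
#   kubectl deprecations --k8s-version=v1.33 --error-on-deprecated
# which exits non-zero when deprecated APIs are found.
scan() { return 1; }   # simulate "deprecated APIs found"

if scan; then
  echo "No deprecated APIs - safe to proceed with the upgrade"
else
  echo "Deprecated APIs detected - failing the pipeline step"
fi
```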


---

Introduction to Krew (Kubernetes tool)




Krew is the official plugin manager for the kubectl command-line tool. Much like apt for Debian or Homebrew for macOS, it allows users to easily discover, install, and manage custom extensions that add new subcommands to Kubernetes. 

Core Functionality

  • Discovery: Users can search a community-curated index of over 200 plugins designed for tasks like security auditing, resource visualization, and cluster management.
  • Lifecycle Management: It automates the process of installing, updating, and removing plugins across different operating systems (Linux, macOS, and Windows).
  • Unified Interface: Once a plugin is installed via Krew, it is invoked directly through kubectl (e.g., kubectl <plugin-name>). 


Installation


macOS example output:

% (
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)
+-zsh:142> mktemp -d
+-zsh:142> cd /var/folders/8j/60m6_18j359_39ls0sr9ccvm0000gp/T/tmp.LKDSyQcO10
+-zsh:143> OS=+-zsh:143> uname
+-zsh:143> OS=+-zsh:143> tr '[:upper:]' '[:lower:]'
+-zsh:143> OS=darwin 
+-zsh:144> ARCH=+-zsh:144> uname -m
+-zsh:144> ARCH=+-zsh:144> sed -e s/x86_64/amd64/ -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/'
+-zsh:144> ARCH=arm64 
+-zsh:145> KREW=krew-darwin_arm64 
+-zsh:146> curl -fsSLO https://github.com/kubernetes-sigs/krew/releases/latest/download/krew-darwin_arm64.tar.gz
+-zsh:147> tar zxvf krew-darwin_arm64.tar.gz
x ./LICENSE
x ./krew-darwin_arm64
+-zsh:148> ./krew-darwin_arm64 install krew
Adding "default" plugin index from https://github.com/kubernetes-sigs/krew-index.git.
Updated the local copy of plugin index.
Installing plugin: krew
Installed plugin: krew
\
 | Use this plugin:
 | kubectl krew
 | Documentation:
 | https://krew.sigs.k8s.io/
 | Caveats:
 | \
 |  | krew is now installed! To start using kubectl plugins, you need to add
 |  | krew's installation directory to your PATH:
 |  | 
 |  |   * macOS/Linux:
 |  |     - Add the following to your ~/.bashrc or ~/.zshrc:
 |  |         export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
 |  |     - Restart your shell.
 |  | 
 |  |   * Windows: Add %USERPROFILE%\.krew\bin to your PATH environment variable
 |  | 
 |  | To list krew commands and to get help, run:
 |  |   $ kubectl krew
 |  | For a full list of available plugins, run:
 |  |   $ kubectl krew search
 |  | 
 |  | You can find documentation at
 |  |   https://krew.sigs.k8s.io/docs/user-guide/quickstart/.
 | /
/


Add the Krew bin directory to PATH:

% vi ~/.zshrc 

% cat ~/.zshrc   
...
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH" <-- added manually


Reload the .zshrc configuration file in the currently running Zsh session (or just restart the terminal):

% source ~/.zshrc

Installation verification:

% kubectl krew
krew is the kubectl plugin manager.
You can invoke krew through kubectl: "kubectl krew [command]..."

Usage:
  kubectl krew [command]

Available Commands:
  help        Help about any command
  index       Manage custom plugin indexes
  info        Show information about an available plugin
  install     Install kubectl plugins
  list        List installed kubectl plugins
  search      Discover kubectl plugins
  uninstall   Uninstall plugins
  update      Update the local copy of the plugin index
  upgrade     Upgrade installed plugins to newer versions
  version     Show krew version and diagnostics

Flags:
  -h, --help      help for krew
  -v, --v Level   number for the log level verbosity

Use "kubectl krew [command] --help" for more information about a command.

Common Commands


To use Krew, you must first install it as a kubectl plugin itself. Key commands include: 

  • kubectl krew update: Updates the local list of available plugins.
  • kubectl krew search: Finds plugins in the official Krew index.
  • kubectl krew install <plugin>: Installs a specific plugin.
  • kubectl krew list: Displays all plugins currently installed through Krew.
  • kubectl krew upgrade: Updates all installed plugins to their latest versions. 

Example:

% kubectl krew install deprecations
Updated the local copy of plugin index.
Installing plugin: deprecations
Installed plugin: deprecations
\
 | Use this plugin:
 | kubectl deprecations
 | Documentation:
 | https://github.com/rikatz/kubepug
 | Caveats:
 | \
 |  | * By default, deprecations finds deprecated object relative to the current kubernetes
 |  | master branch. To target a different kubernetes release, use the --k8s-version
 |  | argument.
 |  | 
 |  | * Deprecations needs permission to GET all objects in the Cluster
 | /
/
WARNING: You installed plugin "deprecations" from the krew-index plugin repository.
   These plugins are not audited for security by the Krew maintainers.
   Run them at your own risk.

Popular Plugins Managed by Krew 

  • ctx / ns: Rapidly switch between Kubernetes contexts and namespaces.
  • tree: Visualizes the hierarchy of Kubernetes resources in a tree view.
  • access-matrix: Displays an RBAC (Role-Based Access Control) matrix for server resources.
  • get-all: Lists all resources in a namespace, including those often missed by kubectl get all. 

Note on Security (!)


Plugins in the Krew index are community-contributed and are not audited for security by the Kubernetes maintainers; you should only install plugins from sources you trust. 


---

Introduction to Pluto (Kubernetes tool)

 

Pluto is:
  • A CLI tool that helps find deprecated Kubernetes API versions in code repositories and Helm releases. 
  • Especially useful when upgrading Kubernetes clusters, as it identifies resources that need updating before the upgrade.
  • It works against:
    • Live clusters
    • Helm charts
    • Raw YAML
  • Pluto shows which APIs are deprecated or removed, which version they were deprecated in, and what the replacement API should be.


Installation on Mac (https://pluto.docs.fairwinds.com/installation/#homebrew-tap):

% brew install FairwindsOps/tap/pluto

Let's see its CLI arguments:

% pluto                    
You must specify a sub-command.
A tool to detect Kubernetes apiVersions

Usage:
  pluto [flags]
  pluto [command]

Available Commands:
  completion            Generate the autocompletion script for the specified shell
  detect                Checks a single file or stdin for deprecated apiVersions.
  detect-all-in-cluster run all in-cluster detections
  detect-api-resources  detect-api-resources
  detect-files          detect-files
  detect-helm           detect-helm
  help                  Help about any command
  list-versions         Outputs a JSON object of the versions that Pluto knows about.
  version               Prints the current version of the tool.

Flags:
  -f, --additional-versions string        Additional deprecated versions file to add to the list. Cannot contain any existing versions
      --columns strings                   A list of columns to print. Mandatory when using --output custom, optional with --output markdown
      --components strings                A list of components to run checks for. If nil, will check for all found in versions.
  -h, --help                              help for pluto
      --ignore-deprecations               Ignore the default behavior to exit 2 if deprecated apiVersions are found. (Only show removed APIs, not just deprecated ones)
      --ignore-removals                   Ignore the default behavior to exit 3 if removed apiVersions are found. (Only show deprecated APIs, not removed ones)
      --ignore-unavailable-replacements   Ignore the default behavior to exit 4 if deprecated but unavailable apiVersions are found.
  -H, --no-headers                        When using the default or custom-column output format, don't print headers (default print headers).
  -r, --only-show-removed                 Only display the apiVersions that have been removed in the target version.
  -o, --output string                     The output format to use. (normal|wide|custom|json|yaml|markdown|csv) (default "normal")
  -t, --target-versions stringToString    A map of targetVersions to use. This flag supersedes all defaults in version files. (default [])
  -v, --v Level                           number for the log level verbosity

Use "pluto [command] --help" for more information about a command.



detect-files


If we want to scan local Helm charts or manifest files before they are deployed, we can use pluto detect-files or pluto detect, which require us either to be in the correct directory or to provide a file path.

To scan and detect deprecated APIs in manifest files in a directory:

% pluto detect-files -d /path/to/your/manifests

detect-files is for checking YAML files in our repositories/filesystem before deploying them - that's separate from the detect-helm and detect-all-in-cluster commands.

To target a particular k8s version:

% pluto detect-files -d . --target-versions k8s=v1.33.0

If we use Terraform to deploy Helm charts, we might want to keep chart values in separate files (.yaml or .yaml.tpl), as otherwise we won't be able to use Pluto directly (we'd need to extract the values into files first). For more details, see the section "Where to keep Helm chart values in Terraform projects" below.
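
The exit codes documented in the help output above (2 when deprecated apiVersions are found, 3 when removed ones are found) make detect-files easy to wire into CI. A minimal sketch; run_pluto is a runnable stand-in for the real call, e.g. pluto detect-files -d . --target-versions k8s=v1.33.0:

```shell
# Stand-in simulating pluto exiting with status 3 ("removed apiVersions found").
run_pluto() { return 3; }

status=0
run_pluto || status=$?

# Map Pluto's documented exit codes to human-readable outcomes.
case "$status" in
  0) echo "clean" ;;
  2) echo "deprecated apiVersions found" ;;
  3) echo "removed apiVersions found" ;;
  *) echo "pluto failed with status $status" ;;
esac
```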

detect-helm


To check Helm releases in the cluster (already deployed):

% pluto detect-helm -owide

To target a particular k8s version:

% pluto detect-helm -owide --target-versions k8s=v1.33.0   
There were no resources found with known deprecated apiVersions.

detect-helm specifically checks Helm release metadata stored in our cluster (in secrets or configmaps) after Helm charts have been deployed. It looks at the manifests that Helm used to install releases, which might contain deprecated APIs even if they haven't been applied yet or are stored in Helm's history.

This command can be run from any directory. This is because the detect-helm command scans live Helm releases currently deployed in our Kubernetes cluster, rather than looking for local files on our machine. Instead of relying on our current working directory, the command depends on our Kubernetes context (the cluster our CLI is currently pointed at) and our local Helm configuration to communicate with the cluster.

While the directory doesn't matter, the following must be true for the command to work:
  • Active Kubernetes Context: Your kubectl context must be set to the target cluster.
  • Cluster Permissions: You must have sufficient RBAC permissions to read Secrets in the namespaces you wish to scan, as Helm 3 stores release information in cluster secrets.
  • Target Versioning: The --target-versions k8s=v1.33.0 flag tells Pluto to check for APIs that are deprecated or removed specifically in Kubernetes version 1.33.0, regardless of what version the cluster is actually running.

detect-all-in-cluster


To check all resources in the cluster:

% pluto detect-all-in-cluster -o wide
I0226 12:01:02.279788   47100 warnings.go:110] "Warning: v1 ComponentStatus is deprecated in v1.19+"
There were no resources found with known deprecated apiVersions.

detect-all-in-cluster scans all live resources currently running in our cluster by querying the Kubernetes API directly. It checks deployments, services, pods, etc. that are actively deployed.

detect-all-in-cluster does NOT include detect-helm or detect-files. Here's why they're separate:
  • detect-all-in-cluster sees the current state of resources
  • detect-helm sees Helm's stored templates and history, which may include:
    • Templated manifests that haven't been rendered yet
    • Old release revisions
    • Chart templates with deprecated APIs
  • Run both to get complete coverage!

Target a specific Kubernetes version:

% pluto detect-all-in-cluster --target-versions k8s=v1.33.0
I0226 12:02:26.551401   47113 warnings.go:110] "Warning: v1 ComponentStatus is deprecated in v1.19+"
There were no resources found with known deprecated apiVersions.

The warning message:

Warning: v1 ComponentStatus is deprecated in v1.19+

This is just a Kubernetes API server warning triggered while Pluto scans resources - it's not something wrong with our cluster resources.

The main result:

There were no resources found with known deprecated apiVersions.

This means all our cluster resources are using API versions that are still valid in Kubernetes v1.33.0 (our target version). In other words:
  • Our cluster resources are already compatible with k8s v1.33.0
  • No manifests need updating before upgrading
  • No deprecated APIs that would be removed in v1.33.0

To be thorough before a k8s upgrade, we need to run all three commands:
  • detect-files
  • detect-helm
  • detect-all-in-cluster
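
A minimal wrapper for that pre-upgrade sweep (a sketch; assumes pluto is installed and kubectl's current context points at the cluster to check):

```shell
TARGET="k8s=v1.33.0"
rc=0
for cmd in "detect-files -d ." "detect-helm" "detect-all-in-cluster"; do
  echo "== pluto $cmd --target-versions $TARGET =="
  # keep the worst exit status seen (2 = deprecated found, 3 = removed found)
  pluto $cmd --target-versions "$TARGET" || rc=$?
done
echo "final status: $rc"
```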

---

Where to keep Helm chart values in Terraform projects


If we use Terraform to deploy Helm charts, we might be using one of these strategies to keep chart values:

  1. Values are in inline YAML string
  2. Values in separate .yaml file
  3. Values in separate YAML Template files (.yaml.tpl)
  4. Use Helm's set for Dynamic Values
  5. Multiple Values Files

(1) Values in inline YAML string


This is not ideal; problems with inline YAML in Terraform include:
  • No syntax highlighting or validation - Easy to break YAML formatting
  • Hard to review in diffs - Changes are messy in PRs
  • Can't use standard tooling - No yamllint, Pluto, or other YAML tools
  • Mixing concerns - Infrastructure code mixed with application config
  • Escaping nightmares - Terraform string interpolation conflicts with Helm templating

Example:

resource "helm_release" "app" {
  values = [<<-EOT
    replicaCount: ${var.replicas}
    image:
      repository: myapp
      tag: ${var.tag}
    service:
      type: LoadBalancer
  EOT
  ]
}


(2) Separate Values Files 


Keep values in YAML files, reference them in Terraform.
This is a better approach because:
  • Clean separation
  • Easy to validate with standard tools
  • Better diffs
  • Can use Pluto directly: pluto detect-files -d .

Example:

main.tf:

resource "helm_release" "my_app" {
  name       = "my-app"
  chart      = "my-chart"
  repository = "https://charts.example.com"
  
  values = [
    file("${path.module}/helm-values.yaml")
  ]
}


(3) Templated Values Files


Use Terraform's templatefile() to inject dynamic values:


helm-values.yaml.tpl:

replicaCount: ${replica_count}
image:
  repository: ${image_repo}
  tag: ${image_tag}
ingress:
  enabled: ${enable_ingress}
  host: ${hostname}

main.tf:

resource "helm_release" "my_app" {
  name  = "my-app"
  chart = "my-chart"
  
  values = [
    templatefile("${path.module}/helm-values.yaml.tpl", {
      replica_count  = var.replica_count
      image_repo     = var.image_repository
      image_tag      = var.image_tag
      enable_ingress = var.enable_ingress
      hostname       = var.hostname
    })
  ]
}

Pros:

  • Still gets variable injection
  • Can be validated as YAML (with placeholders)
  • Clean and readable


(4) Use Helm's set for Dynamic Values


Keep static config in files, override specific values:


resource "helm_release" "my_app" {
  name       = "my-app"
  chart      = "my-chart"
  
  # Base values from file
  values = [
    file("${path.module}/helm-values.yaml")
  ]
  
  # Override specific values dynamically
  set {
    name  = "image.tag"
    value = var.image_tag
  }
  
  set {
    name  = "replicaCount"
    value = var.replica_count
  }
  
  set_sensitive {
    name  = "secret.password"
    value = var.db_password
  }
}

Pros:
  • Clear what's dynamic vs static
  • Base values file can be validated
  • Sensitive values handled properly

Here is an example of how to migrate the inline YAML from above into a values file with dynamic overrides:

helm-values.yaml:

image:
  repository: myapp
service:
  type: LoadBalancer


main.tf:

resource "helm_release" "app" {
  values = [
    file("${path.module}/helm-values.yaml")
  ]
  
  set {
    name  = "replicaCount"
    value = var.replicas
  }
  
  set {
    name  = "image.tag"
    value = var.tag
  }
}

Now we can validate the static values file. Note that -f is the global --additional-versions flag (see the help output above), so a single file is checked with the detect subcommand: 

% pluto detect helm-values.yaml



(5) Multiple Values Files


We can layer our configuration:

resource "helm_release" "my_app" {
  name  = "my-app"
  chart = "my-chart"
  
  values = [
    file("${path.module}/helm-values-base.yaml"),
    file("${path.module}/helm-values-${var.environment}.yaml")
  ]
}
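
Since the environment-specific file name is built dynamically, a missing file only fails at apply time. A tiny preflight check catches it earlier (a sketch; ENVIRONMENT and the file names are illustrative):

```shell
# Fail fast if the per-environment values file does not exist.
ENVIRONMENT="prod"
f="helm-values-${ENVIRONMENT}.yaml"
if [ -f "$f" ]; then
  echo "using $f"
else
  echo "missing $f - create it before running terraform apply"
fi
```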

---

Introduction to Kubent (Kube No Trouble)

 

Kubent (Kube No Trouble) is a tool which scans a k8s cluster and reports resources that use deprecated or removed Kubernetes APIs, based on a target Kubernetes version. It's especially useful before upgrading (e.g., EKS 1.32 → 1.33); see the repository warning below.

WARNING: Development at the project's original repo (https://github.com/doitintl/kube-no-trouble) is no longer active; the last commit was in January 2025. The original author announced that development would move to https://github.com/dark0dave/kube-no-trouble, and that repo is seemingly active as of today (last change was ), BUT https://github.com/dark0dave/kube-no-trouble/tree/301e5783904de5966f79b217a956651146630f50/pkg/rules/rego shows that only rulesets up to v1.32 have been added (!).

Kube No Trouble relies on static Rego rule files in the repo. If new Kubernetes versions (e.g., >1.32) don’t have updated rules, then:
  • It won’t know about newly deprecated APIs
  • It won’t know about newly removed APIs
  • --target-version becomes unreliable for newer releases

For modern upgrades (especially 1.32 → 1.33+), kubent is no longer the safest tool.

To install it:

% sh -c "$(curl -sSL https://git.io/install-kubent)"
>>> kubent installation script <<<
> Detecting latest version
> Downloading version 0.7.3
Target directory (/usr/local/bin) is not writable, trying to use sudo
Password:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 12.4M  100 12.4M    0     0  14.7M      0 --:--:-- --:--:-- --:--:-- 13.2M
> Done. kubent was installed to /usr/local/bin/.


To verify installation:

% kubent --version
7:48AM INF version 0.7.3 (git sha 57480c07b3f91238f12a35d0ec88d9368aae99aa)


To check CLI arguments:

% kubent --help   
Usage of kubent:
  -A, --additional-annotation strings   additional annotations that should be checked to determine the last applied config
  -a, --additional-kind strings         additional kinds of resources to report in Kind.version.group.com format
  -c, --cluster                         enable Cluster collector (default true)
  -x, --context string                  kubeconfig context
  -e, --exit-error                      exit with non-zero code when issues are found
  -f, --filename strings                manifests to check, use - for stdin
      --helm3                           enable Helm v3 collector (default true)
  -k, --kubeconfig string               path to the kubeconfig file
  -l, --log-level string                set log level (trace, debug, info, warn, error, fatal, panic, disabled) (default "info")
  -o, --output string                   output format - [text|json|csv] (default "text")
  -O, --output-file string              output file, use - for stdout (default "-")
  -t, --target-version string           target K8s version in SemVer format (autodetected by default)
  -v, --version                         prints the version of kubent and exits
pflag: help requested

It looks at the default ~/.kube/config file to find the current context; otherwise, use -k to specify a kubeconfig at a non-default location.



% kubent                       
7:59AM INF >>> Kube No Trouble `kubent` <<<
7:59AM INF version 0.7.3 (git sha 57480c07b3f91238f12a35d0ec88d9368aae99aa)
7:59AM INF Initializing collectors and retrieving data
7:59AM INF Target K8s version is 1.32.11-eks-ac2d5a0
7:59AM INF Retrieved 12 resources from collector name=Cluster
8:00AM INF Retrieved 361 resources from collector name="Helm v3"
8:00AM INF Loaded ruleset name=custom.rego.tmpl
8:00AM INF Loaded ruleset name=deprecated-1-16.rego
8:00AM INF Loaded ruleset name=deprecated-1-22.rego
8:00AM INF Loaded ruleset name=deprecated-1-25.rego
8:00AM INF Loaded ruleset name=deprecated-1-26.rego
8:00AM INF Loaded ruleset name=deprecated-1-27.rego
8:00AM INF Loaded ruleset name=deprecated-1-29.rego
8:00AM INF Loaded ruleset name=deprecated-1-32.rego
8:00AM INF Loaded ruleset name=deprecated-future.rego


Running kubent with no other arguments:
  • Connects to your current kube-context
  • Detects your cluster version automatically
  • Scans all namespaces
  • Compares resources against deprecations for that version

Before the upgrade to v1.33, we want kubent to scan the resources against that next k8s version, so we specify it with --target-version:

% kubent --target-version=1.33
8:02AM INF >>> Kube No Trouble `kubent` <<<
8:02AM INF version 0.7.3 (git sha 57480c07b3f91238f12a35d0ec88d9368aae99aa)
8:02AM INF Initializing collectors and retrieving data
8:02AM INF Target K8s version is 1.33.0
8:02AM INF Retrieved 12 resources from collector name=Cluster
8:03AM INF Retrieved 361 resources from collector name="Helm v3"
8:03AM INF Loaded ruleset name=custom.rego.tmpl
8:03AM INF Loaded ruleset name=deprecated-1-16.rego
8:03AM INF Loaded ruleset name=deprecated-1-22.rego
8:03AM INF Loaded ruleset name=deprecated-1-25.rego
8:03AM INF Loaded ruleset name=deprecated-1-26.rego
8:03AM INF Loaded ruleset name=deprecated-1-27.rego
8:03AM INF Loaded ruleset name=deprecated-1-29.rego
8:03AM INF Loaded ruleset name=deprecated-1-32.rego
8:03AM INF Loaded ruleset name=deprecated-future.rego
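
Combining the flags from the help output gives a CI-friendly invocation. A minimal sketch; kubent_scan is a runnable stand-in for the real call, kubent --target-version=1.33 --exit-error --output=json --output-file=report.json (with --exit-error, kubent exits non-zero when issues are found; the specific code used below is an assumption - treat any non-zero status as a blocker):

```shell
kubent_scan() { return 200; }   # simulated "issues found" exit status (assumption)

if kubent_scan; then
  echo "kubent: no deprecated APIs for the target version"
else
  echo "kubent: issues found - review report.json before upgrading"
fi
```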

---

Monday, 23 February 2026

Introduction to Grafana Loki




Grafana Loki:

These are the notes from the Loki Helm chart: 

***********************************************************************
  Welcome to Grafana Loki
  Chart version: 6.31.0
  Chart Name: loki
  Loki version: 3.5.0
***********************************************************************

Tip:

Watch the deployment status using the command: kubectl get pods -w --namespace grafana-loki

If pods are taking too long to schedule make sure pod affinity can be fulfilled in the current cluster.

***********************************************************************
Installed components:
***********************************************************************
* gateway
* read
* write
* backend


***********************************************************************
Sending logs to Loki
***********************************************************************

Loki has been configured with a gateway (nginx) to support reads and writes from a single component.

You can send logs from inside the cluster using the cluster DNS:

http://loki-gateway.grafana-loki.svc.cluster.local/loki/api/v1/push

You can test to send data from outside the cluster by port-forwarding the gateway to your local machine:

  kubectl port-forward --namespace grafana-loki svc/loki-gateway 3100:80 &

And then using http://127.0.0.1:3100/loki/api/v1/push URL as shown below:

curl \
-H "Content-Type: application/json" \
-XPOST \
-s "http://127.0.0.1:3100/loki/api/v1/push"  \
--data-raw "{\"streams\": [{\"stream\": {\"job\": \"test\"}, \"values\": [[\"$(date +%s)000000000\", \"fizzbuzz\"]]}]}" \
-H X-Scope-OrgId:foo


Then verify that Loki did receive the data using the following command:

curl "http://127.0.0.1:3100/loki/api/v1/query_range" \
--data-urlencode 'query={job="test"}' \
-H X-Scope-OrgId:foo | jq .data.result

***********************************************************************
Connecting Grafana to Loki
***********************************************************************

If Grafana operates within the cluster, you'll set up a new Loki datasource by utilizing the following URL:

http://loki-gateway.grafana-loki.svc.cluster.local/

***********************************************************************
Multi-tenancy
***********************************************************************

Loki is configured with auth enabled (multi-tenancy) and expects tenant headers (`X-Scope-OrgID`) to be set for all API calls.

You must configure Grafana's Loki datasource using the `HTTP Headers` section with the `X-Scope-OrgID` to target a specific tenant.
For each tenant, you can create a different datasource.

The agent of your choice must also be configured to propagate this header.
For example, when using Promtail you can use the `tenant` stage. https://grafana.com/docs/loki/latest/send-data/promtail/stages/tenant/

When not provided with the `X-Scope-OrgID` while auth is enabled, Loki will reject reads and writes with a 404 status code `no org id`.

You can also use a reverse proxy, to automatically add the `X-Scope-OrgID` header as suggested by https://grafana.com/docs/loki/latest/operations/authentication/

For more information, read our documentation about multi-tenancy: https://grafana.com/docs/loki/latest/operations/multi-tenancy/

> When using curl you can pass `X-Scope-OrgId` header using `-H X-Scope-OrgId:foo` option, where foo can be replaced with the tenant of your choice.
 
---

Friday, 20 February 2026

Grafana Observability Stack

 




Grafana uses these components together as an observability stack, but each has a clear role:


Loki – log database. It stores and indexes logs (especially from Kubernetes) in a cost‑efficient, label‑based way, similar to Prometheus but for logs.

Tempo – distributed tracing backend. It stores distributed traces (spans) from OpenTelemetry, Jaeger, Zipkin, etc., so you can see call flows across microservices and where latency comes from.

Mimir – Prometheus‑compatible metrics backend. It is a horizontally scalable, long‑term storage and query engine for Prometheus‑style metrics (time series).

Alloy – telemetry pipeline (collector). It is Grafana’s distribution of the OpenTelemetry Collector / Prometheus agent / Promtail ideas, used to collect, process, and forward metrics, logs, traces, profiles into Loki/Tempo/Mimir (or other backends).


How Grafana UI relates to them


Grafana UI itself is “just” the visualization and alerting layer:

  • It connects to Loki, Tempo, Mimir (and many others) as data sources.
  • For each backend you configure:
    • A Loki data source for logs.
    • A Tempo data source for traces.
    • A Prometheus/Mimir data source for metrics (Mimir exposes a Prometheus‑compatible API).
  • Grafana then lets you:
    • Build dashboards and alerts from Mimir metrics.
    • Explore logs from Loki.
    • Explore traces from Tempo and cross‑link them with logs/metrics (e.g., click from a log line to a trace, or from a metrics graph into logs/traces).

A useful mental model: Loki/Tempo/Mimir are databases, Alloy is the collector/router, and Grafana is the UI on top.


Are they deployed in the same Kubernetes cluster?


Common patterns:

  • Very common: deploy Loki, Tempo, Mimir, Alloy, and Grafana in the same Kubernetes cluster as your apps. This is the typical “in‑cluster LGTM” setup; all telemetry stays inside the cluster and traffic is simple.
  • Also common: run them in a separate observability cluster (or use Grafana Cloud backends), while Alloy/agents run in each workload cluster and ship data over the network. This improves isolation and makes it easier to share one observability stack across many clusters.
  • In smaller setups or dev environments, everything (apps + LGTM + Grafana) often lives in one cluster; in larger/regulated setups, people tend to separate “workload clusters” and an “observability cluster”.

So: they don’t have to be on the same cluster, but it’s perfectly normal (and often simplest) to run Grafana + Loki + Tempo + Mimir + Alloy together in a single Kubernetes cluster and point your apps’ telemetry to Alloy.


Why not use Elasticsearch instead of Loki, Tempo and Mimir?


Elasticsearch can replace part of what Loki, Tempo, and Mimir do, but not all of it, and usually with higher cost/complexity for cloud‑native observability.

1. Scope: logs vs full observability


Elasticsearch is a general search and analytics engine that’s great at full‑text search, aggregations, and analytics over documents (including logs).

The LGTM stack is explicitly split by signal:
  • Loki → logs
  • Tempo → traces
  • Mimir → metrics

Each is optimized only for its signal type and integrates tightly with Grafana and modern telemetry standards.

You could plausibly replace Loki with Elasticsearch for logs, but Elasticsearch does not natively replace Tempo (distributed tracing backend) or Mimir (Prometheus‑compatible metrics backend).

2. Logs: Loki vs Elasticsearch


Elasticsearch strengths:
  • Very powerful full‑text search, fuzzy matching, relevance scoring, complex aggregations.
  • Good when you need deep forensic search and advanced analytics on log text.

Loki strengths:
  • Stores logs as compressed chunks plus a small label index, so storage and compute are much cheaper than Elasticsearch for typical Kubernetes logs.
  • Very tight integration with Grafana and the rest of LGTM, and simple, label‑based querying.

Trade‑off: Elasticsearch gives richer search at a higher infrastructure and operational cost; Loki gives "good enough" search for operational troubleshooting with much lower cost and operational burden.

3. Traces and metrics: Tempo & Mimir vs “just ES”


Tempo:
  • Implements distributed tracing concepts (spans, traces, service graphs) and OpenTelemetry/Jaeger/Zipkin protocols; the data model and APIs are specialized for traces.
  • Elasticsearch can store trace‑like JSON documents, but you’d have to build/maintain all the trace stitching, UI navigation, and integrations yourself.

Mimir:
  • Is a horizontally scalable, Prometheus‑compatible time‑series database, with native remote‑write/read and PromQL semantics.
  • Elasticsearch can store time‑stamped metrics, but you lose Prometheus compatibility, PromQL semantics, and the whole ecosystem that expects a Prometheus‑style API.

So using only Elasticsearch means you’re giving up the standard metrics and tracing ecosystems and rebuilding a lot of tooling on top of a generic search engine.

4. Cost, complexity, and operational burden


Elasticsearch clusters generally need:
  • More RAM/CPU per node, careful shard and index management, and capacity planning.
  • Storage overhead from full‑text indexes (often 1.5–3× raw log size plus replicas).

Loki/Tempo/Mimir:
  • Are designed for object storage, compression, and label‑only indexing, which dramatically lowers storage and compute requirements for logs and metrics.
  • Have simpler, well‑documented reference architectures specifically for observability.

For a modern Kubernetes‑centric environment, that usually makes LGTM cheaper and easier to run than a single big Elasticsearch cluster for everything.

5. When Elasticsearch still makes sense


You might still choose Elasticsearch (often with Kibana/APM) if:
  • You already have a strong ELK stack and team expertise.
  • Your primary need is deep, flexible text search and analytics over logs, with less emphasis on Prometheus/OTel ecosystems.
  • You want Elasticsearch’s ML/anomaly‑detection features and are willing to pay the operational cost.

But if your goal is a Grafana‑centric, standards‑based (Prometheus + OpenTelemetry) observability platform, LGTM (Loki+Tempo+Mimir, plus Alloy as collector) is a better fit than trying to push everything into Elasticsearch.

---

Here document (heredoc)




A here document (heredoc) feeds a multi-line string literal to the standard input of the command that precedes it, preserving line breaks. The Unix shell syntax is:

[command] <<DELIMITER
    First line.
    Second line.
    Third line.
    Fourth line.
DELIMITER


<< is the redirection operator
DELIMITER is an arbitrary string (the delimiter token); it must be the same at the beginning and at the end, and the closing occurrence must appear on a line of its own

Appending a minus sign to the redirection operator (<<-) causes all leading tab characters to be ignored. This allows us to use indentation when writing heredocs in shell scripts. We can then indent both the heredoc body and the closing delimiter with tabs (not spaces!):

#!/bin/bash
# Note: the two lines below must be indented with tab characters,
# not spaces, for <<- to strip the indentation.
cat <<-EOF
	indented
	EOF
echo Done
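
One more behaviour worth knowing: whether the shell expands variables inside a heredoc depends on whether the delimiter is quoted. A small sketch:

```shell
#!/bin/bash
name="world"

# Unquoted delimiter: variables and command substitutions are expanded.
cat <<EOF
Hello, $name
EOF

# Quoted delimiter: the body is passed to the command literally.
cat <<'EOF'
Hello, $name
EOF
```

The first heredoc prints `Hello, world`; the second prints `Hello, $name` verbatim. Quoting the delimiter is the usual way to embed scripts or config files that contain `$` without escaping every occurrence.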

---


Wednesday, 18 February 2026

How to fix pods in Not Ready state?


kubectl get pods might show that some of the pods have 0/N value in READY column.

What is the meaning of READY column value?


In the context of kubectl get pods, the READY column shows the number of containers in the pod that have passed their health checks and are ready to serve traffic.

The anatomy of R/T:
  • R (Left side): This is the number of containers currently Ready. A 0 means the application inside the container is not responding to its "Readiness Probe" or has not finished starting up.
  • / (Separator): Separates ready containers from the total.
  • T (Right side): This is the Total number of user containers defined in that pod.

When we have a pod in 0/1 state with a status of Running, it means the container has started, but Kubernetes does not consider it "healthy" enough to handle requests. Our pods are technically "alive" (Running) but "unusable" (Not Ready).

Common Reasons for 0/1 Running:

  • Failed Readiness Probe: The application is running, but the health check URL (e.g., /ready) is returning an error or timing out.
  • Slow Startup: The application takes a long time to initialize, and the "Initial Delay" isn't long enough.
  • Dependency Issues: The pod is waiting for a database, a config file, or another service that isn't available.
  • CrashLoopBackOff (Transitions): Sometimes pods flicker between 0/1 Running and 0/1 CrashLoopBackOff as they try to start and immediately fail.
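
For reference, the readiness probe is configured per container in the pod spec. A minimal sketch; the container name, port, and /ready path are hypothetical, so use whatever health endpoint the app actually exposes:

```yaml
containers:
  - name: my-app                 # hypothetical container
    image: my-app:1.0
    readinessProbe:
      httpGet:
        path: /ready             # hypothetical health endpoint
        port: 8080
      initialDelaySeconds: 10    # raise this for slow-starting apps
      periodSeconds: 5
      failureThreshold: 3        # probe failures before marking Not Ready
```

If the app genuinely starts slowly, raising initialDelaySeconds (or adding a startupProbe) is often the fix for pods stuck at 0/1.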

How to find out exactly what's wrong


To diagnose why our my-app pods are stuck, run these two commands:

(1) Check the Events (Why it's not ready):

kubectl describe pod my-app-0 -n my-app-namespace

Look at the "Events" section at the bottom for messages like "Readiness probe failed."

Output example:

Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  47s (x81205 over 8d)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503


(2) Check the Logs (What the app is saying):

kubectl logs my-app-0 -n my-app-namespace

Look for "Error," "Exception," or "Connection Refused" messages.

---

Tuesday, 17 February 2026

How to use terraform-docs to automatically generate Terraform code documentation

 

terraform-docs is a tool used to automatically generate Terraform code documentation.

To install it on Mac:

% brew install terraform-docs 

To verify installation:

% terraform-docs --version                                        
terraform-docs version v0.21.0 darwin/arm64

To generate documentation for a module in the current directory and inject it into the README file (which is in the same directory):

% terraform-docs markdown table --output-file README.md --output-mode inject ./
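
The same flags can also live in a .terraform-docs.yml config file at the module root, so that plain terraform-docs . is enough (handy for pre-commit hooks). A minimal sketch:

```yaml
# .terraform-docs.yml
formatter: "markdown table"
output:
  file: README.md
  mode: inject   # writes between the BEGIN_TF_DOCS / END_TF_DOCS markers in README.md
```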


How to install Terraform on Mac



First add Hashicorp's package repository:

% brew tap hashicorp/tap

Then install Terraform:

% brew install hashicorp/tap/terraform

If Terraform was already installed, the command above will update it.

To verify installation, we can check its version:

% terraform --version                                                                                    
Terraform v1.14.5
on darwin_arm64

Friday, 6 February 2026

Amazon EKS Autoscaling with Karpenter



Kubernetes autoscaling is a function that scales resources in and out depending on the current workload. AWS supports two autoscaling implementations:
  • Cluster Autoscaler
  • Karpenter
    • flexible, high-performance Kubernetes cluster autoscaler and node provisioner
    • helps improve application availability and cluster efficiency
    • launches right-sized compute resources (for example, Amazon EC2 instances) in response to changing application load in under a minute
    • can provision just-in-time compute resources that precisely meet the requirements of our workload
    • automatically provisions new compute resources based on the specific requirements of cluster workloads. These include compute, storage, acceleration, and scheduling requirements. 
    • creates Kubernetes nodes directly from EC2 instances
    • improves the efficiency and cost of running workloads on the cluster
    • open-source


Pod Scheduler


  • Kubernetes cluster component responsible for determining which node Pods get assigned to
  • default Pod scheduler for Kubernetes is kube-scheduler
    • logs the reasons Pods can't be scheduled

Unschedulable Pods



A Pod is unschedulable when it's been put into Kubernetes' scheduling queue, but can't be deployed to a node. This can be for a number of reasons, including:
  • The cluster not having enough CPU or RAM available to meet the Pod's requirements.
  • Pod affinity or anti-affinity rules preventing it from being deployed to available nodes.
  • Nodes being cordoned due to updates or restarts.
  • The Pod requiring a persistent volume that's unavailable, or bound to an unavailable node.

How to detect unschedulable Pods?

Pods waiting to be scheduled are held in the "Pending" status, but if the Pod can't be scheduled, it will remain in this state. However, Pods that are being deployed normally are also marked as "Pending." The difference comes down to how long a Pod remains in "Pending." 

How to fix unschedulable Pods?
There is no single solution for unschedulable Pods as they have many different causes. However, there are a few things we can try depending on the cause. 
  • Enable cluster autoscaling
    • If we're using a managed Kubernetes service like Amazon EKS or Google Kubernetes Engine (GKE), we can very easily take advantage of autoscaling to increase and decrease cluster capacity on-demand. With autoscaling enabled, Kubernetes' Cluster Autoscaler will trigger our provider to add nodes when needed. As long as we've configured our cluster node pool and it hasn't reached its max node limit, our provider will automatically provision a new node and add it to the pool, making it available to the cluster and to our Pods.
  • Increase our node capacity
  • Check our Pod requests
  • Check our affinity and anti-affinity rules 

 

In this article we'll show how to enable cluster autoscaling with Karpenter.


How does the regular Kubernetes Autoscaler work in AWS?


When we create a regular Kubernetes cluster in AWS, each node group is managed by an AWS Auto Scaling group [Auto Scaling groups - Amazon EC2 Auto Scaling]. The cluster-native autoscaler adjusts the desired size based on the load in the cluster, so that all unscheduled pods fit.

HorizontalPodAutoscaler (HPA) [Horizontal Pod Autoscaling | Kubernetes] is built into Kubernetes and uses metrics like CPU usage, memory usage, or custom metrics we can write to decide when to spin additional pods up or down across the cluster's nodes. If our app is receiving more traffic, HPA will kick in and provision additional pods.

VerticalPodAutoscaler (VPA) can also be installed in the cluster, where it manages the resource (like CPU and memory) allocation of pods that are already running.
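
A minimal autoscaling/v2 HPA manifest, for illustration (the deployment name and thresholds are hypothetical):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app               # hypothetical deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```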

What about when there's not enough capacity to schedule any more pods on the existing nodes? That's when we need an additional node. We have a pod that needs to be scheduled but nowhere to put it. We could call the AWS API, spin up an additional EC2 instance, and add it to our cluster, or, if we're using managed node groups, call the Managed Node Group API and bump up the desired size. An easier approach, however, is to use a cluster autoscaler. There is a mature open-source solution called Cluster Autoscaler (CAS).

CAS was built to handle hundreds of different combinations of node types, zones, and purchase options available in AWS. CAS works directly with managed node groups, or with self-managed nodes and auto-scaling groups, which are AWS constructs that help us manage nodes.


What are the issues with the regular Kubernetes Autoscaler?


Let's say CAS is installed in the cluster and manages one managed node group (MNG). The MNG is filling up and we have an additional pod that needs to be provisioned, so CAS tells the MNG to bump up the number of nodes. It spins up another node and the pod can now be scheduled. But this is not ideal: we have a single pod on a node, and we don't need such a big node.

This can be solved by creating a different MNG with a smaller instance type and now CAS recognizes that instance and provisions pod on a more appropriately-sized node.

Unfortunately, we might end up with many MNGs based on requirements, which can be a challenge to manage, especially when following best practices for cost efficiency and high availability.


How does Karpenter work?


Karpenter works differently. It doesn't use MNGs or ASGs; it manages each node directly. Let's say we have pods of different sizes, and HPA decides we need more of the smaller pods. Karpenter will intelligently pick the right instance type for that workload. If we need to spin up a larger pod, it will again pick the right instance type.

Karpenter picks exactly the right type of node for our workload. 

If we're using spot instances and spot capacity is not available, Karpenter retries more quickly. Karpenter offers faster, more dynamic, more intelligent compute, applying best practices without the operational overhead of managing nodes ourselves.

How to control how Karpenter operates?

There are many dimensions here. We can set constraints on Karpenter to limit the instance types, and we can set up taints to isolate workloads to specific types of nodes. Different teams can have isolated access to different node pools: one team can run billing workloads, another GPU-based instances.

Workload Consolidation feature: pods are consolidated onto fewer nodes. Let's say we have three nodes, two at 70% and one at 20% utilization. Karpenter detects this, moves the pods from the underutilized node to the other two, and shuts down the now-empty node (its instance is terminated). This leads to lower costs.

Karpenter is making it easier to use spot and graviton instances which can also lead to lower costs. 

There is also a feature to keep our nodes up to date: the ttlSecondsUntilExpired parameter (replaced by expireAfter in newer Karpenter APIs) tells Karpenter to terminate nodes after a set amount of time. These nodes are automatically replaced with new nodes running the latest AMIs.

Karpenter:
1) lower costs
2) higher application availability 
3) lower operation overhead


Karpenter needs permissions to create EC2 instances in AWS. 

If we use a self-hosted (on bare metal boxes or EC2 instances), self-managed (we have full control over all aspects of Kubernetes) Kubernetes cluster, for example by using kOps (see also Is k8s Kops preferable than eks? : r/kubernetes), we can add additional IAM policies to the existing IAM role attached to Kubernetes nodes. 

If using EKS, the best way to grant access to internal service is with IAM roles for service accounts (IRSA).
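
With IRSA, the Karpenter controller's service account is annotated with the IAM role it should assume. A sketch (the account ID and role name are hypothetical; Helm-based installs usually set this annotation via chart values):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: karpenter
  namespace: kube-system
  annotations:
    # hypothetical ARN; the role's trust policy must allow the cluster's OIDC provider
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/KarpenterControllerRole
```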


Karpenter's Kubernetes Custom Resources


NodePool


NodePool is the primary Custom Resource (CR) in Karpenter. It defines scheduling constraints and how nodes are provisioned and managed (node management policies). It is the successor to the older Provisioner API, and acts as the "brain" for scheduling decisions: it evaluates the requirements of pending pods and matches them to infrastructure constraints, telling Karpenter which nodes to create and how to handle them over time.

Core Role of NodePool
  • Scheduling Authority: It defines the constraints (instance types, zones, architectures) that determine which nodes can be created.
  • Successor to Provisioner: It replaced the older Provisioner API to provide a more scalable and configuration-based approach.
  • Management Hub: It handles node lifecycle settings, including disruption policies (consolidation and expiration) and aggregate resource limits (CPU/Memory).

Core Functions

A NodePool manages three primary aspects of our cluster's compute capacity: 
  • Scheduling Constraints: Restricts which nodes can be provisioned using requirements for instance types, zones, architectures (e.g., x86 vs. ARM), and capacity types (Spot vs. On-Demand).
  • Disruption Policies: Governs how Karpenter optimizes the cluster by defining when nodes should be expired or consolidated to save costs.
  • Resource Limits: Sets a cap on the total CPU and memory that the NodePool can provision, preventing runaway costs. 

Key Components of a NodePool

The specification is divided into several functional areas:
  • template: Defines the configuration for the nodes that will be created.
  • requirements: Uses well-known Kubernetes labels (e.g., karpenter.sh/capacity-type) to select hardware.
  • nodeClassRef: Points to an EC2NodeClass for cloud-provider-specific settings like subnets and security groups.
  • disruption: Replaces older TTL settings with a unified policy built around consolidationPolicy (e.g., WhenEmptyOrUnderutilized); in the v1 API, node expiry (expireAfter) lives under template.spec.
  • limits: Defines the maximum aggregate resources (e.g., cpu: 1000) allowed for this pool. 

Example v1 Configuration

This example demonstrates a production-ready NodePool that prioritises Spot instances but allows for On-Demand fallback.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      expireAfter: 720h # 30 days
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot", "on-demand"]
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64", "arm64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: "500"
    memory: 1000Gi


Comparison with Other Objects

While the NodePool is the central configuration object, it works in a hierarchy with two other key resources: 
  • NodePool
    • Purpose: The Logic: Defines what nodes should look like and how they should behave.
  • EC2NodeClass
    • Purpose: The Infrastructure: Defines where and with what AWS-specific settings (subnets, AMIs, security groups) nodes launch.
  • NodeClaim
    • Purpose: The Instance: Represents an individual node currently being managed or provisioned by Karpenter.

Every NodePool must reference at least one EC2NodeClass to successfully provision capacity on AWS.

Useful Commands:

To see all node pools:

% kubectl get nodepools                   
NAME                NODECLASS           NODES   READY   AGE
clickhouse          clickhouse          0       True    140d
clickhouse-backup   clickhouse-backup   0       True    140d 

Cluster user needs to have permission to list resource "nodepools" in API group "karpenter.sh" at the cluster scope.

To debug a specific node pool:

kubectl describe nodepool <nodepool-name>

Cluster user needs to have permission to get resource "nodepools" in API group "karpenter.sh" at the cluster scope.

% kubectl describe nodepool clickhouse
Name:         clickhouse
Namespace:    
Labels:       <none>
Annotations:  karpenter.sh/nodepool-hash: 12671849087427876759
              karpenter.sh/nodepool-hash-version: v3
API Version:  karpenter.sh/v1
Kind:         NodePool
Metadata:
  Creation Timestamp:  2025-10-22T15:02:58Z
  Generation:          2
  Resource Version:    1073678
  UID:                 f7869dd3-ac24-4600-98a6-059073645769
Spec:
  Disruption:
    Budgets:
      Nodes:               10%
    Consolidate After:     0s
    Consolidation Policy:  WhenEmptyOrUnderutilized
  Template:
    Metadata:
      Labels:
        Karpenter - Node - Pool:  clickhouse
    Spec:
      Expire After:  720h
      Node Class Ref:
        Group:  karpenter.k8s.aws
        Kind:   EC2NodeClass
        Name:   clickhouse
      Requirements:
        Key:       node.kubernetes.io/instance-type
        Operator:  In
        Values:
          r8g.xlarge
          r8g.2xlarge
          r8g.4xlarge
          r8g.8xlarge
        Key:       karpenter.sh/capacity-type
        Operator:  In
        Values:
          on-demand
          spot
Status:
  Conditions:
    Last Transition Time:  2025-10-22T15:02:59Z
    Message:               
    Observed Generation:   2
    Reason:                ValidationSucceeded
    Status:                True
    Type:                  ValidationSucceeded
    Last Transition Time:  2025-10-22T15:03:07Z
    Message:               
    Observed Generation:   2
    Reason:                NodeClassReady
    Status:                True
    Type:                  NodeClassReady
    Last Transition Time:  2025-10-23T17:24:01Z
    Message:               
    Observed Generation:   2
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Resources:
    Cpu:                  0
    Ephemeral - Storage:  0
    Memory:               0
    Nodes:                0
    Pods:                 0
Events:                   <none>



EC2NodeClass


EC2NodeClass is a Custom Resource (CR) used to define AWS-specific infrastructure configurations for the nodes Karpenter provisions. 

While a NodePool handles high-level scheduling constraints (like instance types or taints), the EC2NodeClass dictates the underlying Amazon EC2 settings. 

Key Responsibilities


The EC2NodeClass abstracts cloud provider-specific details, including: 
  • Networking: Selects subnets using subnetSelectorTerms.
  • Security: Identifies security groups via securityGroupSelectorTerms.
  • Identity: Assigns the IAM role or instance profile for the nodes.
  • Storage: Configures blockDeviceMappings for EBS volumes.
  • Images: Specifies the Amazon Machine Image (AMI) family (e.g., AL2, Bottlerocket) or selects specific AMIs.
  • Customisation: Includes userData for custom bootstrap scripts. 

Relationship with NodePools


A NodePool must reference an EC2NodeClass using the nodeClassRef field. Multiple NodePools can point to the same EC2NodeClass if they share the same infrastructure requirements (e.g., same VPC and IAM role).

Example Configuration


A basic EC2NodeClass manifest typically looks like this: 

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-my-cluster"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster

Useful Commands:


To see all EC2NodeClasses:

kubectl get ec2nodeclasses 

Cluster user needs to have permission to list resource "ec2nodeclasses" in API group "karpenter.k8s.aws" at the cluster scope.

Example:

% kubectl get ec2nodeclasses        
NAME                READY   AGE
clickhouse          True    140d
clickhouse-backup   True    140d

To debug a specific node that isn't coming online:

kubectl describe ec2nodeclasses <ec2nodeclass-name>

Cluster user needs to have permission to get resource "ec2nodeclasses" in API group "karpenter.k8s.aws" at the cluster scope.

Example:

% kubectl describe ec2nodeclass clickhouse
Name:         clickhouse
Namespace:    
Labels:       <none>
Annotations:  karpenter.k8s.aws/ec2nodeclass-hash: 358699366951558737
              karpenter.k8s.aws/ec2nodeclass-hash-version: v4
API Version:  karpenter.k8s.aws/v1
Kind:         EC2NodeClass
Metadata:
  Creation Timestamp:  2025-10-22T15:02:58Z
  Finalizers:
    karpenter.k8s.aws/termination
  Generation:        1
  Resource Version:  73323969
  UID:               25c663e7-cc29-47b2-8a97-937fb5f39825
Spec:
  Ami Family:  AL2023
  Ami Selector Terms:
    Alias:              al2023@latest
  Detailed Monitoring:  true
  Metadata Options:
    Http Endpoint:                enabled
    httpProtocolIPv6:             disabled
    Http Put Response Hop Limit:  1
    Http Tokens:                  required
  Role:                           KarpenterNodeRole-mycorp-prod-clickhouse-k8s
  Security Group Selector Terms:
    Tags:
      karpenter.sh/discovery/mycorp-prod-clickhouse-k8s:  true
  Subnet Selector Terms:
    Tags:
      karpenter.sh/discovery:  true
      private_subnet:          true
  Tags:
    Name:                                              mycorp-prod-clickhouse-k8s-karpenter-clickhouse
    karpenter.sh/discovery/mycorp-prod-clickhouse-k8s:  true
Status:
  Amis:
    Id:    ami-06ab427136b8ffa61
    Name:  amazon-eks-node-al2023-x86_64-nvidia-1.33-v20260304
    Requirements:
      Key:       kubernetes.io/arch
      Operator:  In
      Values:
        amd64
      Key:       karpenter.k8s.aws/instance-gpu-count
      Operator:  Exists
    Id:          ami-08f492a005f7b8703
    Name:        amazon-eks-node-al2023-x86_64-neuron-1.33-v20260304
    Requirements:
      Key:       kubernetes.io/arch
      Operator:  In
      Values:
        amd64
      Key:       karpenter.k8s.aws/instance-accelerator-count
      Operator:  Exists
    Id:          ami-0023c4931d42779e6
    Name:        amazon-eks-node-al2023-x86_64-standard-1.33-v20260304
    Requirements:
      Key:       kubernetes.io/arch
      Operator:  In
      Values:
        amd64
      Key:       karpenter.k8s.aws/instance-gpu-count
      Operator:  DoesNotExist
      Key:       karpenter.k8s.aws/instance-accelerator-count
      Operator:  DoesNotExist
    Id:          ami-061bed77c8a6d03cd
    Name:        amazon-eks-node-al2023-arm64-standard-1.33-v20260304
    Requirements:
      Key:       kubernetes.io/arch
      Operator:  In
      Values:
        arm64
      Key:       karpenter.k8s.aws/instance-gpu-count
      Operator:  DoesNotExist
      Key:       karpenter.k8s.aws/instance-accelerator-count
      Operator:  DoesNotExist
  Conditions:
    Last Transition Time:  2025-10-22T15:02:59Z
    Message:               
    Observed Generation:   1
    Reason:                AMIsReady
    Status:                True
    Type:                  AMIsReady
    Last Transition Time:  2025-10-22T15:02:59Z
    Message:               
    Observed Generation:   1
    Reason:                SubnetsReady
    Status:                True
    Type:                  SubnetsReady
    Last Transition Time:  2025-10-22T15:02:59Z
    Message:               
    Observed Generation:   1
    Reason:                SecurityGroupsReady
    Status:                True
    Type:                  SecurityGroupsReady
    Last Transition Time:  2025-10-22T15:02:59Z
    Message:               
    Observed Generation:   1
    Reason:                InstanceProfileReady
    Status:                True
    Type:                  InstanceProfileReady
    Last Transition Time:  2025-10-22T15:03:07Z
    Message:               
    Observed Generation:   1
    Reason:                ValidationSucceeded
    Status:                True
    Type:                  ValidationSucceeded
    Last Transition Time:  2025-10-22T15:03:07Z
    Message:               
    Observed Generation:   1
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Instance Profile:        mycorp-prod-clickhouse-k8s_15693974848685646064
  Security Groups:
    Id:    sg-09f3cd41bcef827c0
    Name:  mycorp-prod-clickhouse-k8s-node-20251020164545608400000006
  Subnets:
    Id:       subnet-04xxxxxxxxxx5d30b
    Zone:     us-east-1b
    Zone ID:  use1-az2
    Id:       subnet-00xxxxxxxxxx08cef
    Zone:     us-east-1c
    Zone ID:  use1-az3
    Id:       subnet-02xxxxxxxxxxx8711
    Zone:     us-east-1a
    Zone ID:  use1-az1
Events:       <none>


NodeClaim


In Karpenter, a NodeClaim is the Custom Resource (CR) that represents a single, specific instance of compute capacity. 

While a NodePool is the template and a NodeClass is the blueprint, the NodeClaim is the actual request sent to the cloud provider to launch a specific node. 

Key Characteristics


  • 1:1 Relationship: Each NodeClaim typically corresponds to exactly one EC2 instance and its associated Kubernetes Node.
  • Immutable: Once created, a NodeClaim cannot be changed. If the requirements for a node change (e.g., due to "drift"), Karpenter deletes the existing NodeClaim and creates a new one.
  • Lifecycle Management: It tracks the instance from its initial "launch" request through "registration" with the cluster until it is fully "initialized" and ready to run pods. 

Why We Should Monitor NodeClaims


NodeClaims are the best place to look when debugging provisioning failures. We can use them to identify why a node failed to join the cluster: 
  • Status Conditions: A NodeClaim status will show if an instance failed to launch (e.g., "LaunchFailed" due to AWS service limits) or if it's stuck during initialization.
  • Provider Information: It stores the specific providerID (e.g., EC2 instance ID) and the nodeName once the instance successfully joins the cluster.
  • Drift Detection: Karpenter adds a Drifted status to a NodeClaim if the underlying instance no longer matches the current NodePool or EC2NodeClass configuration. 

How it Fits the Hierarchy:

  • NodePool
    • Function: Defines scheduling rules and resource limits.
    • Analogy: The Foreman
  • EC2NodeClass
    • Function: Defines AWS-specific infrastructure (subnets, AMIs).
    • Analogy: The Blueprint
  • NodeClaim
    • Function: Represents the actual instance being provisioned.
    • Analogy: The Building Material

Useful Commands:


To see all active node claims and their status:

kubectl get nodeclaims 

Cluster user needs to have permission to list resource "nodeclaims" in API group "karpenter.sh" at the cluster scope.

To debug a specific node that isn't coming online:

kubectl describe nodeclaim <nodeclaim-name>

Cluster user needs to have permission to get resource "nodeclaims" in API group "karpenter.sh" at the cluster scope.

Example:

% kubectl get nodeclaims         
No resources found

How to Debug Misfunctioning Karpenter


Based on the output we've provided above, our NodePool and EC2NodeClass are both in a Ready state and have correctly discovered subnets, security groups, and AMIs. However, we have 0 nodes and 0 NodeClaims. The most common reason for this behavior—where the infrastructure is ready but Karpenter isn't scaling—is a scheduling mismatch between our pending Pods and the NodePool's requirements. Here is a breakdown of why Karpenter might be sitting idle:

1. Missing Architecture Requirement

Our clickhouse NodePool specifies r8g instance types. These are AWS Graviton (ARM64) instances. In our NodePool spec, we haven't explicitly defined the architecture requirement. While Karpenter is smart, if our Pods do not have a nodeSelector or toleration for kubernetes.io/arch: arm64, or if the Pods are explicitly asking for amd64, Karpenter will not use this NodePool to satisfy them.

Check our Pods:

kubectl get pods -A | grep -i pending
kubectl describe pod <pending-pod-name>

Look for NodeSelectors or Affinity rules in the Pod spec. If the Pod doesn't allow arm64, Karpenter won't spin up an r8g node.
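
If this NodePool is meant to serve only Graviton workloads, the architecture can also be pinned explicitly in the NodePool itself. A sketch of the extra requirement entry (to be added under the existing spec.template.spec.requirements list):

```yaml
# additional entry under spec.template.spec.requirements
- key: "kubernetes.io/arch"
  operator: In
  values: ["arm64"]
```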

2. Missing Taints/Tolerations

ClickHouse workloads often use specialized nodes. If our NodePool (or the nodes it creates) has a Taint that we haven't shown in the snippet, or if our Pods have a Toleration for a taint that doesn't exist on this NodePool, Karpenter will ignore it.

Check if our pending Pods have specific nodeSelector labels that aren't present in the NodePool template.metadata.labels. Our NodePool only has one label: Karpenter-Node-Pool: clickhouse.

3. The "Karpenter Controller" Logs

If the logic seems correct but nothing is happening, the answer is always in the controller logs. Karpenter will explicitly tell us why it is passing over a Pod. Run this to see the scheduling decisions:

kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter | grep -i "scheduling"

Look for messages like:
  • no reachable nodeclasses
  • no possible pod inventory
  • unschedulable, ... did not match requirements

4. Service Linked Role / Permissions

Since our EC2NodeClass is Ready, our basic AWS tags are likely fine. However, double-check that the KarpenterNodeRole-geeiq-prod-clickhouse-k8s actually exists in IAM and has the AmazonEKSWorkerNodePolicy and AmazonEC2ContainerRegistryReadOnly attached. If the role is missing or misconfigured, the EC2 instance might start but fail to join the cluster, causing Karpenter to terminate it immediately.

Summary Checklist

Potential Issue => Fix/Action
Arch Mismatch => Add kubernetes.io/arch with arm64 to NodePool requirements or Pod nodeSelector.

Pending Pods => Ensure there are actually Pods in Pending state. Karpenter only scales in response to unschedulable pods.

Instance Availability => r8g instances are relatively new. Ensure they are available in us-east-2 for the capacity type (Spot/On-Demand) we requested.


How to install Karpenter in the cluster?


Namespace


Current Karpenter best practices recommend deploying it in the kube-system namespace rather than its own dedicated namespace. 

While many early adopters used a separate karpenter namespace, the project shifted toward kube-system starting with version v0.33.0. 

Why kube-system is preferred:
  • API Priority & Fairness: By default, Kubernetes grants higher priority to requests coming from the kube-system namespace. This ensures the Karpenter controller can still communicate with the API server to provision nodes even during periods of heavy cluster congestion.
  • Critical Component Status: Placing Karpenter in kube-system denotes it as a critical cluster component, aligning it with other essential services like kube-proxy or the VPC CNI.
  • Reduced Complexity: Using a standard system namespace avoids the need to manually configure custom FlowSchemas or PriorityLevelConfigurations that would otherwise be required to give a custom namespace the same level of reliability. 

When to use a separate namespace:
  • Legacy Installations: If we installed Karpenter before v0.33.0, it likely lives in a karpenter namespace. Migrating is recommended but requires updating our IAM Roles for Service Accounts (IRSA) trust policy to reflect the new namespace.
  • Fargate Isolation: If we run Karpenter on AWS Fargate, we must create a Fargate profile for the specific namespace where Karpenter is deployed.

Labels


While most standard Karpenter installations use the label app.kubernetes.io/name=karpenter for the controller pods, it is not guaranteed for every pod in every environment. 

Why it might differ:
  • Helm Chart Customisation: If we (or our platform team) overrode the podLabels or nameOverride values during the Karpenter Helm installation, this label will be different.
  • Version Variance: Very old versions of Karpenter sometimes used different labelling conventions (e.g., just app=karpenter), though modern versions follow Kubernetes recommended labels.
  • Webhook Pods: In some configurations, Karpenter may run separate pods for webhooks that might carry slightly different descriptive labels depending on the deployment strategy.


How to configure Karpenter?


We can configure specific Karpenter NodePools or Provisioners.

How to set up weighted NodePools for multi-tenant isolation?



In Karpenter, Weighted NodePools allow us to control which NodePool is selected when a pod's requirements match multiple pools. This is a powerful tool for multi-tenant isolation, enabling us to prioritize specific hardware or cost models for certain teams while providing a fallback mechanism. 

How Weighting Works

  • Precedence: Higher weight values indicate higher priority.
  • Default: If no weight is specified, it defaults to 0.
  • Selection: If a pending pod matches the requirements of multiple NodePools, Karpenter will always select the one with the highest weight first. 
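The selection rule above can be reduced to a toy sketch (not Karpenter's implementation): among the NodePools a pod can use, the highest weight wins, and a missing weight counts as 0. The requirement check is abstracted to a simple `fits` predicate here.

```python
# Hypothetical sketch of weighted NodePool selection.
def pick_nodepool(pools, fits):
    candidates = [p for p in pools if fits(p)]
    if not candidates:
        return None  # no pool matches; the pod stays pending
    return max(candidates, key=lambda p: p.get("weight", 0))["name"]

pools = [
    {"name": "premium-reserved", "weight": 100},
    {"name": "standard-spot", "weight": 50},
    {"name": "on-demand-fallback"},  # no weight => defaults to 0
]

# A pod that fits every pool lands on the highest-weight one.
print(pick_nodepool(pools, lambda p: True))  # premium-reserved
```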

Multi-Tenant Strategy: Isolation & Priority

For multi-tenant environments, we can use weights to enforce distinct tiers of service or cost:
  • Reserved/Savings Plan Tier (Highest Weight): 
    • Create a NodePool that specifically includes instance types covered by our Savings Plans or Reserved Instances. By giving this pool a high weight (e.g., 100), Karpenter will prioritize using this pre-paid capacity before launching new nodes.
  • Spot Instance Tier (Medium Weight): 
    • A general-purpose pool for non-critical workloads or "Team A" can be set with a medium weight (e.g., 50) and restricted to spot capacity.
  • On-Demand Fallback (Lowest Weight): 
    • A "catch-all" NodePool with a low weight (e.g., 10) that allows on-demand instances. This ensures that if Spot capacity is unavailable or Savings Plans are exhausted, workloads still have a place to land. 

Implementation Example

Below is an example of two overlapping NodePools where the "Premium" pool is prioritized for any workload that could run on it.

# NodePool 1: High Priority (e.g., Reserved Capacity)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: premium-reserved
spec:
  weight: 100  # Higher weight = Higher priority
  template:
    spec:
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["m5.large", "m5.xlarge"] # Specific reserved types
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
---
# NodePool 2: Standard Priority (e.g., Spot)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: standard-spot
spec:
  weight: 50
  template:
    spec:
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default


Best Practices for Isolation

  • Mutual Exclusivity: While weights handle overlaps, the official Karpenter guidance suggests making NodePools mutually exclusive whenever possible (using taints/tolerations or unique labels) to simplify debugging.
  • Resource Limits: Always set spec.limits on tenant-specific pools to prevent one team from consuming the entire cluster's budget.
  • Billing Attribution: Use the spec.template.metadata.labels field in each NodePool to add "Team" or "Project" tags. These labels propagate to the EC2 instances, making it easy to track costs per tenant.

How to implement Taints and Tolerations alongside weights for stricter tenant "hard" isolation?


While weights allow Karpenter to prefer one NodePool over another, Taints and Tolerations are required for hard isolation. They ensure that nodes provisioned for one tenant "repel" pods from all other tenants. 

The Isolation Strategy

To achieve strict tenant separation, we combine three elements:
  • Taints: Applied to the NodePool to prevent unauthorized pods from scheduling on its nodes.
  • Tolerations: Applied to the tenant's pods so they can "bypass" the taint.
  • Node Affinity: Applied to the tenant's pods to "attract" them specifically to their dedicated nodes. 

1. Dedicated Tenant NodePool 

In the NodePool spec, add a taint. Any node Karpenter creates from this pool will automatically carry this "keep out" sign. 

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: tenant-a-pool
spec:
  weight: 50
  template:
    spec:
      taints:
        - key: "tenant"
          value: "team-a"
          effect: "NoSchedule" # Only pods with matching toleration can land here
      labels:
        tenant: "team-a" # Used for affinity
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

2. Tenant Pod Configuration

For Team A's workloads to run, their pods must explicitly tolerate the taint and prefer (or require) the tenant label. 

apiVersion: v1
kind: Pod
metadata:
  name: team-a-app
spec:
  containers:
    - name: app
      image: nginx
  tolerations:
    - key: "tenant"
      operator: "Equal"
      value: "team-a"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "tenant"
                operator: In
                values: ["team-a"]

Why use both?

  • Taint + Toleration alone stops other pods from accidentally using Team A's nodes, but it doesn't stop Team A's pods from accidentally landing on "General" nodes.
  • Node Affinity ensures Team A's pods only go to their dedicated nodes.
  • Weights (e.g., weight: 100) can still be used within a tenant's pool to prioritize Spot vs. On-Demand specifically for that tenant. 
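The first bullet can be demonstrated with a small sketch of the NoSchedule check, under simplified semantics (real tolerations also support operator: Exists and empty keys): a pod may land on a node only if it tolerates every NoSchedule taint on that node.

```python
# Hypothetical sketch of the NoSchedule taint/toleration check.
def can_schedule(node_taints, pod_tolerations):
    def tolerated(taint):
        return any(t.get("key") == taint.get("key")
                   and t.get("value") == taint.get("value")
                   and t.get("effect") == taint.get("effect")
                   for t in pod_tolerations)
    return all(tolerated(t) for t in node_taints
               if t.get("effect") == "NoSchedule")

team_a = [{"key": "tenant", "value": "team-a", "effect": "NoSchedule"}]

# Team A's pod tolerates its own taint...
print(can_schedule(team_a, team_a))  # True
# ...but an untainted "general" node also accepts it, which is exactly why
# the node affinity rule is still needed to pin the pod to tenant nodes.
print(can_schedule([], team_a))      # True
```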

Best Practices

  • Mutually Exclusive Pools: It is recommended to design NodePools so they do not overlap. If a pod matches multiple pools, Karpenter uses the one with the highest weight.
  • NoExecute for Critical Changes: Use the NoExecute effect if we need to evict existing pods immediately when a node becomes inappropriate for them.
  • Limit Resources: Always set spec.limits on each tenant pool to prevent a single team's auto-scaling from exhausting the entire AWS account's resources.

How to ensure our cluster has at least 3 nodes spread across 3 different Availability Zones (AZs)?


This is important if we want to implement a highly available architecture. We want nodes to be spread across multiple data centres and, with them, the pods which belong to our application.

We can define a NodePool that permits nodes in all three zones; the even spread itself is then enforced at the pod level with topology spread constraints (see the bonus section below):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # Allow nodes only in these three zones
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  # Note: limits cap total capacity; they do not guarantee a minimum.
  # Karpenter only adds nodes in response to pending pods, so the floor of
  # 3 nodes comes from the workload spread below, not from this field.
  limits:
    cpu: 1000


BONUS: Forcing Pods to use all 3 Zones


Even if we have 3 nodes in 3 zones, Kubernetes might try to put all our pods on just one of those nodes to be "efficient." 

To prevent this, we use Topology Spread Constraints. This is the modern, more powerful version of Anti-Affinity. It ensures our pods are distributed evenly across the zones we just created.

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: "topology.kubernetes.io/zone"
    whenUnsatisfiable: DoNotSchedule # Or ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-app

maxSkew: 1: This means the difference in the number of pods between any two zones can't be more than 1. (e.g., 1-1-1 is fine, 2-1-0 is not).
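The maxSkew rule boils down to a one-line check: the pod-count gap between the most- and least-loaded zone must not exceed maxSkew. A quick sketch:

```python
# Sketch of the maxSkew constraint: the difference in pod counts between
# any two topology domains (zones here) must not exceed max_skew.
def satisfies_max_skew(pods_per_zone, max_skew=1):
    counts = pods_per_zone.values()
    return max(counts) - min(counts) <= max_skew

print(satisfies_max_skew({"us-east-1a": 1, "us-east-1b": 1, "us-east-1c": 1}))  # True  (1-1-1)
print(satisfies_max_skew({"us-east-1a": 2, "us-east-1b": 1, "us-east-1c": 0}))  # False (2-1-0)
```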


How to check if Karpenter is deployed and operational in the cluster?

To verify that Karpenter is correctly configured and operational in our EKS cluster, we should follow the validation steps described below.

1. Check Controller Health


a) Check Pod Status


Ensure the Karpenter controller pods are running without errors in the dedicated namespace (usually kube-system or karpenter).

We know that its pods should be installed in the kube-system namespace and that they should have the label app.kubernetes.io/name=karpenter, so we can filter pods by these two criteria:

% kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
NAME                         READY   STATUS    RESTARTS   AGE
karpenter-598976645b-96dps   1/1     Running   0          11h
karpenter-598976645b-nxm24   1/1     Running   0          12h

b) Inspect Logs


To watch for successful discovery of our cluster endpoint and region use:

% kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter -c controller

-f = follow (command does not return)
-l = select objects with the specified label
-c = only logs from the specified container


To verify successful discovery of our EKS cluster endpoint and region, we should look for specific initialisation and informer messages in the Karpenter controller logs. 

Key Success Indicators

When Karpenter starts, it must connect to the AWS EKS API to "describe" the cluster. Look for these signs in the output of kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -c controller: 

  • "Starting informers...": This indicates Karpenter has successfully authenticated with the Kubernetes API server and is beginning to watch for unschedulable pods.
  • Absence of "DescribeCluster" Errors: If discovery is working, we will not see errors like failed to detect the cluster CIDR or AccessDeniedException: ... eks:DescribeCluster.
  • Region and Cluster Verification: In newer versions, Karpenter logs its configuration during startup. Look for a log entry mentioning the cluster name and AWS region we provided in our Helm values. 
Example log:

{"level":"DEBUG","time":"2026-03-11T01:10:28.203Z","logger":"controller","caller":"operator/operator.go:132","message":"discovered karpenter version","commit":"1c39126","version":"1.3.2"}

{"level":"DEBUG","time":"2026-03-11T01:10:28.461Z","logger":"controller","caller":"operator/operator.go:124","message":"discovered region","commit":"1c39126","region":"us-east-1"}

{"level":"DEBUG","time":"2026-03-11T01:10:28.749Z","logger":"controller","caller":"operator/operator.go:129","message":"discovered region","commit":"1c39126","region":"us-east-1"}

{"level":"DEBUG","time":"2026-03-11T01:10:28.909Z","logger":"controller","caller":"operator/operator.go:135","message":"discovered cluster endpoint","commit":"1c39126","cluster-endpoint":"https://CA0xxxxxxx5FDD.yxx.us-east-1.eks.amazonaws.com"}

{"level":"DEBUG","time":"2026-03-11T01:10:28.914Z","logger":"controller","caller":"operator/operator.go:143","message":"discovered kube dns","commit":"1c39126","kube-dns-ip":"172.20.0.10"}

{"level":"INFO","time":"2026-03-11T01:10:28.948Z","logger":"controller.controller-runtime.metrics","caller":"server/server.go:208","message":"Starting metrics server","commit":"1c39126"}

{"level":"INFO","time":"2026-03-11T01:10:28.948Z","logger":"controller","caller":"manager/runnable_group.go:226","message":"starting server","commit":"1c39126","name":"health probe","addr":"[::]:8081"}

{"level":"INFO","time":"2026-03-11T01:10:28.950Z","logger":"controller.controller-runtime.metrics","caller":"server/server.go:247","message":"Serving metrics server","commit":"1c39126","bindAddress":":8080","secure":true}

{"level":"INFO","time":"2026-03-11T01:10:29.052Z","logger":"controller","caller":"leaderelection/leaderelection.go:215","message":"attempting to acquire leader lease kube-system/karpenter-leader-election...","commit":"1c39126"}

{"level":"DEBUG","time":"2026-03-11T06:00:19.215Z","logger":"controller","caller":"provisioning/provisioner.go:128","message":"computing scheduling decision for provisionable pod(s)","commit":"1c39126","controller":"provisioner","namespace":"","name":"","reconcileID":"921af0a4-f057-4041-bff5-d1861d9f72d1","pending-pods":1,"deleting-pods":0}

{"level":"DEBUG","time":"2026-03-11T06:00:21.223Z","logger":"controller","caller":"provisioning/provisioner.go:128","message":"computing scheduling decision for provisionable pod(s)","commit":"1c39126","controller":"provisioner","namespace":"","name":"","reconcileID":"9f0c7833-8e01-4661-8728-890f0001a634","pending-pods":1,"deleting-pods":0}

{"level":"INFO","time":"2026-03-11T06:00:29.230Z","logger":"controller","caller":"lifecycle/controller.go:148","message":"initialized nodeclaim","commit":"1c39126","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"xxxx-ms587"},"namespace":"","name":"xxxxx","reconcileID":"35624d4f-833a-4939-9785-24df4c975e0e","provider-id":"aws:///us-east-1c/i-0123456df20484e26","Node":
{"name":"ip-10-1-46-231.us-east-1.compute.internal"},"allocatable":{"cpu":"3920m","ephemeral-storage":"192128045146","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"15147932Ki","pods":"58"}}

{"level":"DEBUG","time":"2026-03-11T06:00:29.741Z","logger":"controller","caller":"disruption/controller.go:99","message":"marking consolidatable","commit":"1c39126","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"xxxx-ms587"},"namespace":"","name":"xxxx-ms587","reconcileID":"8c5c3d20-36eb-4a78-b0e8-792532db530d","Node":{"name":"ip-10-2-45-230.us-east-1.compute.internal"}}

{"level":"INFO","time":"2026-03-11T06:01:46.399Z","logger":"controller","caller":"disruption/controller.go:193","message":"disrupting node(s)","commit":"1c39126","controller":"disruption","namespace":"","name":"","reconcileID":"acc96c52-0cda-475f-b8a9-1251e7a98dc1","command-id":"3fa9d95e-8f45-48a9-b524-94786e1ac91a","reason":"empty","decision":"delete","disrupted-node-count":1,"replacement-node-count":0,"pod-count":0,"disrupted-nodes":[{"Node":{"name":"ip-10-2-45-230.us-east-1.compute.internal"},"NodeClaim":{"name":"xxxx-ms587"},"capacity-type":"on-demand","instance-type":"m5.xlarge"}],"replacement-nodes":[]}


Common Error Patterns to Watch For

If discovery fails, the logs will explicitly mention connectivity or permission issues:
  • DNS/Endpoint Issues: Look for i/o timeout or lookup sts.<region>.amazonaws.com. This often means Karpenter can't reach the AWS STS endpoint to get credentials.
  • IAM Permission Issues: Messages stating is not authorized to perform: eks:DescribeCluster mean the controller's IAM role (IRSA) is missing the necessary permissions to discover the cluster details.
  • Controller Crash/Restart: If the logs show repeated restarts right after "Starting informers", it often points to a mismatch between the provided clusterName and the actual cluster. 

Tip: Enable Debug Logging 

If we don't see enough detail, we can increase the log verbosity. Update our Helm deployment with --set logLevel=debug or change the LOG_LEVEL environment variable in the deployment to debug.


2. Verify CRD Configurations 


Karpenter requires specific Custom Resource Definitions (CRDs) to know how to provision nodes. 

(1) List NodePools: Run kubectl get nodepools to ensure our provisioning logic is active.



(2) List EC2NodeClasses: Run kubectl get ec2nodeclasses to confirm AWS-specific settings (like subnets and security groups) are defined. 


3. Perform a Scaling Test ("Inflate" Test) 


The standard way to test Karpenter is by deploying a "dummy" workload that exceeds current cluster capacity. 

(1) Deploy a test app: Apply a deployment (often called inflate) with high CPU/Memory requests.

(2) Scale it up: Run: 

% kubectl scale deployment inflate --replicas=5

(3) Watch for new nodes: Monitor:

% kubectl get nodes -w

If configured correctly, Karpenter will detect the pending pods and provision a new EC2 instance within about a minute. 

During the inflate scaling test, how to know that a new node was provisioned by karpenter and not cluster autoscaler?


During an inflate scaling test, we can distinguish between nodes provisioned by Karpenter and those from Cluster Autoscaler (CAS) by checking for specific labels, console status, and controller logs. 

1. Check for Specific Kubernetes Labels 

Karpenter automatically injects unique labels into every node it creates. CAS nodes usually belong to an Auto Scaling Group (ASG) and do not have these specific Karpenter markers. 

Run this command to see the labels on our nodes:

kubectl get nodes --show-labels 

Look for these Karpenter-exclusive labels:
  • karpenter.sh/nodepool: The name of the NodePool that provisioned the node.
  • karpenter.sh/capacity-type: Set to spot or on-demand.
  • karpenter.k8s.aws/instance-category: (e.g., c, m, r). 

Nodes provisioned by Cluster Autoscaler (CAS) don't have a unique "CAS" label. Instead, they carry labels that identify them as members of an Auto Scaling Group (ASG) or an EKS Managed Node Group (MNG).

If we are looking at a node and trying to confirm if it came from CAS, look for these specific markers:

1. Managed Node Group Labels (Most Common)

If we use EKS Managed Node Groups with CAS, the nodes will always have:
  • eks.amazonaws.com/nodegroup: The name of the MNG.
  • eks.amazonaws.com/nodegroup-image: The AMI ID used.
  • eks.amazonaws.com/capacityType: Usually ON_DEMAND or SPOT
  • eks.amazonaws.com/sourceLaunchTemplateId
  • eks.amazonaws.com/sourceLaunchTemplateVersion

2. Auto Scaling Group Labels

Since CAS works by increasing the "Desired Capacity" of an ASG, the underlying EC2 instance is tagged by AWS. Kubernetes reflects these as:
  • alpha.eksctl.io/nodegroup-name: (If using eksctl)
  • node.kubernetes.io/instance-type: (Standard, but CAS uses this to match ASG definitions)

3. The "Missing" Labels

The easiest way to identify a CAS node during a Karpenter test is by what it doesn't have. A CAS node will never have:
  • karpenter.sh/nodepool
  • karpenter.sh/provisioner-name (deprecated)
  • karpenter.k8s.aws/instance-category

Quick Check Command

Run this to see which nodes belong to Karpenter vs. CAS/MNG:

kubectl get nodes -L karpenter.sh/nodepool,eks.amazonaws.com/nodegroup

If the nodepool column is populated, it's Karpenter.
If the nodegroup column is populated, it's CAS/MNG.
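The same decision, applied to a node's label map, fits in a small sketch (an illustrative helper, not part of any tool):

```python
# Hypothetical sketch: classify a node's provenance from its labels,
# using the marker labels described above.
def classify_node(labels: dict) -> str:
    if "karpenter.sh/nodepool" in labels:
        return "karpenter"
    if "eks.amazonaws.com/nodegroup" in labels:
        return "cas-managed-node-group"
    return "unknown"

print(classify_node({"karpenter.sh/nodepool": "standard-spot"}))     # karpenter
print(classify_node({"eks.amazonaws.com/nodegroup": "ng-default"}))  # cas-managed-node-group
```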


2. Identify via "Self-Managed" Status in EKS Console 

In the AWS EKS Console under the Compute tab: 
  • Karpenter Nodes: Appear as "Self-managed" because Karpenter bypasses Auto Scaling Groups to launch instances directly via the EC2 Fleet API.
  • Cluster Autoscaler Nodes: Appear as part of a "Managed Node Group" or are tied to a specific ASG. 

3. Check for the NodeClaim Object 

Karpenter creates a NodeClaim for every node it provisions. Cluster Autoscaler does not use this resource. Run:

kubectl get nodeclaims 

...during the test. If we see new entries appearing that correspond to our inflate pods, Karpenter is doing the work. 

4. Monitor Controller Logs

We can watch Karpenter’s real-time decision-making process by tailing its logs. It will explicitly state when it discovers unschedulable pods and which instance type it is launching. 

kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -f

CAS logs, by contrast, will show it interacting with ASGs and increasing the "desired capacity" of a group.


How to disable Cluster Autoscaler temporarily to ensure Karpenter is the only one responding to our tests?


To ensure Karpenter is the only controller responding to our scaling tests, we can temporarily disable the Cluster Autoscaler (CAS) by scaling its deployment to zero replicas. 

1. Identify the CAS Deployment

The Cluster Autoscaler typically runs in the kube-system namespace. Verify its name first: 

kubectl get deployments -n kube-system | grep cluster-autoscaler

2. Scale to Zero

Run the following command to stop the CAS from running. This will terminate the pod responsible for monitoring the cluster and scaling our Auto Scaling Groups (ASGs): 

kubectl scale deployment cluster-autoscaler -n kube-system --replicas=0


3. Verify the Shutdown

Ensure no CAS pods are running to prevent them from interfering with our inflate test:

kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-cluster-autoscaler

4. (Optional) Remove ASG Tags

If we want a more permanent "hard" disable without deleting the deployment, we can remove the specific AWS tags from our Auto Scaling Groups that the CAS uses for auto-discovery:
  • k8s.io/cluster-autoscaler/enabled
  • k8s.io/cluster-autoscaler/<cluster-name> 

Without these tags, the CAS will ignore those node groups even if the deployment is scaled back up. 

To Re-enable

Once our tests are complete, we can restore the Cluster Autoscaler by scaling it back to its original replica count:

kubectl scale deployment cluster-autoscaler -n kube-system --replicas=1


How to confirm which instance types Karpenter chose during the inflate test?


To confirm which instance types Karpenter chose during our inflate test, we can watch the controller logs in real time. Karpenter will log exactly how it batches our pods and which instances it requests from AWS.

1. Tail Karpenter Logs

Run the following command while our inflate pods are in a Pending state:

kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter

Note: Some installations use the karpenter namespace instead of kube-system. 

2. What to Look For

Karpenter logs its decisions in JSON or text format. Look for INFO messages containing found provisionable pod(s) or created nodeclaim.

A typical log entry looks like this:

2024-03-12T10:00:00.000Z INFO controller.provisioner created nodeclaim {"commit": "...", "nodeclaim": "default-abc12", "nodepool": "general-purpose", "requests": {"cpu":"4","memory":"8Gi"}, "instance-types": "m5.xlarge, m6i.xlarge, c5.2xlarge..."}

  • requests: Shows the total CPU/Memory requested by our inflate pods.
  • instance-types: Lists the candidates Karpenter passed to the EC2 Fleet API. Karpenter usually sends a diversified list (up to 60 types) to ensure high availability and best pricing. 
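When the controller is in JSON log mode, these decisions are easy to filter programmatically. The sketch below mirrors the field names in the sample entry above; they can differ between Karpenter versions, so treat this as illustrative rather than a stable contract.

```python
import json

# Sketch: pull "created nodeclaim" decisions out of a JSON log stream.
def created_nodeclaims(log_lines):
    events = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip text-format or partial lines
        if "created nodeclaim" in entry.get("message", ""):
            events.append((entry.get("nodepool"), entry.get("instance-types")))
    return events

sample = [
    '{"level":"INFO","message":"created nodeclaim","nodepool":"general-purpose","instance-types":"m5.xlarge, m6i.xlarge"}',
    'not json at all',
]
print(created_nodeclaims(sample))  # [('general-purpose', 'm5.xlarge, m6i.xlarge')]
```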

3. Identify the Winning Instance

Once the node is live, we can confirm the final selected type by checking the NodeClaim status:

kubectl get nodeclaims -o custom-columns='NAME:.metadata.name,TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,ZONE:.metadata.labels.topology\.kubernetes\.io/zone'

This command explicitly shows the specific instance type (e.g., m5.large) that AWS actually provisioned for that claim. 

4. Enable Debug Logging (Optional) 

If we don't see enough detail, we can increase the log verbosity to debug. This will reveal exactly how Karpenter "binpacks" our pods onto different instance options. 

kubectl patch deployment karpenter -n kube-system --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/env/0/value", "value": "debug"}]'

Tip: After our test, remember to set the LOG_LEVEL back to info to avoid excessive log storage costs.

4. Monitor NodeClaims


Karpenter uses NodeClaims to manage the lifecycle of the nodes it creates. 
  • Check Status: Run kubectl get nodeclaims to see if requests for new nodes are being fulfilled.
  • Debug Failures: If a node fails to join, use kubectl describe nodeclaim <name> to check for errors like LaunchFailed or issues with IAM permissions. 

5. Verify Tagging


Karpenter discovers infrastructure via tags. Ensure our Subnets and Security Groups have the following tag: 
Key: karpenter.sh/discovery
Value: <our-cluster-name>


How to know if node was provisioned by Karpenter?


Karpenter applies labels on nodes it provisions so let's check labels:

% kubectl get nodes --show-labels

If labels like karpenter.sh/nodepool or karpenter.sh/provisioner-name exist, Karpenter launched the node.


When setting up EKS Managed Node Group, we set desired, min and max number of nodes for ASG. Does that enable Cluster Autoscaler automatically? How does ASG play with Karpenter?


Setting the desired, min, and max size on an EKS Managed Node Group only configures the underlying AWS Auto Scaling Group (ASG).
  • What AWS does: If a node crashes, the ASG will see that the "current" count is less than the "min" (or "desired") and spin up a new node to replace it.
  • What AWS does NOT do: It will not look at our pending Kubernetes pods and say, "Oh, we need more space, let's increase the count from 3 to 4."
To get that "intelligent" scaling based on pod demand, we must install a separate controller.

Is Cluster Autoscaler (CAS) enabled by default?

No. Kubernetes Cluster Autoscaler is not enabled by default on EKS.

If we want to use it, we must:
  • Deploy the Cluster Autoscaler as a Pod in our cluster (usually via Helm).
  • Give that Pod an IAM Role (IRSA) that has permission to update our ASG's desired_capacity.
  • Add specific tags to our Node Group so the Autoscaler knows which ASG to "manage."

Do we need to disable CAS to use Karpenter?


Yes, absolutely. We should not run Cluster Autoscaler and Karpenter simultaneously on the same nodes.
  • The Conflict: CAS tries to scale nodes by changing the "desired capacity" of an ASG. Karpenter works differently—it bypasses ASGs entirely and talks directly to the EC2 Fleet API to launch specific instances.
  • The Result of Running Both: They will fight over the cluster. CAS might try to shrink a group while Karpenter is trying to add capacity, leading to "flapping" nodes and unpredictable costs.

If we switch to Karpenter:
  • Uninstall/Scale down the Cluster Autoscaler deployment.
  • Set our Node Group sizes to fixed values (or migrate to "headless" node groups where Karpenter manages the entire lifecycle).
  • Karpenter is the "New Way": Most AWS users are moving toward Karpenter because it is faster (seconds vs minutes) and more efficient at picking the right instance sizes.

Summary Comparison


Feature    ASG (Default)           Cluster Autoscaler (CAS)           Karpenter
--------   ---------------------   --------------------------------   ----------------------------------
Logic      "Keep X nodes alive"    "Add nodes if Pods are Pending"    "Provision exactly what Pods need"
Speed      Slow (health-based)     Medium (polling ASGs)              Fast (direct EC2 API)
Setup      Built in to EKS         Manual install + IAM               Manual install + IAM
Best for   Fixed capacity          Traditional scaling                Cost optimisation & high speed


Updating Kubernetes version on nodes managed by Karpenter



References: