Friday, 20 February 2026

Grafana Observability Stack

 




Grafana Labs builds these components to be used together as an observability stack (often called LGTM: Loki, Grafana, Tempo, Mimir), but each has a clear role:


Loki – log database. It stores and indexes logs (especially from Kubernetes) in a cost‑efficient, label‑based way, similar to Prometheus but for logs.

Tempo – distributed tracing backend. It stores distributed traces (spans) from OpenTelemetry, Jaeger, Zipkin, etc., so you can see call flows across microservices and where latency comes from.

Mimir – Prometheus‑compatible metrics backend. It is a horizontally scalable, long‑term storage and query engine for Prometheus‑style metrics (time series).

Alloy – telemetry pipeline (collector). It is Grafana’s distribution of the OpenTelemetry Collector / Prometheus agent / Promtail ideas, used to collect, process, and forward metrics, logs, traces, profiles into Loki/Tempo/Mimir (or other backends).


How Grafana UI relates to them


Grafana UI itself is “just” the visualization and alerting layer:

  • It connects to Loki, Tempo, Mimir (and many others) as data sources.
  • For each backend you configure:
    • A Loki data source for logs.
    • A Tempo data source for traces.
    • A Prometheus/Mimir data source for metrics (Mimir exposes a Prometheus‑compatible API).
  • Grafana then lets you:
    • Build dashboards and alerts from Mimir metrics.
    • Explore logs from Loki.
    • Explore traces from Tempo and cross‑link them with logs/metrics (e.g., click from a log line to a trace, or from a metrics graph into logs/traces).

A useful mental model: Loki/Tempo/Mimir are databases, Alloy is the collector/router, and Grafana is the UI on top.
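
To make the wiring concrete, here is a minimal sketch of Grafana's file-based data source provisioning. The file path is Grafana's standard provisioning directory; the in-cluster service URLs are hypothetical placeholders for wherever your Loki, Tempo, and Mimir endpoints live:

# Sketch: provision the three LGTM data sources at Grafana startup.
cat <<'EOF' > /etc/grafana/provisioning/datasources/lgtm.yaml
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus                 # Mimir exposes a Prometheus-compatible API
    url: http://mimir-nginx.mimir.svc/prometheus
  - name: Loki
    type: loki
    url: http://loki-gateway.loki.svc
  - name: Tempo
    type: tempo
    url: http://tempo-query-frontend.tempo.svc:3100
EOF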


Are they deployed in the same Kubernetes cluster?


Common patterns:

  • Very common: deploy Loki, Tempo, Mimir, Alloy, and Grafana in the same Kubernetes cluster as your apps. This is the typical “in‑cluster LGTM” setup; all telemetry stays inside the cluster and traffic is simple.
  • Also common: run them in a separate observability cluster (or use Grafana Cloud backends), while Alloy/agents run in each workload cluster and ship data over the network. This improves isolation and makes it easier to share one observability stack across many clusters.
  • In smaller setups or dev environments, everything (apps + LGTM + Grafana) often lives in one cluster; in larger/regulated setups, people tend to separate “workload clusters” and an “observability cluster”.

So: they don’t have to be on the same cluster, but it’s perfectly normal (and often simplest) to run Grafana + Loki + Tempo + Mimir + Alloy together in a single Kubernetes cluster and point your apps’ telemetry to Alloy.


Why not use Elasticsearch instead of Loki, Tempo, and Mimir?


Elasticsearch can replace part of what Loki, Tempo, and Mimir do, but not all of it, and usually with higher cost/complexity for cloud‑native observability.

1. Scope: logs vs full observability


Elasticsearch is a general search and analytics engine that’s great at full‑text search, aggregations, and analytics over documents (including logs).

The LGTM stack is explicitly split by signal:
  • Loki → logs
  • Tempo → traces
  • Mimir → metrics

Each is optimized only for its signal type and integrates tightly with Grafana and modern telemetry standards.

You could plausibly replace Loki with Elasticsearch for logs, but Elasticsearch does not natively replace Tempo (distributed tracing backend) or Mimir (Prometheus‑compatible metrics backend).

2. Logs: Loki vs Elasticsearch


Elasticsearch strengths:
  • Very powerful full‑text search, fuzzy matching, relevance scoring, complex aggregations.
  • Good when you need deep forensic search and advanced analytics on log text.

Loki strengths:
  • Stores logs as compressed chunks plus a small label index, so storage and compute are much cheaper than Elasticsearch for typical Kubernetes logs.
  • Very tight integration with Grafana and the rest of LGTM, and simple, label‑based querying.

Trade-off: Elasticsearch gives richer search at a higher infra and ops cost, while Loki gives "good enough" search for operational troubleshooting with much lower cost and operational burden.
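
To make "label-based querying" concrete: a LogQL query selects log streams by labels and then filters the text. A hypothetical example using logcli (Loki's command-line client); the label names are placeholders:

% logcli query '{namespace="my-app-namespace", app="my-app"} |= "timeout"' --since=1h --limit=50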

3. Traces and metrics: Tempo & Mimir vs “just ES”


Tempo:
  • Implements distributed tracing concepts (spans, traces, service graphs) and OpenTelemetry/Jaeger/Zipkin protocols; the data model and APIs are specialized for traces.
  • Elasticsearch can store trace‑like JSON documents, but you’d have to build/maintain all the trace stitching, UI navigation, and integrations yourself.

Mimir:
  • Is a horizontally scalable, Prometheus‑compatible time‑series database, with native remote‑write/read and PromQL semantics.
  • Elasticsearch can store time‑stamped metrics, but you lose Prometheus compatibility, PromQL semantics, and the whole ecosystem that expects a Prometheus‑style API.

So using only Elasticsearch means you’re giving up the standard metrics and tracing ecosystems and rebuilding a lot of tooling on top of a generic search engine.

4. Cost, complexity, and operational burden


Elasticsearch clusters generally need:
  • More RAM/CPU per node, careful shard and index management, and capacity planning.
  • Storage overhead from full‑text indexes (often 1.5–3× raw log size plus replicas).

Loki/Tempo/Mimir:
  • Are designed for object storage, compression, and label‑only indexing, which dramatically lowers storage and compute requirements for logs and metrics.
  • Have simpler, well‑documented reference architectures specifically for observability.

For a modern Kubernetes‑centric environment, that usually makes LGTM cheaper and easier to run than a single big Elasticsearch cluster for everything.

5. When Elasticsearch still makes sense


You might still choose Elasticsearch (often with Kibana/APM) if:
  • You already have a strong ELK stack and team expertise.
  • Your primary need is deep, flexible text search and analytics over logs, with less emphasis on Prometheus/OTel ecosystems.
  • You want Elasticsearch’s ML/anomaly‑detection features and are willing to pay the operational cost.

But if your goal is a Grafana‑centric, standards‑based (Prometheus + OpenTelemetry) observability platform, LGTM (Loki+Tempo+Mimir, plus Alloy as collector) is a better fit than trying to push everything into Elasticsearch.

---

Here document (heredoc)




A here document (heredoc) redirects a multiline string literal to a command's standard input while preserving line breaks. The Unix shell syntax is:

[command] <<DELIMITER
    First line.
    Second line.
    Third line.
    Fourth line.
DELIMITER


<< is the redirection operator.
- is an optional tab-suppression modifier (see below).
DELIMITER is an arbitrary string (the delimiter token); it must be the same at the beginning and at the end, and the closing delimiter must appear on a line by itself.

Appending a minus sign to the redirection operator (<<-) causes all leading tab characters to be ignored. This allows you to use indentation when writing heredocs in shell scripts. We can then indent both the heredoc body and the closing delimiter with tabs (not spaces!):

#!/bin/bash
# Note: the indented lines below must be indented with tab characters, not spaces.
cat <<-EOF
	indented
	EOF
echo Done
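
One more detail worth noting: with an unquoted delimiter the shell expands variables (and command substitutions) inside the heredoc, while quoting the delimiter makes the body literal. A small sketch:

#!/bin/bash
NAME=world

# Unquoted delimiter: $NAME is expanded, prints "Hello, world"
cat <<EOF
Hello, $NAME
EOF

# Quoted delimiter: body is literal, prints "Hello, $NAME"
cat <<'EOF'
Hello, $NAME
EOF

Heredocs are handy for feeding inline manifests or config files to commands such as kubectl apply -f - or tee.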

---


Wednesday, 18 February 2026

How to fix pods in Not Ready state?


kubectl get pods might show that some of the pods have a 0/N value in the READY column.

What is the meaning of READY column value?


In the context of kubectl get pods, the READY column shows the number of containers in the pod that have passed their health checks and are ready to serve traffic.

The anatomy of R/T:
  • R (Left side): This is the number of containers currently Ready. A 0 means the application inside the container is not responding to its "Readiness Probe" or has not finished starting up.
  • / (Separator): Separates ready containers from the total.
  • T (Right side): This is the Total number of user containers defined in that pod.

When we have a pod in 0/1 state with a status of Running, it means the container has started, but Kubernetes does not consider it "healthy" enough to handle requests. Our pods are technically "alive" (Running) but "unusable" (Not Ready).

Common Reasons for 0/1 Running:

  • Failed Readiness Probe: The application is running, but the health check URL (e.g., /ready) is returning an error or timing out (a minimal probe spec is sketched after this list).
  • Slow Startup: The application takes a long time to initialize, and the "Initial Delay" isn't long enough.
  • Dependency Issues: The pod is waiting for a database, a config file, or another service that isn't available.
  • CrashLoopBackOff (Transitions): Sometimes pods flicker between 0/1 Running and 0/1 CrashLoopBackOff as they try to start and immediately fail.
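
For reference, here is a minimal, hypothetical readiness probe spec; the image, path, port, and timings are placeholders to adapt to your application:

kubectl apply -n my-app-namespace -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: readiness-demo
spec:
  containers:
    - name: my-app
      image: nginx:1.27            # placeholder image
      ports:
        - containerPort: 80
      readinessProbe:              # pod stays 0/1 until this succeeds
        httpGet:
          path: /                  # e.g. /ready in a real app
          port: 80
        initialDelaySeconds: 10    # increase for slow-starting apps
        periodSeconds: 5
        failureThreshold: 3
EOF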

How to find out exactly what's wrong


To diagnose why our my-app pods are stuck, run these two commands:

(1) Check the Events (Why it's not ready):

kubectl describe pod my-app-0 -n my-app-namespace

Look at the "Events" section at the bottom for messages like "Readiness probe failed."

Output example:

Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  47s (x81205 over 8d)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503


(2) Check the Logs (What the app is saying):

kubectl logs my-app-0 -n my-app-namespace

Look for "Error," "Exception," or "Connection Refused" messages.
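
If the pod is flapping into CrashLoopBackOff, the current container may have no useful logs yet; the previous container's output usually tells the story. Two extra commands that often help:

% kubectl logs my-app-0 -n my-app-namespace --previous

% kubectl get events -n my-app-namespace --sort-by=.lastTimestamp | tail -20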

---

Tuesday, 17 February 2026

How to use terraform-docs to automatically generate Terraform code documentation

 

terraform-docs is a tool used to automatically generate Terraform code documentation.

To install it on Mac:

% brew install terraform-docs 

To verify installation:

% terraform-docs --version                                        
terraform-docs version v0.21.0 darwin/arm64

To generate documentation for a module in the current directory and append it to the README file (which is in the same directory):

% terraform-docs markdown table --output-file README.md --output-mode inject ./
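
Inject mode only rewrites the block between terraform-docs' markers in the README and leaves the rest of the file untouched. If the markers aren't there yet, a sketch of adding them (assuming the default marker names):

cat <<'EOF' >> README.md
<!-- BEGIN_TF_DOCS -->
<!-- END_TF_DOCS -->
EOF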


How to install Terraform on Mac



First add Hashicorp's package repository:

% brew tap hashicorp/tap

Then install Terraform:

% brew install hashicorp/tap/terraform

If Terraform was already installed, the command above will update it.

To verify installation, we can check its version:

% terraform --version                                                                                    
Terraform v1.14.5
on darwin_arm64

Friday, 6 February 2026

Amazon EKS Autoscaling with Karpenter



Kubernetes autoscaling is a function that scales resources in and out depending on the current workload. AWS supports two autoscaling implementations:
  • Cluster Autoscaler
  • Karpenter
    • flexible, high-performance Kubernetes cluster autoscaler
    • helps improve application availability and cluster efficiency
    • launches right-sized compute resources (for example, Amazon EC2 instances) in response to changing application load in under a minute
    • can provision just-in-time compute resources that precisely meet the requirements of your workload
    • automatically provisions new compute resources based on the specific requirements of cluster workloads, including compute, storage, acceleration, and scheduling requirements
    • creates Kubernetes nodes directly from EC2 instances
    • improves the efficiency and cost of running workloads on the cluster
    • open-source


Pod Scheduler


  • Kubernetes cluster component responsible for determining which node Pods get assigned to
  • default Pod scheduler for Kubernetes is kube-scheduler
    • logs the reasons Pods can't be scheduled

Unschedulable Pods



A Pod is unschedulable when it's been put into Kubernetes' scheduling queue, but can't be deployed to a node. This can be for a number of reasons, including:
  • The cluster not having enough CPU or RAM available to meet the Pod's requirements.
  • Pod affinity or anti-affinity rules preventing it from being deployed to available nodes.
  • Nodes being cordoned due to updates or restarts.
  • The Pod requiring a persistent volume that's unavailable, or bound to an unavailable node.

How to detect unschedulable Pods?

Pods waiting to be scheduled are held in the "Pending" status, but if the Pod can't be scheduled, it will remain in this state. However, Pods that are being deployed normally are also marked as "Pending." The difference comes down to how long a Pod remains in "Pending." 

How to fix unschedulable Pods?
There is no single solution for unschedulable Pods as they have many different causes. However, there are a few things you can try depending on the cause. 
  • Enable cluster autoscaling
    • If you're using a managed Kubernetes service like Amazon EKS or Google Kubernetes Engine (GKE), you can very easily take advantage of autoscaling to increase and decrease cluster capacity on-demand. With autoscaling enabled, Kubernetes' Cluster Autoscaler will trigger your provider to add nodes when needed. As long as you've configured your cluster node pool and it hasn't reached its max node limit, your provider will automatically provision a new node and add it to the pool, making it available to the cluster and to your Pods.
  • Increase your node capacity
  • Check your Pod requests
  • Check your affinity and anti-affinity rules 

 

In this article we'll show how to enable cluster autoscaling with Karpenter.


How does the regular Kubernetes Autoscaler work in AWS?


When we create a regular Kubernetes cluster in AWS, each node group is managed by an AWS Auto Scaling group [Auto Scaling groups - Amazon EC2 Auto Scaling]. The cluster autoscaler adjusts the group's desired size based on the load in the cluster so that all unscheduled pods fit.

HorizontalPodAutoscaler (HPA) [Horizontal Pod Autoscaling | Kubernetes] is built into Kubernetes and uses metrics like CPU usage, memory usage, or custom metrics we define to decide when to add or remove pod replicas. If our app is receiving more traffic, HPA will kick in and provision additional pods.
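
As an illustration, the quickest way to attach a CPU-based HPA to a deployment (the deployment and namespace names are hypothetical):

% kubectl autoscale deployment my-app -n my-app-namespace --cpu-percent=70 --min=2 --max=10

% kubectl get hpa -n my-app-namespace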

VerticalPodAutoscaler (VPA) can also be installed in the cluster, where it adjusts the resource allocation (like CPU and memory) of pods that are already running.

What about when there's not enough capacity to schedule any more pods on the existing nodes? That's when we need an additional node: we have a pod that needs to be scheduled but nowhere to put it. We could call the AWS API, spin up an additional EC2 instance, and add it to our cluster, or, if we're using managed node groups, call the Managed Node Group API and bump up the desired size. The easier approach, though, is to use a cluster autoscaler. There is a mature open-source solution called Cluster Autoscaler (CAS).

CAS was built to handle hundreds of different combinations of node types, zones, and purchase options available in AWS. CAS works directly with managed node groups or self-managed nodes and Auto Scaling groups, which are AWS constructs that help us manage nodes.


What are the issues with the regular Kubernetes Autoscaler?


Let's say CAS is installed in the cluster and manages one managed node group (MNG). The group is filling up and we have an additional pod that needs to be scheduled, so CAS tells the MNG to bump up the number of nodes; it spins up another one and the pod can now be scheduled. But this is not ideal: we have a single pod on a node, and we don't need such a big node.

This can be solved by creating a different MNG with a smaller instance type; CAS then recognizes that option and schedules the pod on a more appropriately-sized node.

Unfortunately, we might end up with many MNGs based on different requirements, which can be a challenge to manage, especially when trying to follow best practices for cost efficiency and high availability.


How does Karpenter work?


Karpenter works differently: it doesn't use MNGs or ASGs and manages each node directly. Say we have pods of different sizes and HPA decides we need more of the smaller pods. Karpenter will intelligently pick the right instance type for that workload. If we need to spin up a larger pod, it will again pick the right instance type.

Karpenter picks exactly the right type of node for our workload. 

If we're using Spot instances and Spot capacity is not available, Karpenter retries more quickly. Karpenter offers faster, more dynamic, more intelligent compute provisioning, following best practices without the operational overhead of managing nodes ourselves.

How to control how Karpenter operates?
There are many dimensions here. We can set constraints on Karpenter to limit the instance types, and we can set up taints to isolate workloads on specific types of nodes. Different teams can be isolated on different nodes; for example, one team's billing pods can run on one set of nodes while another team's workloads run on GPU-based instances.

Workload Consolidation feature: pods are consolidated onto fewer nodes. Let's say we have 3 nodes, two at 70% and one at 20% utilization. Karpenter detects this, moves the pods from the underutilized node onto the other two, and shuts down the now-empty node (the instance is terminated). This leads to lower costs.

Karpenter also makes it easier to use Spot and Graviton instances, which can further lower costs.

There is also a feature to keep our nodes up to date: the ttlSecondsUntilExpired parameter (replaced by expireAfter in newer, NodePool-based Karpenter versions) tells Karpenter to terminate nodes after a set amount of time. These nodes are automatically replaced with new nodes running the latest AMIs.

In summary, Karpenter brings:
1) lower costs
2) higher application availability
3) lower operational overhead


Karpenter needs permissions to create EC2 instances in AWS. 

If we use a self-hosted, self-managed Kubernetes cluster (running on bare-metal boxes or EC2 instances, where you have full control over all aspects of Kubernetes), for example one built with kOps (see also Is k8s Kops preferable than eks? : r/kubernetes), we can add additional IAM policies to the existing IAM role attached to the Kubernetes nodes.

If using EKS, the best way to grant AWS access to an in-cluster service is with IAM roles for service accounts (IRSA).


How to configure Karpenter?


We configure Karpenter through NodePool resources (called Provisioners in older Karpenter versions), which describe what kinds of nodes Karpenter is allowed to launch.
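
A minimal NodePool sketch follows. It uses the Karpenter v1beta1 API; field names differ slightly between Karpenter versions, and it assumes an EC2NodeClass named "default" already exists:

kubectl apply -f - <<'EOF'
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default              # assumed to exist
  limits:
    cpu: "100"                     # cap on total CPU Karpenter may provision
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h              # successor to ttlSecondsUntilExpired
EOF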


How to know if node was provisioned by Karpenter?


Karpenter applies labels on nodes it provisions so let's check labels:

% kubectl get nodes --show-labels

If labels like karpenter.sh/nodepool or karpenter.sh/provisioner-name exist, Karpenter launched the node.
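
A quick way to list only Karpenter-launched nodes and show which NodePool and instance type each one uses:

% kubectl get nodes -l karpenter.sh/nodepool -L karpenter.sh/nodepool,node.kubernetes.io/instance-type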


Kubernetes Cluster Autoscaler


Kubernetes Cluster Autoscaler:
  • Designed to automatically adjust the number of nodes (EC2 instances) in our cluster based on the resource requests of the workloads running in the cluster
  • Kubernetes project, supported on EKS

Key Features:

  • Node Scaling: It adds or removes nodes based on the pending pods that cannot be scheduled due to insufficient resources.
  • Pod Scheduling: Ensures that all pending pods are scheduled by scaling the cluster up.

It works with EKS Managed Node Groups backed by AWS Auto Scaling Groups. In a node group, if we provide specific settings (like custom block_device_mappings), EKS creates an EC2 Launch Template under the hood.



How to check if it's installed and enabled?


(1) Cluster Autoscaler usually runs as a Deployment in kube-system namespace so we can look for that deployment:

% kubectl get deployments -n kube-system | grep -i autoscaler   

cluster-autoscaler-aws-cluster-autoscaler   2/2     2            2           296d

We can also list pods directly:

% kubectl get pods -n kube-system | grep -i autoscaler

cluster-autoscaler-aws-cluster-autoscaler-7cbb844455-q2lxv 1/1 Running 0 206d
cluster-autoscaler-aws-cluster-autoscaler-7cbb844455-vhbsw 1/1 Running 0 206d

If we see a pod running, it’s installed.

Typical names:
  • cluster-autoscaler
  • cluster-autoscaler-aws-clustername
  • cluster-autoscaler-eks-...

(2) Inspect the Deployment (confirm it’s enabled & configured)

% kubectl describe deployment cluster-autoscaler -n kube-system

Name:                   cluster-autoscaler-aws-cluster-autoscaler
Namespace:              kube-system
CreationTimestamp:      Wed, 16 Apr 2025 12:25:38 +0100
Labels:                 app.kubernetes.io/instance=cluster-autoscaler
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=aws-cluster-autoscaler
                        helm.sh/chart=cluster-autoscaler-9.46.6
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: cluster-autoscaler
                        meta.helm.sh/release-namespace: kube-system
Selector:               app.kubernetes.io/instance=cluster-autoscaler,app.kubernetes.io/name=aws-cluster-autoscaler
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app.kubernetes.io/instance=cluster-autoscaler
                    app.kubernetes.io/name=aws-cluster-autoscaler
  Service Account:  cluster-autoscaler-aws-cluster-autoscaler
  Containers:
   aws-cluster-autoscaler:
    Image:      registry.k8s.io/autoscaling/cluster-autoscaler:v1.32.0
    Port:       8085/TCP
    Host Port:  0/TCP
    Command:
      ./cluster-autoscaler
      --cloud-provider=aws
      --namespace=kube-system
      --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/mycorp-prod-mycluster
      --logtostderr=true
      --stderrthreshold=info
      --v=4
    Liveness:  http-get http://:8085/health-check delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:     (v1:metadata.namespace)
      SERVICE_ACCOUNT:   (v1:spec.serviceAccountName)
      AWS_REGION:       us-east-1
    Mounts:             <none>
  Volumes:              <none>
  Priority Class Name:  system-cluster-critical
  Node-Selectors:       <none>
  Tolerations:          <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   cluster-autoscaler-aws-cluster-autoscaler-7cbb844455 (2/2 replicas created)
Events:          <none>


Key things to look for:
  • Replicas ≥ 1
  • No crash loops
  • Command args like:
    • --cloud-provider=aws
    • --nodes=1:10:nodegroup-name
    • --balance-similar-node-groups

If replicas are 0, it’s installed but effectively disabled.

(3) Check logs (is it actively scaling?)

This confirms it’s working, not just running.

% kubectl logs -n kube-system deployment/cluster-autoscaler


Healthy / active signs:
  • scale up
  • scale down
  • Unschedulable pods
  • Node group ... increase size

Red flags:
  • AccessDenied
  • no node groups found
  • failed to get ASG
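
A quick way to scan recent logs for the healthy signs and red flags listed above (a sketch; adjust the patterns and the amount of history to taste):

% kubectl logs -n kube-system deployment/cluster-autoscaler --tail=500 \
    | grep -iE 'scale[ _-]?up|scale[ _-]?down|unschedulable|accessdenied|no node group|failed'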

(4) Check for unschedulable pods trigger

If CA is working, it reacts to pods stuck in Pending.

% kubectl get pods -A | grep Pending

If pods are pending and CA logs mention them → CA is enabled and reacting.
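
When Cluster Autoscaler reacts to a Pending pod, it typically records an event on that pod (reason TriggeredScaleUp, or NotTriggerScaleUp with an explanation when it can't help), so events are a quick confirmation:

% kubectl get events -A --field-selector reason=TriggeredScaleUp --sort-by=.lastTimestamp

% kubectl describe pod <pending-pod> -n <namespace> | grep -iA2 scaleup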

(5) AWS EKS-specific checks (very common)

a) Check IAM permissions (classic failure mode)

Cluster Autoscaler must run with an IAM role that can talk to ASGs.

% kubectl -n kube-system get sa | grep autoscaler

cluster-autoscaler-aws-cluster-autoscaler     0         296d
horizontal-pod-autoscaler                     0         296d

Let's inspect the cluster-autoscaler-aws-cluster-autoscaler service account:

% kubectl -n kube-system get sa cluster-autoscaler-aws-cluster-autoscaler  -o yaml

apiVersion: v1
automountServiceAccountToken: true
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::xxxxx:role/mycorp-prod-mycluster-cluster-autoscaler
    meta.helm.sh/release-name: cluster-autoscaler
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2026-04-16T11:25:37Z"
  labels:
    app.kubernetes.io/instance: cluster-autoscaler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: aws-cluster-autoscaler
    helm.sh/chart: cluster-autoscaler-9.46.6
  name: cluster-autoscaler-aws-cluster-autoscaler
  namespace: kube-system
  resourceVersion: "15768"
  uid: 0a7da521-1bf5-5a5f-a155-8801e876ea7b


Look for:

eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/ClusterAutoscalerRole

If missing → CA may exist but cannot scale.
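
If the annotation is present, it's also worth confirming that the role actually carries an autoscaling policy (the role name below is taken from the annotation above; yours will differ):

% aws iam list-attached-role-policies --role-name mycorp-prod-mycluster-cluster-autoscaler --profile my-profile

% aws iam list-role-policies --role-name mycorp-prod-mycluster-cluster-autoscaler --profile my-profile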

b) Check Auto Scaling Group tags

Your node group ASGs must be tagged:

k8s.io/cluster-autoscaler/enabled = true
k8s.io/cluster-autoscaler/<cluster-name> = owned


Without these → CA runs but does nothing.
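
To verify (or add) those tags from the CLI; the ASG name is a placeholder, and the create-or-update-tags call is only needed if the tags are missing:

% aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names <asg-name> \
    --query 'AutoScalingGroups[].Tags[?starts_with(Key, `k8s.io/cluster-autoscaler`)]'

% aws autoscaling create-or-update-tags --tags \
    ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true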

(6) Check Helm (if installed via Helm)

% helm list -A
NAME                NAMESPACE    REVISION  UPDATED                                   STATUS    CHART                      APP VERSION
cluster-autoscaler  kube-system  1         2025-04-16 12:25:30.389073326 +0100 BST   deployed  cluster-autoscaler-9.46.6  1.32.0


Then:

helm status cluster-autoscaler -n kube-system


The command helm list -A (or its alias helm ls -A) is used to list all Helm releases across every namespace in a Kubernetes cluster. Helm identifies your cluster and authenticates through the same mechanism as kubectl: the kubeconfig file. It uses the standard Kubernetes configuration file, typically located at ~/.kube/config, to determine which cluster to target.


(7) Double-check it’s not replaced by Karpenter

Many newer EKS clusters don’t use Cluster Autoscaler anymore.

% kubectl get pods -A | grep -i karpenter

kube-system karpenter-6f67b8c97b-lbq8p 1/1 Running     0       206d
kube-system karpenter-6f67b8c97b-wmprj 1/1 Running     0       206d


If Karpenter is installed, Cluster Autoscaler usually isn’t (or shouldn’t be).

Quick decision table

-----------------------------------------------------------------
Symptom                       Meaning
-----------------------------------------------------------------
No CA pod                     Not installed
Pod running, replicas=0       Installed but disabled
Logs show AccessDenied        Broken IAM
Pods Pending, no scale-up     ASG tags / config issue
Karpenter present             CA likely not used
-----------------------------------------------------------------


Installation and Setup:


To use the Cluster Autoscaler in the EKS cluster we need to deploy it using a Helm chart or a pre-configured YAML manifest.

kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml


In Terraform:

resource "helm_release" "cluster_autoscaler" {
  name = "cluster-autoscaler"

  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  ...
}

Configuration:

  • Ensure the --nodes flag in the deployment specifies the min and max node counts for your node group (or use --node-group-auto-discovery, as in the example deployment shown earlier).
  • Tag your node groups' Auto Scaling Groups with the k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name> tags so the autoscaler can discover and manage them.

How to know if node was provisioned by Cluster Autoscaler?


Cluster Autoscaler applies labels on nodes it provisions so let's check labels:

% kubectl get nodes --show-labels

If a label like eks.amazonaws.com/nodegroup exists, the node belongs to an EKS Managed Node Group (the kind of group Cluster Autoscaler scales), so it was launched through that node group rather than by Karpenter.

Example:

% kubectl get nodes --show-labels

NAME                                     STATUS ROLES  AGE  VERSION            
ip-10-2-1-244.us-east-1.compute.internal Ready  <none> 206d v1.32.3-eks-473151a 

LABELS
Environment=prod,
beta.kubernetes.io/arch=amd64,
beta.kubernetes.io/instance-type=m5.xlarge,
beta.kubernetes.io/os=linux,
eks.amazonaws.com/capacityType=ON_DEMAND,
eks.amazonaws.com/nodegroup-image=ami-07fa6c030f5802c74,
eks.amazonaws.com/nodegroup=mycorp-prod-mycluster-20260714151819635800000002,
eks.amazonaws.com/sourceLaunchTemplateId=lt-0edc7a2b08ea82a28,
eks.amazonaws.com/sourceLaunchTemplateVersion=1,
failure-domain.beta.kubernetes.io/region=us-east-1,
failure-domain.beta.kubernetes.io/zone=us-east-1a,
mycorp/node-type=default,
k8s.io/cloud-provider-aws=12b0e11196b7091c737cf66015f19720,
kubernetes.io/arch=amd64,
kubernetes.io/hostname=ip-10-2-1-244.us-east-1.compute.internal,
kubernetes.io/os=linux,
node.kubernetes.io/instance-type=m5.xlarge,
topology.ebs.csi.aws.com/zone=us-east-1a,
topology.k8s.aws/zone-id=use1-az1,
topology.kubernetes.io/region=us-east-1,
topology.kubernetes.io/zone=us-east-1a


If we list all nodegroups in the cluster, the one above is listed:

% aws eks list-nodegroups --cluster-name mycorp-prod-mycluster --profile my-profile
{
    "nodegroups": [
        "mycorp-prod-mycluster-20260714151819635800000002"
    ]
}



If the cluster is overprovisioned, why doesn't Cluster Autoscaler scale nodes down automatically?


If Cluster Autoscaler is running but not shrinking the cluster, it's usually because:
  • System Pods: Pods like kube-dns or metrics-server don't have PDBs (Pod Disruption Budgets) and CA is afraid to move them.

  • Local Storage: A pod is using emptyDir or local storage.

  • Annotation: A pod has the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" annotation.

  • Manual Overrides: Check if someone manually updated the Auto Scaling Group (ASG) or the EKS Managed Node Group settings in the AWS Console. Terraform won't automatically "downgrade" those nodes until the next terraform apply or a node recycle.

  • If nodes are very old, they are "frozen" in time. Even if you changed your Terraform to smaller EC2 instances recently, EKS Managed Node Groups do not automatically replace existing nodes just because the configuration changed. They wait for a triggered update or a manual recycling of the nodes.


How to fix this overprovisioning?


If your current Terraform configuration says you want e.g. 2 m5.large nodes, but the reality is e.g. 4 m5.xlarge nodes, you need to force a sync.

Step 1: Check for Drift

Run a terraform plan. It will likely show that it wants to update the Launch Template or the Node Group version to switch from xlarge back to large.

Step 2: Trigger a Rolling Update

If you apply the Terraform and nothing happens to the existing nodes, you need to tell EKS to recycle them. You can do this via the AWS CLI:

aws eks update-nodegroup-version \
    --cluster-name <your-cluster-name> \
    --nodegroup-name <your-nodegroup-name> \
    --force

Note: This will gracefully terminate nodes one by one and replace them with the new m5.large type defined in your TF.



Cluster Autoscaler VS Karpenter


While both tools scale Kubernetes nodes to meet pod demand, they use fundamentally different approaches. Cluster Autoscaler (CA) is the traditional, "group-based" tool that adds nodes to existing pools, whereas Karpenter is a "provisioning" tool that directly creates the specific instances your applications need. 


Quick Feature Comparison Table


Scaling Logic
  • Cluster Autoscaler (CA): Scales pre-defined node groups (ASGs)
  • Karpenter: Directly provisions individual EC2 instances.

Speed
  • Cluster Autoscaler (CA): Slower; waits for cloud provider group updates
  • Karpenter: Faster; provisions nodes in seconds via direct APIs.

Cost Control
  • Cluster Autoscaler (CA): Limited; uses fixed node sizes in groups.
  • Karpenter: High; picks the cheapest/optimal instance for the pod.

Complexity
  • Cluster Autoscaler (CA): Higher; must manage multiple node groups.
  • Karpenter: Lower; one provisioner can handle many pod types.

Key Differences


Infrastructure Model:
  • CA asks, "How many more of these pre-configured nodes do I need?". 
  • Karpenter asks, "What specific resources (CPU, RAM, GPU) does this pending pod need right now?" and builds a node to match.

Node Groups: 
  • CA requires you to manually define and maintain Auto Scaling Groups (ASGs) for different instance types or zones. 
  • Karpenter bypasses ASGs entirely, allowing it to "mix and match" instance types dynamically in a single cluster.

Consolidation: 
  • Karpenter actively monitors the cluster to see if it can move pods to fewer or cheaper nodes to save money (bin-packing). 
  • While CA has a "scale-down" feature, it is less aggressive at optimizing for cost.

Spot Instance Management: 
  • Karpenter handles Spot interruptions and price changes more natively, selecting the most stable and cost-efficient Spot instances in real-time.

Which should you choose?


Use Cluster Autoscaler if you need a stable, battle-tested solution that works across multiple cloud providers (GCP, Azure) or if your workloads are very predictable and don't require rapid scaling.

Use Karpenter if you are on AWS EKS, need to scale up hundreds of nodes quickly, want to heavily use Spot instances, or want to reduce the operational burden of managing dozens of node groups.

Disable Cluster Autoscaler if you plan to use Karpenter. Having both leads to race conditions and wasted cost.

When to Run Both Together

It's generally not recommended to run Cluster Autoscaler and Karpenter together in the same cluster. However, there are specific scenarios where it might be acceptable:

Valid use cases for running both:

  • Migration period: Transitioning from Cluster Autoscaler to Karpenter, where you temporarily run both while gradually moving workloads
  • Hybrid node management: Managing distinct, non-overlapping node groups where Cluster Autoscaler handles some node groups and Karpenter handles others (though this adds complexity)

When It's Not Recommended (and Why)

Primary reasons to avoid running both:

Conflicting decisions: Both tools make independent scaling decisions, which can lead to:

  • Race conditions where both try to provision nodes simultaneously
  • Inefficient resource allocation
  • Unpredictable scaling behavior
  • One tool removing nodes the other just provisioned

Increased operational complexity:

  • Two systems to monitor, troubleshoot, and maintain
  • Doubled configuration overhead
  • More difficult to understand which tool made which scaling decision

Resource contention: Both tools consume cluster resources and API server capacity, adding unnecessary load.

No significant benefits: Karpenter can handle everything Cluster Autoscaler does, often more efficiently, so there's rarely a technical need for both.

EKS-Specific Considerations

The same principles apply to AWS EKS clusters, with some additional context:

EKS particularities:

  • Karpenter was designed specifically for AWS/EKS and integrates deeply with EC2 APIs
  • Karpenter typically provides better performance on EKS (faster provisioning, better bin-packing)
  • If you're on EKS, the general recommendation is to choose Karpenter over Cluster Autoscaler for new deployments

Migration best practice for EKS: If migrating from Cluster Autoscaler to Karpenter on EKS, ensure they manage completely separate node groups, and complete the migration as quickly as feasible to minimize the period of running both.


How to migrate pods from nodes deployed by Cluster Autoscaler to those deployed by Karpenter?


If you'd rather use Karpenter for everything, you should eventually set min_size and desired_size to 0 for this node group (EKS requires max_size to stay at least 1), or remove the node group entirely, and let Karpenter handle the provisioning instead.
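
A hedged sketch of that cut-over, once Karpenter is confirmed to be launching nodes: drain the old node group's nodes so their pods reschedule onto Karpenter-provisioned capacity, then shrink the group (names are placeholders):

# Move workloads off the old, CA-managed nodes; Karpenter provisions
# replacement capacity for the evicted pods.
% kubectl cordon <old-node-name>
% kubectl drain <old-node-name> --ignore-daemonsets --delete-emptydir-data

# Then shrink the managed node group (the same change should also be
# reflected in Terraform: min_size = 0, desired_size = 0).
% aws eks update-nodegroup-config \
    --cluster-name <your-cluster-name> \
    --nodegroup-name <your-nodegroup-name> \
    --scaling-config minSize=0,maxSize=1,desiredSize=0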