Friday, 6 February 2026

Kubernetes Cluster Autoscaler


Kubernetes Cluster Autoscaler (CAS):
  • Designed to automatically adjust the number of nodes (EC2 instances) in our cluster based on the resource requests of the workloads running in the cluster
  • Kubernetes project, supported on EKS: https://github.com/kubernetes/autoscaler

Key Features:

  • Node Scaling: It adds or removes nodes based on the pending pods that cannot be scheduled due to insufficient resources.
  • Pod Scheduling: Ensures that all pending pods are scheduled by scaling the cluster up.

It works with EKS Managed Node Groups backed by AWS Auto Scaling Groups. If we provide specific settings for a node group (like custom block_device_mappings), EKS creates an EC2 Launch Template under the hood.


Cluster Autoscaler and kube-scheduler


kube-scheduler is the default control plane component in Kubernetes responsible for deciding which Node a newly created or unscheduled Pod should run on. It essentially matches pods to the most suitable available machines based on resource requirements and specific constraints.

Cluster Autoscaler and kube-scheduler DO NOT communicate with each other directly. Instead, the Cluster Autoscaler (CA) watches the kube-scheduler's outcomes by monitoring the state of pods in the cluster.

They work in an indirect loop via the Kubernetes API server: 
  • Kube-scheduler: Attempts to place pods on existing nodes. If it cannot find a node with sufficient capacity, it marks the pod as Pending with an Unschedulable status.
  • Cluster Autoscaler: Monitors the cluster for these Unschedulable pods.
  • Action: When CA detects a pending pod, it triggers a scale-up by adding a node.
  • Completion: Once the new node joins, the kube-scheduler notices the new capacity and schedules the pending pod. 

Key Takeaways:
  • The Autoscaler watches the Scheduler: The autoscaler reacts to the decisions (or failed attempts) of the scheduler.
  • No Direct Connection: They are "blissfully unaware" of each other and interact only through Kubernetes API objects.
  • Not Resource Based: The Cluster Autoscaler does not directly monitor node CPU/memory usage; it only cares whether the scheduler can place a pod.

This indirect workflow ensures that new nodes are only provisioned when necessary to satisfy pod scheduling constraints. 
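One hands-on way to watch this loop in action is to create a throwaway deployment whose pods request more CPU than any current node has free. This is only a sketch: the deployment name, image, and CPU request below are made-up values, and the request must exceed your nodes' spare capacity for the effect to show.

```shell
# Create a throwaway deployment whose pods are too big to fit anywhere.
kubectl create deployment scale-test --image=nginx
kubectl set resources deployment scale-test --requests=cpu=3
kubectl scale deployment scale-test --replicas=5

# The pods that don't fit sit in Pending with an Unschedulable condition,
# which is exactly the signal Cluster Autoscaler reacts to:
kubectl get pods -l app=scale-test

# Clean up afterwards so the cluster can scale back down:
kubectl delete deployment scale-test
```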



How to check if it's installed and enabled?


(1) Look for its deployment


Cluster Autoscaler usually runs as a Deployment in the kube-system namespace, so we can look for that deployment:

% kubectl get deployments -n kube-system | grep -i cluster-autoscaler

cluster-autoscaler-aws-cluster-autoscaler   2/2     2            2           296d

We can also list pods directly:

% kubectl get pods -n kube-system | grep -i cluster-autoscaler

cluster-autoscaler-aws-cluster-autoscaler-7cbb844455-q2lxv 1/1 Running 0 206d
cluster-autoscaler-aws-cluster-autoscaler-7cbb844455-vhbsw 1/1 Running 0 206d

If we see a pod running, it’s installed.

Typical names:
  • cluster-autoscaler
  • cluster-autoscaler-aws-clustername
  • cluster-autoscaler-eks-...

(2) Inspect the Deployment 


Confirm it’s enabled & configured.

% kubectl describe deployment cluster-autoscaler-aws-cluster-autoscaler -n kube-system

Name:                   cluster-autoscaler-aws-cluster-autoscaler
Namespace:              kube-system
CreationTimestamp:      Wed, 16 Apr 2025 12:25:38 +0100
Labels:                 app.kubernetes.io/instance=cluster-autoscaler
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=aws-cluster-autoscaler
                        helm.sh/chart=cluster-autoscaler-9.46.6
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: cluster-autoscaler
                        meta.helm.sh/release-namespace: kube-system
Selector:               app.kubernetes.io/instance=cluster-autoscaler,app.kubernetes.io/name=aws-cluster-autoscaler
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app.kubernetes.io/instance=cluster-autoscaler
                    app.kubernetes.io/name=aws-cluster-autoscaler
  Service Account:  cluster-autoscaler-aws-cluster-autoscaler
  Containers:
   aws-cluster-autoscaler:
    Image:      registry.k8s.io/autoscaling/cluster-autoscaler:v1.32.0
    Port:       8085/TCP
    Host Port:  0/TCP
    Command:
      ./cluster-autoscaler
      --cloud-provider=aws
      --namespace=kube-system
      --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/mycorp-prod-mycluster
      --logtostderr=true
      --stderrthreshold=info
      --v=4
    Liveness:  http-get http://:8085/health-check delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:     (v1:metadata.namespace)
      SERVICE_ACCOUNT:   (v1:spec.serviceAccountName)
      AWS_REGION:       us-east-1
    Mounts:             <none>
  Volumes:              <none>
  Priority Class Name:  system-cluster-critical
  Node-Selectors:       <none>
  Tolerations:          <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   cluster-autoscaler-aws-cluster-autoscaler-7cbb844455 (2/2 replicas created)
Events:          <none>


Key things to look for:
  • Replicas ≥ 1
  • No crash loops
  • Command args like:
    • --cloud-provider=aws
    • --nodes=1:10:nodegroup-name
    • --balance-similar-node-groups

If replicas are 0, it’s installed but effectively disabled.
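To read the replica count directly, using the deployment name found in step (1):

```shell
# Prints the desired replica count; 0 means installed but disabled.
kubectl -n kube-system get deployment \
  cluster-autoscaler-aws-cluster-autoscaler \
  -o jsonpath='{.spec.replicas}'
```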

(3) Check logs


Is it actively scaling?

This confirms it’s working, not just running.

kubectl logs deployment/cluster-autoscaler-aws-cluster-autoscaler -n kube-system

or find pods:

kubectl get pods \
    -l app.kubernetes.io/instance=cluster-autoscaler \
    -n kube-system

Then check logs:

kubectl logs \
    -l app.kubernetes.io/instance=cluster-autoscaler \
    -n kube-system \
    | grep "Standard-Autoscaler" 

Healthy / active signs:
  • scale up
  • scale down
  • Unschedulable pods
  • Node group ... increase size
  • Messages like Refresher: resolving ASGs, which list the names of the ASGs it is currently monitoring

Red flags:
  • AccessDenied
  • no node groups found
  • failed to get ASG
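A quick triage pipeline for these red flags could look like the sketch below; the sample log lines are invented stand-ins for real Cluster Autoscaler output:

```shell
# Invented sample lines standing in for the output of:
#   kubectl logs -n kube-system -l app.kubernetes.io/instance=cluster-autoscaler
sample_logs='I0321 08:52:07 static_autoscaler.go:290] Starting main loop
E0321 08:52:08 aws_manager.go:262] Failed to get ASG: AccessDenied
W0321 08:52:09 auto_scaling_groups.go:79] no node groups found'

# Surface only the red-flag lines:
echo "$sample_logs" | grep -iE 'AccessDenied|no node groups found|failed to get ASG'
```

In a real cluster, pipe the actual `kubectl logs` output through the same grep.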

(4) Check for unschedulable pods trigger


If CA is working, it reacts to pods stuck in Pending.

% kubectl get pods -A | grep Pending

If pods are pending and CA logs mention them → CA is enabled and reacting.
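To confirm CA noticed a specific Pending pod, check its events; a working autoscaler posts a TriggeredScaleUp event on the pod (the pod and namespace names below are placeholders):

```shell
# Events for one Pending pod; look for reason TriggeredScaleUp:
kubectl describe pod <pending-pod-name> -n <namespace>

# Or query scale-up events across the whole cluster:
kubectl get events -A --field-selector reason=TriggeredScaleUp
```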

(5) AWS EKS-specific checks (very common)


a) Check IAM permissions (classic failure mode)

Cluster Autoscaler must run with an IAM role that can talk to ASGs.

% kubectl -n kube-system get sa | grep autoscaler

cluster-autoscaler-aws-cluster-autoscaler     0         296d
horizontal-pod-autoscaler                     0         296d

Let's inspect the cluster-autoscaler-aws-cluster-autoscaler service account:

% kubectl -n kube-system get sa cluster-autoscaler-aws-cluster-autoscaler  -o yaml

apiVersion: v1
automountServiceAccountToken: true
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::xxxxx:role/mycorp-prod-mycluster-cluster-autoscaler
    meta.helm.sh/release-name: cluster-autoscaler
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2026-04-16T11:25:37Z"
  labels:
    app.kubernetes.io/instance: cluster-autoscaler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: aws-cluster-autoscaler
    helm.sh/chart: cluster-autoscaler-9.46.6
  name: cluster-autoscaler-aws-cluster-autoscaler
  namespace: kube-system
  resourceVersion: "15768"
  uid: 0a7da521-1bf5-5a5f-a155-8801e876ea7b


Look for:

eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/ClusterAutoscalerRole

If missing → CA may exist but cannot scale.
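A one-liner to check just that annotation (it prints nothing if the annotation is missing); the service account name matches the one above:

```shell
kubectl -n kube-system get sa cluster-autoscaler-aws-cluster-autoscaler \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
```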

b) Check Auto Scaling Group tags

Our node group ASGs must be tagged:

k8s.io/cluster-autoscaler/enabled = true
k8s.io/cluster-autoscaler/<cluster-name> = owned

Without these tags, Cluster Autoscaler runs but does nothing: it will ignore that ASG entirely.
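If the tags are missing, they can be added from the CLI; this is a sketch with placeholder ASG and cluster names:

```shell
aws autoscaling create-or-update-tags --tags \
  "ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
  "ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/<cluster-name>,Value=owned,PropagateAtLaunch=true"
```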

(6) Check Helm (if installed via Helm)

Let's list all Helm releases across every namespace in a Kubernetes cluster and look for cluster autoscaler:

% helm list -A
NAME                NAMESPACE    REVISION  UPDATED                                  STATUS    CHART                      APP VERSION
cluster-autoscaler  kube-system  1         2025-04-16 12:25:30.389073326 +0100 BST  deployed  cluster-autoscaler-9.46.6  1.32.0


Then:

helm status cluster-autoscaler -n kube-system

The command helm list -A (or its alias helm ls -A) lists all Helm releases across every namespace. Helm identifies the cluster and authenticates through the same mechanism as kubectl: the kubeconfig file, typically located at ~/.kube/config.


(7) Double-check it’s not replaced by Karpenter


Many newer EKS clusters don’t use Cluster Autoscaler anymore.

% kubectl get pods -A | grep -i karpenter

kube-system karpenter-6f67b8c97b-lbq8p 1/1 Running     0       206d
kube-system karpenter-6f67b8c97b-wmprj 1/1 Running     0       206d


If Karpenter is installed, Cluster Autoscaler usually isn’t (or shouldn’t be).

Quick decision table

-----------------------------------------------------------------
Symptom                        Meaning
-----------------------------------------------------------------
No CA pod                      Not installed
Deployment exists, replicas=0  Installed but disabled
Logs show AccessDenied         Broken IAM
Pods Pending, no scale-up      ASG tags / config issue
Karpenter present              CA likely not used
-----------------------------------------------------------------


(8) Check the "Status" ConfigMap


Cluster Autoscaler maintains a ConfigMap that shows which groups it is managing and if they are at their max/min size:


% kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml

apiVersion: v1
data:
  status: |
    time: 2026-03-21 08:52:07.308206626 +0000 UTC
    autoscalerStatus: Running
    clusterWide:
      health:
        status: Healthy
        nodeCounts:
          registered:
            total: 6
            ready: 6
            notStarted: 0
          longUnregistered: 0
          unregistered: 0
        lastProbeTime: "2026-03-21T08:52:07.308206626Z"
        lastTransitionTime: "2026-03-20T16:30:07.460032826Z"
      scaleUp:
        status: NoActivity
        lastProbeTime: "2026-03-21T08:52:07.308206626Z"
        lastTransitionTime: "2026-03-20T16:30:07.460032826Z"
      scaleDown:
        status: NoCandidates
        lastProbeTime: "2026-03-21T08:52:07.308206626Z"
        lastTransitionTime: "2026-03-20T16:30:07.460032826Z"
    nodeGroups:
    - name: eks-mycorp-env-app-k8s-v1_33-202603...04-2cc...dc8
      health:
        status: Healthy
        nodeCounts:
          registered:
            total: 2
            ready: 2
            notStarted: 0
          longUnregistered: 0
          unregistered: 0
        cloudProviderTarget: 2
        minSize: 2
        maxSize: 10
        lastProbeTime: "2026-03-21T08:52:07.308206626Z"
        lastTransitionTime: "2026-03-20T16:30:07.460032826Z"
      scaleUp:
        status: NoActivity
        lastProbeTime: "2026-03-21T08:52:07.308206626Z"
        lastTransitionTime: "2026-03-20T16:30:07.460032826Z"
      scaleDown:
        status: NoCandidates
        lastProbeTime: "2026-03-21T08:52:07.308206626Z"
        lastTransitionTime: "2026-03-20T16:30:07.460032826Z"
kind: ConfigMap
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/last-updated: 2026-03-21 08:52:07.308206626 +0000
      UTC
  creationTimestamp: "2026-03-20T16:29:56Z"
  name: cluster-autoscaler-status
  namespace: kube-system
  resourceVersion: "18...78"
  uid: 17b...0af
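To print just the human-readable status document instead of the whole ConfigMap:

```shell
kubectl -n kube-system get configmap cluster-autoscaler-status \
  -o jsonpath='{.data.status}'
```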

Installation and Setup:


To use the Cluster Autoscaler in an EKS cluster, we need to deploy it ourselves, using a Helm chart or a pre-configured YAML manifest.

kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml


In Terraform:

resource "helm_release" "cluster_autoscaler" {
  name = "cluster-autoscaler"

  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  version    = "9.46.6"
  namespace  = "kube-system"

  set {
    name  = "autoDiscovery.clusterName"
    value = local.cluster_name
  }

  set {
    name  = "awsRegion"
    value = local.aws_region
  }

  set {
    name  = "rbac.serviceAccount.create"
    value = "false"
  }

  set {
    name  = "rbac.serviceAccount.name"
    value = local.service_account_name
  }
}

Configuration:

  • Ensure the --nodes flag in the deployment specifies the min and max nodes for your node group.
  • Tag your node group ASGs with the k8s.io/cluster-autoscaler tags to enable the autoscaler to discover and manage them.

How to know if node was provisioned by Cluster Autoscaler?


Nodes launched through an EKS Managed Node Group carry that group's labels, so we can check the node labels:

% kubectl get nodes --show-labels

If a label like eks.amazonaws.com/nodegroup exists, the node was launched by and belongs to an EKS Managed Node Group; if that group's ASG carries the cluster-autoscaler tags, it is Cluster Autoscaler that scales nodes like this one up and down.

Example:

% kubectl get nodes --show-labels

NAME                                     STATUS ROLES  AGE  VERSION            
ip-10-2-1-244.us-east-1.compute.internal Ready  <none> 206d v1.32.3-eks-473151a 

LABELS
Environment=prod,
beta.kubernetes.io/arch=amd64,
beta.kubernetes.io/instance-type=m5.xlarge,
beta.kubernetes.io/os=linux,
eks.amazonaws.com/capacityType=ON_DEMAND,
eks.amazonaws.com/nodegroup-image=ami-07fa6c030f5802c74,
eks.amazonaws.com/nodegroup=mycorp-prod-mycluster-20260714151819635800000002,
eks.amazonaws.com/sourceLaunchTemplateId=lt-0edc7a2b08ea82a28,
eks.amazonaws.com/sourceLaunchTemplateVersion=1,
failure-domain.beta.kubernetes.io/region=us-east-1,
failure-domain.beta.kubernetes.io/zone=us-east-1a,
mycorp/node-type=default,
k8s.io/cloud-provider-aws=12b0e11196b7091c737cf66015f19720,
kubernetes.io/arch=amd64,
kubernetes.io/hostname=ip-10-2-1-244.us-east-1.compute.internal,
kubernetes.io/os=linux,
node.kubernetes.io/instance-type=m5.xlarge,
topology.ebs.csi.aws.com/zone=us-east-1a,
topology.k8s.aws/zone-id=use1-az1,
topology.kubernetes.io/region=us-east-1,
topology.kubernetes.io/zone=us-east-1a
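Given that label, we can also select only the nodes of one managed node group (the group name below is a placeholder):

```shell
# Nodes belonging to one managed node group:
kubectl get nodes -l eks.amazonaws.com/nodegroup=<nodegroup-name>

# Or show every node next to its node group:
kubectl get nodes -o custom-columns='NODE:.metadata.name,NODEGROUP:.metadata.labels.eks\.amazonaws\.com/nodegroup'
```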


If we list all nodegroups in the cluster, the one above is listed:

% aws eks list-nodegroups \
    --cluster-name mycorp-env-app-k8s \
    --profile my_profile
{
    "nodegroups": [
        "mycorp-env-app-k8s-20260714151819635800000002"
    ]
}

To inspect the nodegroup, including its labels, use:

% aws eks describe-nodegroup \
    --cluster-name mycorp-env-app-k8s \
    --nodegroup-name mycorp-env-app-k8s-v1_33-20260...03 \
    --region us-east-2 \
    --profile my_profile \
    --output json
{
    "nodegroup": {
        "nodegroupName": "mycorp-env-app-k8s-v1_33-20260...003",
        "nodegroupArn": "arn:aws:eks:us-east-2:xxxx:nodegroup/mycorp-env-app-k8s/mycorp-env-app-k8s-v1_33-202....03/2cce8....c7",
        "clusterName": "mycorp-env-app-k8s",
        "version": "1.33",
        "releaseVersion": "1.33.8-20260317",
        "createdAt": "2026-03-20T12:41:07.961000+00:00",
        "modifiedAt": "2026-03-22T09:21:58.892000+00:00",
        "status": "ACTIVE",
        "capacityType": "ON_DEMAND",
        "scalingConfig": {
            "minSize": 2,
            "maxSize": 10,
            "desiredSize": 2
        },
        "instanceTypes": [
            "m5.large"
        ],
        "subnets": [
            "subnet-02xxx",
            "subnet-00xxx",
            "subnet-04xxx"
        ],
        "amiType": "AL2023_x86_64_STANDARD",
        "nodeRole": "arn:aws:iam::xxx:role/mycorp-env-app-k8s-v1_33-eks-node-group",
        "labels": {
            "Environment": "prod",
            "mycorp/node-type": "v1.33"
        },
        "resources": {
            "autoScalingGroups": [
                {
                    "name": "mycorp-env-app-k8s-v1_33-202603...dc7"
                }
            ]
        },
        "health": {
            "issues": []
        },
        "updateConfig": {
            "maxUnavailablePercentage": 33
        },
        "launchTemplate": {
            "name": "mycorp-env-app-k8s-v1_33-202...0001",
            "version": "1",
            "id": "lt-xxx"
        },
        "tags": {
            "ClusterName": "mycorp-env-app-k8s",
            "Environment": "prod",
            "terraform-aws-modules": "eks",
            "Terraform": "true",
            "Name": "mycorp-env-app-k8s-v1_33"
        }
    }
}


If we are using terraform-aws-modules/eks/aws to provision the EKS cluster and define EKS-managed node groups within it, the module creates an AWS Auto Scaling Group (ASG) for each of them. Their names can be read from the module's output variable eks_managed_node_groups_autoscaling_group_names.

If we inspect the tags on one such ASG, we can see that the Terraform module attached tags to it, so CAS can discover it and manage its parameters (usually just desired_size, which is used for increasing or decreasing the number of current EC2 instances):

% aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names eks-mycorp-env-app-k8s-v1_33-20260320...4-2c...8 \
  --query "AutoScalingGroups[].Tags[?starts_with(Key, 'k8s.io/cluster-autoscaler')]" \
  --region us-east-2 \
  --profile my_profile
[
    [
        {
            "ResourceId": "eks-mycorp-env-app-k8s-v1_33-20260320...4-2c...8",
            "ResourceType": "auto-scaling-group",
            "Key": "k8s.io/cluster-autoscaler/enabled",
            "Value": "true",
            "PropagateAtLaunch": true
        },
        {
            "ResourceId": "eks-mycorp-env-app-k8s-v1_33-20260320...4-2c...8",
            "ResourceType": "auto-scaling-group",
            "Key": "k8s.io/cluster-autoscaler/mycorp-env-app-k8s",
            "Value": "owned",
            "PropagateAtLaunch": true
        }
    ]
]

In the example above, mycorp-env-app-k8s is the name of the cluster.



If the cluster is overprovisioned, why doesn't Cluster Autoscaler scale nodes down automatically?


If Cluster Autoscaler is running but not shrinking the cluster, it's usually because:
  • System Pods: Pods like kube-dns or metrics-server don't have PDBs (Pod Disruption Budgets), so CA won't risk evicting them from a node it would otherwise remove.

  • Local Storage: A pod is using emptyDir or local storage.

  • Annotation: A pod has the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" annotation.

  • Manual Overrides: Check if someone manually updated the Auto Scaling Group (ASG) or the EKS Managed Node Group settings in the AWS Console. Terraform won't automatically "downgrade" those nodes until the next terraform apply or a node recycle.

  • Stale Nodes: If nodes are very old, they are "frozen" in time. Even if you recently changed your Terraform to smaller EC2 instances, EKS Managed Node Groups do not automatically replace existing nodes just because the configuration changed; they wait for a triggered update or a manual recycling of the nodes.
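For the annotation case specifically, this sketch lists pods that carry safe-to-evict=false and therefore pin their node:

```shell
# Namespace, name, and annotation value for every pod; keep only "false":
kubectl get pods -A -o jsonpath="{range .items[*]}{.metadata.namespace}{'\t'}{.metadata.name}{'\t'}{.metadata.annotations.cluster-autoscaler\.kubernetes\.io/safe-to-evict}{'\n'}{end}" \
  | awk -F'\t' '$3 == "false"'
```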


How to fix this overprovisioning?


Since your current Terraform state says you want e.g. 2 nodes of m5.large, but the reality is e.g. 4 nodes of m5.xlarge, you need to force a sync.

Step 1: Check for Drift

Run a terraform plan. It will likely show that it wants to update the Launch Template or the Node Group version to switch from xlarge back to large.

Step 2: Trigger a Rolling Update

If you apply the Terraform and nothing happens to the existing nodes, you need to tell EKS to recycle them. You can do this via the AWS CLI:

aws eks update-nodegroup-version \
    --cluster-name <your-cluster-name> \
    --nodegroup-name <your-nodegroup-name> \
    --force

Note: This will gracefully terminate nodes one by one and replace them with the new m5.large type defined in your TF.



Cluster Autoscaler VS Karpenter


CAS (Cluster Autoscaler) and Karpenter are Kubernetes tools for adjusting node capacity based on workload, with CAS relying on fixed node groups and slow, infrastructure-driven scaling. Karpenter is a faster, modern, open-source, workload-driven node provisioner that directly interacts with cloud APIs, improving efficiency and cost-optimization.

Cluster Autoscaler (CAS): Operates by adjusting the size of specific, pre-defined node groups (e.g., autoscaling groups). It is generally better suited for smaller, predictable, or steady-state workloads where strict node group management is preferred.

Karpenter: Evaluates pending pods and launches optimally sized nodes directly, bypassing the need for manual node group management. It is ideal for high-churn, highly dynamic, and cost-sensitive, large-scale production environments.

While both tools scale Kubernetes nodes to meet pod demand, they use fundamentally different approaches. Cluster Autoscaler (CA) is the traditional, "group-based" tool that adds nodes to existing pools, whereas Karpenter is a "provisioning" tool that directly creates the specific instances your applications need. 


Quick Feature Comparison Table


Scaling Logic
  • Cluster Autoscaler (CA): Scales pre-defined node groups (ASGs)
  • Karpenter: Directly provisions individual EC2 instances.

Speed
  • Cluster Autoscaler (CA): Slower; waits for cloud provider group updates
  • Karpenter: Faster; provisions nodes in seconds via direct APIs; better for rapid, "spiky" traffic.

Cost Control
  • Cluster Autoscaler (CA): Limited; uses fixed node sizes in groups.
  • Karpenter: High; picks the cheapest/optimal instance for the pod. It has built-in node consolidation, which reduces costs by bin-packing pods onto fewer, more efficient nodes.

Complexity
  • Cluster Autoscaler (CA): Higher; must manage multiple node groups.
  • Karpenter: Lower; one provisioner can handle many pod types. 

Flexibility
  • Karpenter: supports diverse instance types and, while commonly used with AWS, it can be used with other providers.

Configuration
  • Karpenter uses Kubernetes-native YAML for defining node pools and node classes.

Key Differences


Infrastructure Model:
  • CA asks, "How many more of these pre-configured nodes do I need?". 
  • Karpenter asks, "What specific resources (CPU, RAM, GPU) does this pending pod need right now?" and builds a node to match.

Node Groups: 
  • CA requires you to manually define and maintain Auto Scaling Groups (ASGs) for different instance types or zones. 
  • Karpenter bypasses ASGs entirely, allowing it to "mix and match" instance types dynamically in a single cluster.

Consolidation: 
  • Karpenter actively monitors the cluster to see if it can move pods to fewer or cheaper nodes to save money (bin-packing). 
  • While CA has a "scale-down" feature, it is less aggressive at optimizing for cost.

Spot Instance Management: 
  • Karpenter handles Spot interruptions and price changes more natively, selecting the most stable and cost-efficient Spot instances in real-time.

Which should you choose?


Use Cluster Autoscaler if you need a stable, battle-tested solution that works across multiple cloud providers (GCP, Azure) or if your workloads are very predictable and don't require rapid scaling.

Use Karpenter if you are on AWS EKS, need to scale up hundreds of nodes quickly, want to heavily use Spot instances, or want to reduce the operational burden of managing dozens of node groups.

Disable Cluster Autoscaler if you plan to use Karpenter. Having both leads to race conditions and wasted cost.
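Disabling it can be as simple as scaling its Deployment to zero, or uninstalling the Helm release (names as found in the earlier steps):

```shell
# Stop the autoscaler without uninstalling it:
kubectl -n kube-system scale deployment \
  cluster-autoscaler-aws-cluster-autoscaler --replicas=0

# Or remove it entirely if it was installed via Helm:
helm uninstall cluster-autoscaler -n kube-system
```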

When to Run Both Together

It's generally not recommended to run Cluster Autoscaler and Karpenter together in the same cluster. However, there are specific scenarios where it might be acceptable:

Valid use cases for running both:
  • Migration period: Transitioning from Cluster Autoscaler to Karpenter, where you temporarily run both while gradually moving workloads
  • Hybrid node management: Managing distinct, non-overlapping node groups where Cluster Autoscaler handles some node groups and Karpenter handles others (though this adds complexity)

When It's Not Recommended (and Why)

Primary reasons to avoid running both:

Conflicting decisions: Both tools make independent scaling decisions, which can lead to:
  • Race conditions where both try to provision nodes simultaneously
  • Inefficient resource allocation
  • Unpredictable scaling behavior
  • One tool removing nodes the other just provisioned

Increased operational complexity:
  • Two systems to monitor, troubleshoot, and maintain
  • Doubled configuration overhead
  • More difficult to understand which tool made which scaling decision

Resource contention: Both tools consume cluster resources and API server capacity, adding unnecessary load.

No significant benefits: Karpenter can handle everything Cluster Autoscaler does, often more efficiently, so there's rarely a technical need for both.

EKS-Specific Considerations

The same principles apply to AWS EKS clusters, with some additional context:

EKS particularities:
  • Karpenter was designed specifically for AWS/EKS and integrates deeply with EC2 APIs
  • Karpenter typically provides better performance on EKS (faster provisioning, better bin-packing)
  • If you're on EKS, the general recommendation is to choose Karpenter over Cluster Autoscaler for new deployments

Migration best practice for EKS: If migrating from Cluster Autoscaler to Karpenter on EKS, ensure they manage completely separate node groups, and complete the migration as quickly as feasible to minimize the period of running both.


How to migrate pods from nodes deployed by Cluster Autoscaler to those deployed by Karpenter?


If you'd rather use Karpenter for everything, you should eventually set min_size and desired_size on this node group to 0 and max_size as low as EKS allows (it must be at least 1), and let Karpenter handle the provisioning instead. Cordon and drain the old nodes so their pods reschedule onto Karpenter-provisioned capacity.
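As a sketch, the node group can be shrunk from the CLI; the cluster and node group names are placeholders:

```shell
# maxSize must stay >= 1 in EKS; min and desired can be 0.
aws eks update-nodegroup-config \
    --cluster-name <your-cluster-name> \
    --nodegroup-name <your-nodegroup-name> \
    --scaling-config minSize=0,maxSize=1,desiredSize=0
```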
