
Friday, 6 February 2026

Amazon EKS Autoscaling with Karpenter



Kubernetes autoscaling is a function that scales resources in and out depending on the current workload. AWS supports two autoscaling implementations:
  • Cluster Autoscaler
  • Karpenter 
    • flexible, high-performance Kubernetes cluster autoscaler
    • helps improve application availability and cluster efficiency
    • launches right-sized compute resources (for example, Amazon EC2 instances) in response to changing application load in under a minute
    • can provision just-in-time compute resources that precisely meet the requirements of your workload
    • automatically provisions new compute resources based on the specific requirements of cluster workloads. These include compute, storage, acceleration, and scheduling requirements. 
    • creates Kubernetes nodes directly from EC2 instances
    • improves the efficiency and cost of running workloads on the cluster
    • open-source


Pod Scheduler


  • Kubernetes cluster component responsible for determining which node Pods get assigned to
  • default Pod scheduler for Kubernetes is kube-scheduler
    • logs the reasons Pods can't be scheduled

Unschedulable Pods



A Pod is unschedulable when it's been put into Kubernetes' scheduling queue, but can't be deployed to a node. This can be for a number of reasons, including:
  • The cluster not having enough CPU or RAM available to meet the Pod's requirements.
  • Pod affinity or anti-affinity rules preventing it from being deployed to available nodes.
  • Nodes being cordoned due to updates or restarts.
  • The Pod requiring a persistent volume that's unavailable, or bound to an unavailable node.

How to detect unschedulable Pods?

Pods waiting to be scheduled are held in the "Pending" status, but if a Pod can't be scheduled, it stays in that state. Pods that are being deployed normally also pass through "Pending," so the difference comes down to how long a Pod remains there: a healthy Pod usually leaves "Pending" within seconds, while an unschedulable one sits in it and accumulates FailedScheduling events from kube-scheduler.
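
To spot these Pods quickly, we can list Pending Pods across all namespaces and then read the scheduler's events for a specific one (the Pod name and namespace below are placeholders):

% kubectl get pods --all-namespaces --field-selector=status.phase=Pending

% kubectl describe pod my-pod -n my-namespace
# the Events section at the bottom shows FailedScheduling messages explaining why the Pod can't be placed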

How to fix unschedulable Pods?
There is no single solution for unschedulable Pods as they have many different causes. However, there are a few things you can try depending on the cause. 
  • Enable cluster autoscaling
    • If you're using a managed Kubernetes service like Amazon EKS or Google Kubernetes Engine (GKE), you can easily take advantage of autoscaling to increase and decrease cluster capacity on demand. With autoscaling enabled, the Kubernetes Cluster Autoscaler will trigger your provider to add nodes when needed. As long as you've configured your cluster node pool and it hasn't reached its maximum node limit, your provider will automatically provision a new node and add it to the pool, making it available to the cluster and to your Pods.
  • Increase your node capacity
  • Check your Pod requests (see the commands after this list)
  • Check your affinity and anti-affinity rules
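
For the last three suggestions, it helps to compare what a Pod asks for with what the nodes can actually allocate. A few kubectl one-liners cover most cases (the Pod name is a placeholder):

% kubectl get pod my-pod -o jsonpath='{.spec.containers[*].resources}'
# prints the Pod's CPU and memory requests and limits

% kubectl describe nodes | grep -A 7 "Allocated resources"
# shows how much of each node's capacity is already requested

% kubectl get pod my-pod -o jsonpath='{.spec.affinity}'
# prints any affinity or anti-affinity rules attached to the Pod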

 

In this article we'll show how to enable cluster autoscaling with Karpenter.


How does the regular Kubernetes Autoscaler work in AWS?


When we create a regular Kubernetes cluster in AWS, each node group is managed by an AWS Auto Scaling group [Auto Scaling groups - Amazon EC2 Auto Scaling]. The cluster autoscaler adjusts the group's desired size based on the load in the cluster so that all unscheduled pods can fit.

HorizontalPodAutoscaler (HPA) [Horizontal Pod Autoscaling | Kubernetes] is built into Kubernetes and uses metrics like CPU usage, memory usage, or custom metrics we provide to decide when to add or remove pod replicas. If our app is receiving more traffic, HPA will kick in and provision additional pods.
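
As a minimal sketch (the Deployment name web and the 70% CPU target are just assumptions), an HPA that keeps between 2 and 10 replicas could look like this:

% kubectl apply -f - <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU use exceeds 70%
EOF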

VerticalPodAutoscaler (VPA) can also be installed in the cluster, where it adjusts the resource (CPU and memory) requests of pods that are already running.
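
VPA is not built into Kubernetes, so its components have to be installed separately; once they are, a minimal object (again assuming a Deployment called web) looks roughly like this:

% kubectl apply -f - <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment to right-size
  updatePolicy:
    updateMode: "Auto"     # VPA may evict pods so they restart with adjusted requests
EOF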

What about when there's not enough capacity to schedule any more pods on the existing nodes? That's when we need an additional node. So we have a pod that needs to be scheduled, but nowhere to put it. We could call the AWS API ourselves, spin up an additional EC2 instance and add it to our cluster, or, if we're using managed node groups, call the Managed Node Group API and bump up the desired size, but the easier approach is to use a cluster autoscaler. There is a mature open-source solution called Cluster Autoscaler (CAS).

CAS was built to handle the hundreds of different combinations of node types, zones, and purchase options available in AWS. CAS works directly with managed node groups or self-managed nodes and Auto Scaling groups, which are AWS constructs that help us manage nodes.
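
On EKS, a common way to run CAS is the community Helm chart; the region and cluster name below are placeholders, and the exact chart values can differ between chart versions:

% helm repo add autoscaler https://kubernetes.github.io/autoscaler
% helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
    --namespace kube-system \
    --set cloudProvider=aws \
    --set awsRegion=eu-west-1 \
    --set autoDiscovery.clusterName=my-cluster
# with autoDiscovery enabled, CAS finds Auto Scaling groups tagged for this cluster and resizes them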


What are the issues with the regular Kubernetes Autoscaler?


Let's say CAS is installed in the cluster and manages one managed node group (MNG). The group is filling up and we have an additional pod that needs to be scheduled, so CAS tells the MNG to bump up the number of nodes; it spins up another one and the pod can now be scheduled. But this is not ideal: we end up with a single pod on a node, and we don't need such a big node.

This can be solved by creating a different MNG with a smaller instance type; CAS then recognizes that group and schedules the pod on a more appropriately sized node.

Unfortunately, we might end up with many MNGs, one per set of requirements, which can be a challenge to manage, especially when trying to follow best practices for cost efficiency and high availability.


How does Karpenter work?


Karpenter works differently: it doesn't use MNGs or ASGs and manages each node directly. Let's say we have pods of different sizes, and HPA decides we need more of the smaller pods. Karpenter will intelligently pick the right instance type for that workload. If we need to spin up a larger pod, it will again pick the right instance type.

Karpenter picks exactly the right type of node for our workload. 

If we're using Spot Instances and Spot capacity is not available, Karpenter retries more quickly. Karpenter offers faster, more dynamic, more intelligent compute provisioning, applying best practices without the operational overhead of managing nodes ourselves.

How to control how Karpenter operates?
There are many dimensions here. We can set constraints on Karpenter to limit the instance types, and we can set up taints to isolate workloads on specific types of nodes. Different teams can get isolated node pools; for example, one team can run its billing pods on dedicated nodes while another gets GPU-based instances.
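
As a minimal sketch using the v1beta1 NodePool API (field names vary between Karpenter versions, and the pool name, taint, and instance constraints are just examples), a pool reserved for one team could look like this:

% kubectl apply -f - <<'EOF'
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: billing                        # hypothetical pool dedicated to one team's workloads
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]   # allow Graviton as well as x86 instances
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "c5.large"]   # constrain the allowed instance types
      taints:
        - key: team
          value: billing
          effect: NoSchedule           # only pods that tolerate this taint are scheduled here
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default                  # AWS-specific launch settings live in the EC2NodeClass
  limits:
    cpu: "100"                         # cap the total CPU this pool may provision
EOF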

Workload Consolidation feature: pods are consolidated onto fewer nodes. Let's say we have three nodes, two at 70% and one at 20% utilization. Karpenter detects this, moves the pods from the underutilized node onto the other two, and shuts down the now-empty node (the instance is terminated). This leads to lower costs.

Karpenter also makes it easier to use Spot and Graviton instances, which can further lower costs.

There is also a feature to keep our nodes up to date: the ttlSecondsUntilExpired parameter tells Karpenter to terminate nodes after a set amount of time. These nodes are automatically replaced with new nodes running the latest AMIs.
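
In newer Karpenter releases the Provisioner-era ttlSecondsUntilExpired and the consolidation switch live in the NodePool's disruption block. Assuming a NodePool named default and the v1beta1 API (the 30-day expiry is just an example), both behaviours can be enabled like this:

% kubectl patch nodepool default --type merge \
    -p '{"spec":{"disruption":{"consolidationPolicy":"WhenUnderutilized","expireAfter":"720h"}}}'
# WhenUnderutilized lets Karpenter consolidate pods and terminate empty nodes;
# expireAfter replaces each node after 720 hours, so it comes back with a fresh AMI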

Karpenter gives us:
1) lower costs
2) higher application availability
3) lower operational overhead


Karpenter needs permissions to create EC2 instances in AWS. 

If we use a self-hosted (on bare-metal boxes or EC2 instances), self-managed (we have full control over all aspects of Kubernetes) cluster, for example one built with kOps (see also Is k8s Kops preferable than eks? : r/kubernetes), we can add additional IAM policies to the existing IAM role attached to the Kubernetes nodes.
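
For example, an inline policy attached to the node role can grant the EC2 and related actions Karpenter calls. The role name below is a placeholder and the action list is an illustrative subset, not the complete controller policy from the Karpenter documentation:

% aws iam put-role-policy \
    --role-name nodes.my-cluster.example.com \
    --policy-name KarpenterController \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": [
          "ec2:CreateFleet", "ec2:RunInstances", "ec2:TerminateInstances",
          "ec2:CreateLaunchTemplate", "ec2:CreateTags", "ec2:Describe*",
          "ssm:GetParameter", "pricing:GetProducts", "iam:PassRole"
        ],
        "Resource": "*"
      }]
    }'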

If using EKS, the best way to grant AWS access to an in-cluster service like Karpenter is with IAM Roles for Service Accounts (IRSA).
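
A common pattern is to let eksctl create the IRSA role and then point the Karpenter Helm chart's service account at it. The cluster name, account ID, and policy ARN below are placeholders, and chart values differ between Karpenter versions:

% eksctl create iamserviceaccount \
    --cluster my-cluster \
    --namespace karpenter \
    --name karpenter \
    --role-name KarpenterControllerRole \
    --attach-policy-arn arn:aws:iam::111122223333:policy/KarpenterControllerPolicy \
    --approve

% helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
    --namespace karpenter --create-namespace \
    --set serviceAccount.create=false \
    --set serviceAccount.name=karpenter \
    --set settings.clusterName=my-cluster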


How to configure Karpenter?


We can configure Karpenter through NodePool resources (or Provisioners in older Karpenter versions).
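
A NodePool describes what Karpenter is allowed to provision (like the billing sketch earlier), while an EC2NodeClass describes how the instances are launched on AWS. A minimal sketch, assuming the cluster's subnets and security groups are tagged with karpenter.sh/discovery: my-cluster and that a node IAM role named KarpenterNodeRole-my-cluster exists:

% kubectl apply -f - <<'EOF'
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default                        # referenced by the NodePool's nodeClassRef
spec:
  amiFamily: AL2                       # use the Amazon Linux 2 EKS-optimized AMIs
  role: KarpenterNodeRole-my-cluster   # IAM role the launched nodes will use
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
EOF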


How to know if a node was provisioned by Karpenter?


Karpenter applies labels to the nodes it provisions, so let's check the labels:

% kubectl get nodes --show-labels

If labels like karpenter.sh/nodepool or karpenter.sh/provisioner-name exist, Karpenter launched the node.
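
We can also ask kubectl to print those labels as columns, or filter the list to only Karpenter-managed nodes:

% kubectl get nodes -L karpenter.sh/nodepool -L karpenter.sh/provisioner-name
% kubectl get nodes -l karpenter.sh/nodepool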
