My Public Notepad
Bits and bobs about computers and programming
Friday, 6 February 2026
Amazon EKS Autoscaling with Karpenter
- Cluster Autoscaler
- autoscaler/cluster-autoscaler/cloudprovider/aws/README.md at master · kubernetes/autoscaler
- automatically adjusts the number of nodes in the cluster when pods fail to schedule due to insufficient resources, or when nodes are underutilised and their pods can be rescheduled onto other nodes
- uses Auto Scaling groups
- Karpenter
- Karpenter
- flexible, high-performance Kubernetes cluster autoscaler
- helps improve application availability and cluster efficiency
- launches right-sized compute resources (for example, Amazon EC2 instances) in response to changing application load in under a minute
- can provision just-in-time compute resources that precisely meet the requirements of your workload
- automatically provisions new compute resources based on the specific requirements of cluster workloads. These include compute, storage, acceleration, and scheduling requirements.
- creates Kubernetes nodes directly from EC2 instances
- improves the efficiency and cost of running workloads on the cluster
- open-source
Pod Scheduler
- Kubernetes cluster component responsible for determining which node Pods get assigned to
- default Pod scheduler for Kubernetes is kube-scheduler
- logs the reasons Pods can't be scheduled
Unschedulable Pods
A Pod is unschedulable when it's been put into Kubernetes' scheduling queue, but can't be deployed to a node. This can be for a number of reasons, including:
- The cluster not having enough CPU or RAM available to meet the Pod's requirements.
- Pod affinity or anti-affinity rules preventing it from being deployed to available nodes.
- Nodes being cordoned due to updates or restarts.
- The Pod requiring a persistent volume that's unavailable, or bound to an unavailable node.
How to detect unschedulable Pods?
Pods waiting to be scheduled are held in the "Pending" status. Pods that are being deployed normally also pass through "Pending", so an unschedulable Pod isn't immediately obvious; the difference comes down to how long a Pod remains in "Pending."
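A quick way to spot them is to list Pending Pods and read their events (a sketch; the Pod name is a placeholder):
% kubectl get pods --all-namespaces --field-selector=status.phase=Pending
% kubectl describe pod <pending-pod-name> | grep -A 5 Events
The Events section usually states the reason, e.g. insufficient CPU/memory or unsatisfied affinity rules.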
How to fix unschedulable Pods?
There is no single solution for unschedulable Pods as they have many different causes. However, there are a few things you can try depending on the cause.
- Enable cluster autoscaling
- If you're using a managed Kubernetes service like Amazon EKS or Google Kubernetes Engine (GKE), you can very easily take advantage of autoscaling to increase and decrease cluster capacity on-demand. With autoscaling enabled, Kubernetes' Cluster Autoscaler will trigger your provider to add nodes when needed. As long as you've configured your cluster node pool and it hasn't reached its max node limit, your provider will automatically provision a new node and add it to the pool, making it available to the cluster and to your Pods.
- Increase your node capacity
- Check your Pod requests
- Check your affinity and anti-affinity rules
In this article we'll show how to enable cluster autoscaling with Karpenter.
How does the regular Kubernetes Autoscaler work in AWS?
What are the issues with the regular Kubernetes Autoscaler?
How does Karpenter work?
How to configure Karpenter?
How to know if node was provisioned by Karpenter?
Kubernetes Cluster Autoscaler
- Designed to automatically adjust the number of nodes (EC2 instances) in our cluster based on the resource requests of the workloads running in the cluster
- Kubernetes project, supported on EKS
Key Features:
- Node Scaling: It adds or removes nodes based on the pending pods that cannot be scheduled due to insufficient resources.
- Pod Scheduling: Ensures that all pending pods are scheduled by scaling the cluster up.
How to check if it's installed and enabled?
% kubectl get deployments -n kube-system | grep -i autoscaler
Look for a deployment named something like:
- cluster-autoscaler
- cluster-autoscaler-aws-clustername
- cluster-autoscaler-eks-...
Signs that it is installed and healthy:
- Replicas ≥ 1
- No crash loops
- Command args like:
- --cloud-provider=aws
- --nodes=1:10:nodegroup-name
- --balance-similar-node-groups
In its logs (a sample command follows this list), look for messages such as:
- scale up
- scale down
- Unschedulable pods
- Node group ... increase size
and for errors such as:
- AccessDenied
- no node groups found
- failed to get ASG
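For example, assuming the deployment is named cluster-autoscaler, the recent scaling decisions and errors can be grepped out of its logs:
% kubectl -n kube-system logs deployment/cluster-autoscaler --tail=500 | grep -iE 'scale|unschedulable|error'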
Installation and Setup:
In Terraform:
Configuration:
- Ensure the --nodes flag in the deployment specifies the min and max nodes for your node group.
- Tag your node groups' Auto Scaling Groups with the k8s.io/cluster-autoscaler tags so the autoscaler can discover and manage them (see the example below).
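A sketch of adding the auto-discovery tags with the AWS CLI (my-nodegroup-asg and my-cluster are placeholders for your ASG and cluster names):
% aws autoscaling create-or-update-tags --tags \
    "ResourceId=my-nodegroup-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
    "ResourceId=my-nodegroup-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=true"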
How to know if node was provisioned by Cluster Autoscaler?
{
"nodegroups": [
"mycorp-prod-mycluster-20260714151819635800000002"
]
}
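On EKS, nodes created through a managed node group carry the eks.amazonaws.com/nodegroup label (Karpenter-provisioned nodes carry karpenter.sh labels instead), so one way to check is:
% kubectl get nodes -L eks.amazonaws.com/nodegroup
The node group name in that column can be matched against the managed node groups listed above.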
If the cluster is overprovisioned, why doesn't Cluster Autoscaler scale nodes down automatically?
Common reasons CA refuses to remove a node:
- System Pods: Pods like kube-dns or metrics-server don't have PDBs (Pod Disruption Budgets) and CA is afraid to move them.
- Local Storage: A pod is using emptyDir or local storage.
- Annotation: A pod has the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" annotation.
- Manual Overrides: Check if someone manually updated the Auto Scaling Group (ASG) or the EKS Managed Node Group settings in the AWS Console. Terraform won't automatically "downgrade" those nodes until the next terraform apply or a node recycle.
If nodes are very old, they are "frozen" in time. Even if you changed your Terraform to smaller EC2 instances recently, EKS Managed Node Groups do not automatically replace existing nodes just because the configuration changed. They wait for a triggered update or a manual recycling of the nodes.
How to fix this overprovisioning?
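There is no single fix, but if oversized nodes are "frozen" as described above, one option (a sketch; the node name is a placeholder) is to recycle them so the node group replaces them with the currently configured instance type:
% kubectl cordon <old-node-name>
% kubectl drain <old-node-name> --ignore-daemonsets --delete-emptydir-data
Once the drain completes, terminate the instance (or lower the ASG desired capacity) and let the node group bring up a replacement.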
Cluster Autoscaler VS Karpenter
Quick Feature Comparison Table
Node provisioning:
- Cluster Autoscaler (CA): Scales pre-defined node groups (ASGs).
- Karpenter: Directly provisions individual EC2 instances.
Scaling speed:
- Cluster Autoscaler (CA): Slower; waits for cloud provider group updates.
- Karpenter: Faster; provisions nodes in seconds via direct APIs.
Instance flexibility:
- Cluster Autoscaler (CA): Limited; uses fixed node sizes in groups.
- Karpenter: High; picks the cheapest/optimal instance for the pod.
Operational overhead:
- Cluster Autoscaler (CA): Higher; must manage multiple node groups.
- Karpenter: Lower; one provisioner can handle many pod types.
Key Differences
- CA asks, "How many more of these pre-configured nodes do I need?".
- Karpenter asks, "What specific resources (CPU, RAM, GPU) does this pending pod need right now?" and builds a node to match.
Node Groups:
- CA requires you to manually define and maintain Auto Scaling Groups (ASGs) for different instance types or zones.
- Karpenter bypasses ASGs entirely, allowing it to "mix and match" instance types dynamically in a single cluster.
Consolidation:
- Karpenter actively monitors the cluster to see if it can move pods to fewer or cheaper nodes to save money (bin-packing).
- While CA has a "scale-down" feature, it is less aggressive at optimizing for cost.
Spot Instance Management:
- Karpenter handles Spot interruptions and price changes more natively, selecting the most stable and cost-efficient Spot instances in real-time.
Which should you choose?
Running both tools at the same time generally only makes sense in two scenarios:
- Migration period: Transitioning from Cluster Autoscaler to Karpenter, where you temporarily run both while gradually moving workloads.
- Hybrid node management: Managing distinct, non-overlapping node groups where Cluster Autoscaler handles some node groups and Karpenter handles others (though this adds complexity).
Otherwise, running both against overlapping capacity brings problems:
- Race conditions where both try to provision nodes simultaneously
- Inefficient resource allocation
- Unpredictable scaling behavior
- One tool removing nodes the other just provisioned
- Two systems to monitor, troubleshoot, and maintain
- Doubled configuration overhead
- More difficulty understanding which tool made which scaling decision
If you have to pick one on EKS, the case for Karpenter is strong:
- Karpenter was designed specifically for AWS/EKS and integrates deeply with EC2 APIs
- Karpenter typically provides better performance on EKS (faster provisioning, better bin-packing)
- If you're on EKS, the general recommendation is to choose Karpenter over Cluster Autoscaler for new deployments
How to migrate pods from nodes deployed by Cluster Autoscaler to those deployed by Karpenter?
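A common approach (a sketch; adjust names to your cluster, node names are placeholders) is to stop the Cluster Autoscaler, then cordon and drain the old nodes so their pods become Pending and Karpenter provisions replacement capacity:
% kubectl -n kube-system scale deployment cluster-autoscaler --replicas=0
% kubectl cordon <old-node-name>
% kubectl drain <old-node-name> --ignore-daemonsets --delete-emptydir-data
Once the old node group is empty, scale it to zero or remove it in Terraform/the console.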
Kubernetes Metrics Server
Components that depend on the Metrics Server include:
Vertical Pod Autoscaler (VPA):
- Purpose: While HPA adds more pods, the Vertical Pod Autoscaler (VPA) adjusts the CPU and memory requests/limits of existing pods.
- Dependency: VPA relies on Metrics Server for the real-time resource data it uses to recommend or apply these resource changes.
kubectl top:
- Purpose: Commands used for ad-hoc debugging and performance monitoring.
- Dependency: Both kubectl top pods and kubectl top nodes query the Metrics API directly. Without the server, these commands will return an error.
Kubernetes Dashboard:
- Purpose: A web-based UI for managing and troubleshooting clusters.
- Dependency: The Kubernetes Dashboard uses Metrics Server to display resource usage graphs and live statistics for nodes and pods.
Other consumers:
- Custom Metrics Adapters: Some adapters that bridge external sources (like CloudWatch or Datadog) to Kubernetes may use the standard Metrics API for fallback or basic resource data.
- Resource Management Tools: Operational tools such as Goldilocks, which suggests "just right" resource requests, often depend on the baseline metrics provided by this server.
Key Distinction
How to install the Metrics Server as an EKS Community Add-on to enable these features?
Method 1: Using the AWS Management Console
- Navigate to your EKS cluster in the AWS Console.
- Select the Add-ons tab and click Get more add-ons.
- Scroll down to the Community add-ons section.
- Find Metrics Server, select it, and click Next.
- Choose the desired version (usually the latest recommended) and click Create.
Method 2: Using the AWS CLI
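A sketch (my-cluster is a placeholder; the add-on name should match what the console lists under Community add-ons):
% aws eks create-addon --cluster-name my-cluster --addon-name metrics-server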
Verification
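For example:
% kubectl get deployment metrics-server -n kube-system
% kubectl top nodes
If kubectl top returns CPU and memory figures, the Metrics API is working.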
Metrics Server Configuration
1. Scaling Recommendations
- CPU: Approximately 1 millicore per node in the cluster.
- Memory: Approximately 2 MB of memory per node.
- Large Clusters: If your cluster exceeds 100 nodes, it is recommended to double these defaults and monitor performance.
2. How to Apply Custom Limits
Via AWS CLI
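A sketch of applying custom requests/limits through the add-on's configuration values (the exact JSON schema is defined by the add-on, so treat the shape below as an assumption to adapt):
% aws eks update-addon --cluster-name my-cluster --addon-name metrics-server \
    --configuration-values '{"resources":{"requests":{"cpu":"200m","memory":"400Mi"},"limits":{"cpu":"400m","memory":"800Mi"}}}'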
Via AWS Console
- Go to the Add-ons tab in your EKS cluster.
- Click Edit on the metrics-server add-on.
- Expand the Optional configuration settings.
- Paste the JSON configuration into the Configuration values text box.
3. Critical Configuration for High Traffic
- Metric Resolution: The default is 60s. For faster scaling, add --metric-resolution=15s to the container arguments via the same configuration block.
- High Availability: The community add-on defaults to 2 replicas to prevent downtime during scaling events.
Thursday, 5 February 2026
Horizontal Pod Autoscaler (HPA)
- standard Kubernetes autoscaling mechanism
- Available out of the box (i.e., without installing third-party controllers)
- Each workload can scale on a different signal, for example:
- Web API: might scale when CPU hits 70%.
- Background Worker: might scale based on memory usage.
- Data Processor: might scale based on a custom metric like SQS queue depth.
Key Features:
- Pod Scaling: Adjusts the number of pod replicas to match the demand.
- Automatically scales up/down the number of pods in a deployment, replication controller, or replica set based on observed CPU utilization, memory or other selected custom/external metrics.
Purpose
- Dynamic Scalability: Automatically adds pods during traffic surges to maintain performance and removes them during low-traffic periods to reduce waste.
- Cost Optimisation: Ensures you only pay for the compute resources currently needed rather than over-provisioning for peak loads.
- Resilience & Availability: Prevents application crashes and outages by proactively scaling out before resources are fully exhausted.
- Operational Efficiency: Replaces manual intervention with "architectural definition," allowing infrastructure to manage itself based on predefined performance rules.
Dependencies
- Metrics Server (The Aggregator): This is the most critical infrastructure dependency. The HPA controller queries the Metrics API (typically provided by the Metrics Server) to get real-time CPU and memory usage data.
- Resource Requests (The Baseline): For the HPA to calculate percentage-based utilization (e.g., "scale at 50% CPU"), the target Deployment must have resources.requests defined. Without these, the HPA has no 100% baseline to measure against and will show an unknown status.
- Controller Manager: The HPA logic runs as a control loop within the Kubernetes kube-controller-manager, which periodically (every 15 seconds by default) evaluates the metrics and updates the desired replica count.
- Scalable Target: The HPA must be linked to a resource that supports scaling, such as a Deployment, ReplicaSet, or StatefulSet.
- Cluster Capacity (Node Scaling): While HPA scales pods, it depends on an underlying node scaler (like Karpenter or Cluster Autoscaler) to provide new EC2 instances if the cluster runs out of physical space for the additional pods.
Dependencies
- It requires Kubernetes Metrics Server
Installation and Setup:
- Nothing to install: the HPA controller ships as part of kube-controller-manager, so the only prerequisite to set up is the Metrics Server it depends on.
Configuration:
How to check if it's enabled?
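A quick check (the HPA controller itself is built into kube-controller-manager, so "enabled" mostly means the API is served and an HPA object exists):
% kubectl api-resources | grep -i horizontalpodautoscaler
% kubectl get hpa -A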
In which namespace do HorizontalPodAutoscalers reside?
- Application Namespace: When you create an HPA, you define it within the specific namespace where your application is running (e.g., default, production, or demo).
- Constraint: An HPA can only scale a target resource (like a Deployment) that exists in that same namespace.
- Metrics Server: This is a mandatory prerequisite for HPA on EKS. It is typically deployed in the kube-system namespace.
- Custom Metrics Adapters: If you are scaling based on custom metrics (like Prometheus or CloudWatch), components like the prometheus-adapter or k8s-cloudwatch-adapter may be installed in kube-system or a dedicated namespace like custom-metrics.
- Cluster Autoscaler: Often confused with HPA, the Cluster Autoscaler (which scales EC2 nodes rather than pods) also typically resides in the kube-system namespace.
If kubectl get hpa -A returns no resources, the likely reasons are:
- HPA is not yet created: By default, EKS clusters do not come with any HorizontalPodAutoscalers pre-configured. You must explicitly create one for your application.
- Metrics Server Missing: HPAs rely on the Kubernetes Metrics Server to function. While the HPA object can be created without it, it will show a status of <unknown> and may not appear if you are looking for active scaling.
- Namespace Context: Even with -A (all namespaces), if no user or system service has defined an HPA resource, the list will be empty.
- Check if Metrics Server is running:
- Run kubectl get deployment metrics-server -n kube-system. If it’s missing, you can install it via the AWS EKS Add-ons in the console or via kubectl apply.
- Check API availability:
- Run kubectl api-resources | grep hpa to confirm the cluster recognizes the resource type.
- Create a test HPA:
- If you have a deployment named my-app, try creating one:
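For example (my-app and the thresholds are placeholders):
% kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10
% kubectl get hpa my-app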
- Workloads are not yet auto-scaled: By default, EKS (and Kubernetes) does not apply HPAs to your deployments. You must explicitly create an HPA object that references your target Deployment or StatefulSet.
- kubectl get all exclusion: Standard kubectl get all does not include HPAs in its output, so it can look like none exist even when they do.
- Namespace Location: Wherever metrics-server happens to live (typically kube-system, occasionally default), HPAs must be created in the same namespace as the app they are scaling.
- List all HPAs: kubectl get hpa -A.
- Check Metrics Flow: With metrics-server running, confirm it is actually collecting data by running kubectl top pods -A. If this returns usage data, your HPA will be able to scale correctly.
How to configure the metrics it observes?
- Target Utilization: Typically set as a percentage of a pod's requested CPU or memory.
- Where to define: Inside the metrics list in your HPA manifest.
- Stabilization Window: Prevents "flapping" by making the HPA wait and look at past recommendations before acting.
- Scale-Up: Default is 0 seconds (instant growth).
- Scale-Down: Default is 300 seconds (5 minutes) to ensure a spike is truly over before killing pods.
- Policies: Restrict the absolute number or percentage of pods changed within a specific timeframe.
- Resource Requests: You must define resources.requests in your Deployment manifest. HPA cannot calculate utilization percentages without this baseline.
- Metrics Server: Must be running in your cluster (usually in kube-system or default) to provide the data HPA needs (see the manifest sketch below).
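Putting those pieces together, a minimal manifest sketch (names and numbers are placeholders) targeting 70% average CPU with a 5-minute scale-down stabilization window:
% cat <<'EOF' | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
EOF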
HPA vs Karpenter
- HPA manages Pods (The Demand): It decides how many pods you need (e.g., "CPU is high, let's go from 2 pods to 5 pods").
- Karpenter manages Nodes (The Supply): It provides the underlying infrastructure for those pods. It watches for pods that are "Pending" because there is no room for them, then quickly spins up a new EC2 instance that fits them perfectly.
Other scaling mechanisms worth knowing about:
- Vertical Pod Autoscaler (VPA): Instead of adding more pods, VPA adjusts the CPU and Memory limits of your existing pods based on actual usage.
- KEDA (Kubernetes Event-driven Autoscaling): Often used instead of standard HPA for complex triggers. It can scale pods to zero and back up based on external events like AWS SQS queue depth, Kafka lag, or Cron schedules.
- GitOps/CD Pipelines: Sometimes scaling is "automated" via external CI/CD tools (like ArgoCD) that update the replica count in your git repo based on specific triggers or schedules rather than in-cluster metrics.
How HPA and Karpenter work together during a traffic spike:
- HPA (The Demand): Triggers when CPU/Memory usage spikes, creating "Pending" pods that cannot fit on current nodes.
- Karpenter (The Supply): Sees those "Pending" pods and immediately provisions new EC2 instances to accommodate them.
- Buffer Time: Setting the HPA to 60% (instead of 90%) ensures you start scaling before pods are overwhelmed, giving Karpenter ~60 seconds to join new nodes to the cluster.
- Just-in-Time Nodes: Unlike the old Cluster Autoscaler, Karpenter doesn't wait for "Node Groups" to update; it calls the EC2 Fleet API directly to get exactly what your pending pods need.
- Automatic Cleanup: When the spike ends, HPA reduces pod counts. Karpenter's consolidationPolicy then notices the nodes are empty and terminates them to stop AWS billing.
As an example, consider an HPA whose metric target is an averageValue of 700m CPU:
- The Trigger: The HPA controller calculates the average CPU usage across all currently running pods in that deployment.
- The Action: If the average usage exceeds 700m, the HPA will add more pods to spread the load and bring that average back down to 700m.
- AverageUtilization (Percentage): Requires you to have resources.requests defined. It scales based on "percentage of what I asked for."
- AverageValue (Raw Number): Does not technically require a request baseline to function. It scales based on "absolute CPU consumed." This is useful if your app has a hard performance limit (e.g., "This app starts lagging if it hits 0.7 cores") regardless of what the pod's requested limit is.
Horizontal Pod Autoscaler and Upgrading Kubernetes version of the cluster
- autoscaling/v2 is now GA (General Availability): As of Kubernetes v1.23, the autoscaling/v2 API version is stable and generally available.
- Removal of v2beta2: The autoscaling/v2beta2 version was removed in v1.26. If your manifests still use this version, they will fail to apply or update in clusters v1.26 and newer.
- Manifest Updates: You must update the apiVersion in your YAML files. Note that fields like targetAverageUtilization from the older beta versions (autoscaling/v2beta1) were replaced by a more structured target block in v2beta2 and the stable v2 API.
- Metrics Server Compatibility: Ensure your Metrics Server version is compatible with your new Kubernetes version. Without it, HPA cannot fetch CPU or memory data, and scaling will fail.
- Custom Metrics Adapters: if you use custom metrics (e.g., via Prometheus), ensure your Prometheus Adapter supports the new Kubernetes API version, as some older adapters may still attempt to call removed API endpoints.
- Configurable Scaling Behavior: Introduced in v1.18 and matured in later versions, the behavior field allows you to set a stabilizationWindowSeconds for scale-up and scale-down independently. This is essential for preventing "flapping" (rapidly scaling up and then down).
- Configurable Tolerance: In very recent versions (e.g., v1.33), you can now fine-tune the 10% default tolerance. Previously, HPA would only act if the metric differed by more than 10%; you can now adjust this for more sensitive or coarser scaling needs.
- Audit Before Upgrading: Use tools like Kube-no-trouble (kubent) or Pluto to find resources using deprecated HPA APIs.
- Dry Runs: Run kubectl apply --dry-run=client on your HPA manifests against the target cluster version to catch schema errors before they impact production.
- Monitor Events: After upgrading, watch HPA events using kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler to ensure it is still successfully fetching metrics and making decisions.
- Seamless Conversion: The Kubernetes API server can convert between these versions; for example, kubectl get hpa.v2.autoscaling <name> -o yaml returns an existing HPA rendered in the v2 format (the older --output-version flag has been removed from kubectl).
- Manifest Updates: While conversion is possible, you must update your CI/CD pipelines and YAML manifests to use autoscaling/v2 before upgrading to v1.26 to prevent errors.
- Behavior Block: The behavior block remains the same structurally, but using the v2 API is required for long-term support.
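Two quick checks that can help before and after the upgrade (the HPA name is a placeholder):
% kubectl api-versions | grep autoscaling
% kubectl get hpa.v2.autoscaling my-app -o yaml > hpa-v2.yaml    # re-export an existing HPA in the v2 schema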


