A Pod Disruption Budget (PDB) in Kubernetes:
- is an API object
- sets the minimum number of pods an application needs to keep functioning during disruptions
- limits how many replicated pods can be down simultaneously during voluntary disruptions (e.g., node upgrades, maintenance, draining)
- ensures high availability by guaranteeing that a minimum number or percentage of pods remains available
Key Aspects of PDBs:
In general, disruptions can be:
- voluntary, such as maintenance operations or node scaling, or
- involuntary, such as hardware failures or system crashes
Voluntary Focus: PDBs only protect against voluntary disruptions, such as kubectl drain or node repairs, not against involuntary, unavoidable failures.
Configuration: You define a PDB using exactly one of two fields:
- minAvailable — the minimum number (or percentage) of pods that must keep running, or
- maxUnavailable — the maximum number (or percentage) of pods that can be voluntarily taken down simultaneously.
Use Case: Ideal for quorum-based applications (e.g., Elasticsearch, Zookeeper) to ensure quorum is never lost during node maintenance.
Mechanism: When a cluster administrator drains a node, the system checks the PDB. If removing a pod violates the budget, the action is delayed until enough replicas are available elsewhere.
Example PDB Configuration:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2  # At least 2 pods must remain running
  selector:
    matchLabels:
      app: web-app
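The same kind of budget can be expressed with maxUnavailable instead. A hypothetical variant for the same web-app Deployment (note that a single PDB may set minAvailable or maxUnavailable, not both):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  maxUnavailable: 1   # at most 1 pod may be voluntarily down at any time
  selector:
    matchLabels:
      app: web-app
```

Percentages such as "50%" are also valid for either field.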
Best Practice:
Use PDBs in conjunction with pod anti-affinity rules to ensure pods are spread across nodes.
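A minimal sketch of such an anti-affinity rule, assuming a hypothetical Deployment whose pods carry the label app: web-app:

```yaml
# Pod template fragment of a hypothetical web-app Deployment
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web-app
            topologyKey: kubernetes.io/hostname  # spread replicas across nodes
```

If strict spreading would make pods unschedulable on small clusters, preferredDuringSchedulingIgnoredDuringExecution is the softer alternative.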
How to check PDB in cluster?
Example:
% kubectl get pdb -A
NAMESPACE     NAME                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
clickhouse    chi-clickhouse-ch       N/A             1                 1                     132d
kube-system   ws-cluster-autoscaler   N/A             1                 1                     133d
kube-system   coredns                 N/A             1                 1                     140d
kube-system   ebs-csi-controller      N/A             1                 1                     140d
kube-system   karpenter               N/A             1                 1                     139d
ALLOWED DISRUPTIONS:
- the real-time status indicator showing how many pods can currently be evicted without violating the minAvailable or maxUnavailable constraint
- a non-zero value means the disruption controller has seen the PDB, counted the matching pods, and updated the PDB's status
To see the number of current and desired healthy pods (and how ALLOWED DISRUPTIONS is actually calculated):
% kubectl get poddisruptionbudgets karpenter -n kube-system -o yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    meta.helm.sh/release-name: karpenter
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2025-10-21T14:05:33Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/version: 1.3.2
    helm.sh/chart: karpenter-1.3.2
  name: karpenter
  namespace: kube-system
  resourceVersion: "2664456"
  uid: 2b58340a-fd07-4567-95a9-2a43b5dd4bca
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: karpenter
      app.kubernetes.io/name: karpenter
status:
  conditions:
  - lastTransitionTime: "2025-10-27T10:52:01Z"
    message: ""
    observedGeneration: 1
    reason: SufficientPods
    status: "True"
    type: DisruptionAllowed
  currentHealthy: 2
  desiredHealthy: 1
  disruptionsAllowed: 1
  expectedPods: 2
  observedGeneration: 1
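The status fields make the arithmetic explicit: the controller derives desiredHealthy from the spec, and disruptionsAllowed is currentHealthy minus desiredHealthy, floored at zero. A minimal Python sketch of that formula (an illustration with integer fields only, not the actual controller code):

```python
def desired_healthy(expected_pods, min_available=None, max_unavailable=None):
    """Derive desiredHealthy from a PDB spec (exactly one field is set)."""
    if min_available is not None:
        return min_available
    return expected_pods - max_unavailable

def disruptions_allowed(current_healthy, desired):
    # The controller never reports a negative budget
    return max(0, current_healthy - desired)

# The karpenter PDB above: expectedPods=2, maxUnavailable=1
desired = desired_healthy(2, max_unavailable=1)
print(desired)                          # 1 (matches desiredHealthy in status)
print(disruptions_allowed(2, desired))  # 1 (matches disruptionsAllowed)
```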
PDB and Rolling Update of Node Group
Updating one node at a time (maxUnavailable: 1 on the node group, combined with PDBs that leave ALLOWED DISRUPTIONS = 1) is generally the safest and most standard approach for a rolling node group update, especially for high-availability workloads.
Key Considerations for maxUnavailable: 1
Safety First: This setting ensures only one node is updated at a time. This is ideal for maintaining quorum in stateful applications like databases (e.g., Consul or ZooKeeper) where losing multiple nodes
simultaneously could cause data loss or service failure.
Default Behavior: In Amazon EKS managed node groups, maxUnavailable defaults to 1 if not specified.
Resource Availability: For this to work, your cluster must have enough spare capacity (CPU/Memory) on the remaining nodes to host the pods evicted from the node being updated.
Update Speed: While safe, updating one node at a time is the slowest method. For very large clusters, you might consider a higher absolute number or a percentage (e.g., 10%) to speed up the process.
When 1 is NOT Enough
Blocking Drains: If you have a Pod Disruption Budget (PDB) where minAvailable equals your total replicas, the node drain will be blocked, and the update will stall because no pods can be legally moved.
Timeouts: Amazon EKS has a 15-minute timeout for draining pods. If pods take too long to terminate, the update may fail unless you have configured pre-stop hooks or adjusted your PDBs.
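A common mitigation for slow terminations is a preStop hook that gives the application time to drain connections before it is killed. A hypothetical pod-spec fragment (the image name and sleep duration are assumptions; tune them to your workload):

```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60        # must exceed the preStop delay
      containers:
      - name: web
        image: nginx:1.27                      # hypothetical image
        lifecycle:
          preStop:
            exec:
              command: ["sh", "-c", "sleep 10"]  # let the LB deregister the pod
```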
In-Depth Explanation
In a Pod Disruption Budget (PDB), the setting ALLOWED DISRUPTIONS = 1 (which results from either maxUnavailable: 1 or a minAvailable value that leaves one "slot" free) acts as a safety valve that synchronizes infrastructure changes with application health.
Here is exactly how it makes a rolling node group update safe:
1. It Hooks into the "Eviction API"
When a node group update begins, the automation doesn't just "kill" pods; it calls the Kubernetes Eviction API.
The Check: Before a pod is removed from a node being updated, the Eviction API checks your PDB.
The Logic: If ALLOWED DISRUPTIONS is 1, the API grants exactly one eviction. As soon as that eviction is granted, the "Allowed Disruptions" counter drops to 0.
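Concretely, a drain submits an Eviction subresource request for each pod, and that request is what the PDB check intercepts. A sketch of the equivalent manual request body (the pod name is hypothetical):

```yaml
# POST to /api/v1/namespaces/kube-system/pods/karpenter-abc123/eviction
apiVersion: policy/v1
kind: Eviction
metadata:
  name: karpenter-abc123   # hypothetical pod name
  namespace: kube-system
```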
2. It Forces a Sequential Wait
This is the most critical safety feature. If a second pod eviction is requested while the first one is still being replaced:
- The Block: The API sees ALLOWED DISRUPTIONS = 0 and rejects the eviction request.
- The Wait: The node "drain" process pauses and retries. It will stay paused until the first pod's replacement is scheduled on a different node and passes its Readiness Probe.
- The Reset: Only when the new pod is "Ready" does the ALLOWED DISRUPTIONS count return to 1, allowing the next pod to be safely evicted.
3. It Prevents "Brain-Dead" Automation
Without this PDB setting, a node group update might try to drain a node that holds all replicas of your app.
- Without PDB: The node drains, kills all pods at once, and you have a total outage.
- With ALLOWED DISRUPTIONS = 1: The automation is physically unable to kill the second pod until the first one is safely back online elsewhere, ensuring your app always has at least some capacity.
Summary of the "Safety Loop"
| Step | Action | PDB State (Allowed Disruptions) |
|------|--------|---------------------------------|
| 1 | Node update starts; first pod eviction requested | 1 → Eviction granted |
| 2 | First pod is terminating; replacement is starting | 0 → All further evictions blocked |
| 3 | Replacement pod passes Readiness Probe | 0 → 1 → Block lifted |
| 4 | Next pod eviction requested | 1 → Process repeats |
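The safety loop above can be sketched as a tiny simulation, assuming a toy "cluster" of two ready replicas guarded by a maxUnavailable: 1 budget (illustrative only, not the real eviction controller):

```python
class ToyPdb:
    """Tracks a maxUnavailable-style budget over a set of replicas."""

    def __init__(self, expected_pods, max_unavailable):
        self.desired_healthy = expected_pods - max_unavailable
        self.ready = expected_pods  # pods currently passing readiness

    @property
    def allowed_disruptions(self):
        return max(0, self.ready - self.desired_healthy)

    def try_evict(self):
        """Mimic the Eviction API: grant only if the budget allows it."""
        if self.allowed_disruptions < 1:
            return False          # rejected; the drain must retry later
        self.ready -= 1           # pod terminates
        return True

    def replacement_ready(self):
        self.ready += 1           # new pod passed its readiness probe

pdb = ToyPdb(expected_pods=2, max_unavailable=1)
print(pdb.try_evict())       # True  -> first eviction granted
print(pdb.try_evict())       # False -> blocked until the replacement is Ready
pdb.replacement_ready()
print(pdb.try_evict())       # True  -> next eviction proceeds
```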
Important Note: If your app only has 1 replica total, a PDB with minAvailable: 1 will block the node update forever because it can never safely evict that single pod. You generally need at least 2 replicas for this safety mechanism to work without stalling your updates.