A Pod Disruption Budget (PDB) in Kubernetes:
- is an API object
- specifies the minimum number of pods an application needs in order to keep functioning during disruptions
- limits how many replicated pods can be down simultaneously during voluntary disruptions (e.g., node upgrades, maintenance, draining)
- helps ensure high availability by guaranteeing that a minimum number or percentage of pods remains running
Key Aspects of PDBs:
In general, disruptions can be:
- voluntary, such as maintenance operations or node scaling, or
- involuntary, such as hardware failures or system crashes
Voluntary Focus: PDBs only protect against voluntary disruptions, such as kubectl drain or node repairs, not against involuntary, unavoidable failures.
Configuration: You define a PDB using exactly one of two mutually exclusive fields:
- minAvailable
- the minimum number (or percentage) of matching pods that must remain running
- or maxUnavailable
- the maximum number (or percentage) of matching pods that can be voluntarily taken down simultaneously
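Both fields accept either an absolute number or a percentage. A minimal sketch of the maxUnavailable variant (the name and label here are illustrative, not from the cluster above):

```yaml
# Alternative to minAvailable: allow at most 25% of the matching
# pods to be down at once during voluntary disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb        # hypothetical name
spec:
  maxUnavailable: 25%  # can also be an absolute number, e.g. 1
  selector:
    matchLabels:
      app: api         # hypothetical label
```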
Use Case: Ideal for quorum-based applications (e.g., Elasticsearch, Zookeeper) to ensure quorum is never lost during node maintenance.
Mechanism: When a cluster administrator drains a node, the eviction API checks the PDB. If evicting a pod would violate the budget, the eviction is refused and the drain retries until enough healthy replicas are available elsewhere.
Example PDB Configuration:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2  # At least 2 pods must remain running
  selector:
    matchLabels:
      app: web-app
Best Practice:
Use PDBs in conjunction with pod anti-affinity rules to ensure pods are spread across nodes.
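A PDB cannot help if all replicas land on the same node and that node goes away; anti-affinity addresses the placement side. A minimal sketch, reusing the web-app label from the example above (the Deployment name is illustrative):

```yaml
# Spread web-app pods across nodes so a single drain or node
# failure cannot take out all replicas at once.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app  # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web-app
              topologyKey: kubernetes.io/hostname
```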
How to check PDBs in a cluster?
Example:
% kubectl get pdb -A
NAMESPACE     NAME                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
clickhouse    chi-clickhouse-ch       N/A             1                 1                     132d
kube-system   ws-cluster-autoscaler   N/A             1                 1                     133d
kube-system   coredns                 N/A             1                 1                     140d
kube-system   ebs-csi-controller      N/A             1                 1                     140d
kube-system   karpenter               N/A             1                 1                     139d
ALLOWED DISRUPTIONS:
- a real-time status indicator
- shows how many pods can currently be evicted without violating the configured maxUnavailable or minAvailable constraint
- a non-zero value means the disruption controller has seen the PDB, counted the matching pods, and updated the PDB's status
To see the number of current and desired healthy pods (and how ALLOWED DISRUPTIONS is actually calculated):
% kubectl get poddisruptionbudgets karpenter -n kube-system -o yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    meta.helm.sh/release-name: karpenter
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2025-10-21T14:05:33Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/version: 1.3.2
    helm.sh/chart: karpenter-1.3.2
  name: karpenter
  namespace: kube-system
  resourceVersion: "2664456"
  uid: 2b58340a-fd07-4567-95a9-2a43b5dd4bca
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: karpenter
      app.kubernetes.io/name: karpenter
status:
  conditions:
  - lastTransitionTime: "2025-10-27T10:52:01Z"
    message: ""
    observedGeneration: 1
    reason: SufficientPods
    status: "True"
    type: DisruptionAllowed
  currentHealthy: 2
  desiredHealthy: 1
  disruptionsAllowed: 1
  expectedPods: 2
  observedGeneration: 1
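The status fields make the arithmetic visible: with maxUnavailable: 1, desiredHealthy is expectedPods minus 1, and disruptionsAllowed is currentHealthy minus desiredHealthy, floored at zero. A minimal sketch of that calculation (an illustration of the formula, not the controller's actual code):

```python
# Sketch of how ALLOWED DISRUPTIONS is derived for a PDB that
# uses maxUnavailable. Not the real disruption-controller code.
def disruptions_allowed(expected_pods: int,
                        current_healthy: int,
                        max_unavailable: int) -> int:
    desired_healthy = expected_pods - max_unavailable
    # Never report a negative budget.
    return max(0, current_healthy - desired_healthy)

# Values from the karpenter PDB status:
# expectedPods=2, currentHealthy=2, maxUnavailable=1
print(disruptions_allowed(2, 2, 1))  # -> 1

# If one pod becomes unhealthy, the budget drops to 0 and
# voluntary evictions are blocked until the pod recovers.
print(disruptions_allowed(2, 1, 1))  # -> 0
```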
PDB and Rolling Update of Node Group
A budget of 1 allowed disruption (i.e., maxUnavailable: 1) is generally the safest and most standard setting for a rolling node group update, especially for high-availability workloads.
Key Considerations for maxUnavailable: 1
Safety First: This setting ensures only one node is updated at a time. This is ideal for maintaining quorum in stateful applications like databases (e.g., Consul or ZooKeeper), where losing multiple nodes simultaneously could cause data loss or service failure.
Default Behavior: In Amazon EKS managed node groups, maxUnavailable defaults to 1 if not specified.
Resource Availability: For this to work, your cluster must have enough spare capacity (CPU/Memory) on the remaining nodes to host the pods evicted from the node being updated.
Update Speed: While safe, updating one node at a time is the slowest method. For very large clusters, you might consider a higher absolute number or a percentage (e.g., 10%) to speed up the process.
When 1 is NOT Enough
Blocking Drains: If you have a Pod Disruption Budget (PDB) where minAvailable equals your total replica count, ALLOWED DISRUPTIONS is permanently 0: the node drain is blocked and the update stalls, because no pod can be legally evicted.
Timeouts: Amazon EKS has a 15-minute timeout for draining pods. If pods take too long to terminate, the update may fail unless you have configured pre-stop hooks or adjusted your PDBs.
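As a concrete illustration of the blocking case (names and labels are hypothetical): if a workload runs exactly 3 replicas and its PDB sets minAvailable: 3, the budget never permits an eviction, so every drain of a node hosting one of these pods stalls.

```yaml
# Anti-pattern: minAvailable equals the replica count, so
# ALLOWED DISRUPTIONS is always 0 and node drains stall.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: db-pdb      # hypothetical name
spec:
  minAvailable: 3   # the matching Deployment runs exactly 3 replicas
  selector:
    matchLabels:
      app: db       # hypothetical label
```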