Monday, 9 March 2026

Application Pod Disruption Budget (PDB) in Kubernetes



A Pod Disruption Budget (PDB) in Kubernetes:
  • is an API object
  • sets the minimum number of pods an application needs to function smoothly during disruptions
  • limits the number of replicated pods that are down simultaneously during voluntary disruptions (e.g., node upgrades, maintenance, draining)
  • ensures high availability by guaranteeing a minimum number or percentage of pods remain active.

Key Aspects of PDBs:


In general, disruptions can be:
  • voluntary, such as maintenance operations or node scaling, or
  • involuntary, such as hardware failures or system crashes

Voluntary Focus: PDBs only protect against voluntary disruptions, such as kubectl drain or node repairs, not against involuntary, unavoidable failures.

Configuration: You define a PDB using one of two mutually exclusive fields:
  • minAvailable
    • the minimum number (or percentage) of pods that must remain running
  • maxUnavailable
    • the maximum number (or percentage) of pods that can be voluntarily taken down simultaneously
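Both fields accept either an absolute count or a percentage. As a sketch, a PDB using the percentage form of maxUnavailable might look like this (the name and label are illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cache-pdb           # illustrative name
spec:
  maxUnavailable: 25%       # at most 25% of matching pods may be down voluntarily
  selector:
    matchLabels:
      app: cache            # illustrative label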

Use Case: Ideal for quorum-based applications (e.g., Elasticsearch, Zookeeper) to ensure quorum is never lost during node maintenance.

Mechanism: When a cluster administrator drains a node, the system checks the PDB. If removing a pod violates the budget, the action is delayed until enough replicas are available elsewhere. 
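In practice, a blocked eviction is visible directly in the drain output. An illustrative transcript (node and pod names are hypothetical) looks like this; kubectl keeps retrying until the budget allows the eviction:

% kubectl drain ip-10-0-1-23.ec2.internal --ignore-daemonsets --delete-emptydir-data
evicting pod default/web-app-7d9f
error when evicting pods/"web-app-7d9f" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.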

Example PDB Configuration:


apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2 # At least 2 pods must remain running
  selector:
    matchLabels:
      app: web-app


Best Practice: 


Use PDBs in conjunction with pod anti-affinity rules to ensure pods are spread across nodes.
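A sketch of such an anti-affinity rule inside a Deployment's pod template, assuming the same app: web-app label the PDB selects on:

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web-app                      # must match the PDB selector
        topologyKey: kubernetes.io/hostname   # spread pods across nodes

With this rule, no two matching pods land on the same node, so draining a single node can never take down more than one replica at once.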

How to check PDBs in a cluster?


Example:

% kubectl get pdb -A                          

NAMESPACE     NAME                        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
clickhouse    chi-clickhouse-ch           N/A             1                 1                     132d
kube-system   ws-cluster-autoscaler       N/A             1                 1                     133d
kube-system   coredns                     N/A             1                 1                     140d
kube-system   ebs-csi-controller          N/A             1                 1                     140d
kube-system   karpenter                   N/A             1                 1                     139d


ALLOWED DISRUPTIONS:
  • a real-time status indicator
  • shows how many pods can currently be evicted without violating the configured maxUnavailable or minAvailable constraint
  • a non-zero value means the disruption controller has seen the PDB, counted the matching pods, and updated the PDB's status

To see the number of current and desired healthy pods (and how ALLOWED DISRUPTIONS is actually calculated):

% kubectl get poddisruptionbudgets karpenter -n kube-system -o yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    meta.helm.sh/release-name: karpenter
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2025-10-21T14:05:33Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/version: 1.3.2
    helm.sh/chart: karpenter-1.3.2
  name: karpenter
  namespace: kube-system
  resourceVersion: "2664456"
  uid: 2b58340a-fd07-4567-95a9-2a43b5dd4bca
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: karpenter
      app.kubernetes.io/name: karpenter
status:
  conditions:
  - lastTransitionTime: "2025-10-27T10:52:01Z"
    message: ""
    observedGeneration: 1
    reason: SufficientPods
    status: "True"
    type: DisruptionAllowed
  currentHealthy: 2
  desiredHealthy: 1
  disruptionsAllowed: 1
  expectedPods: 2
  observedGeneration: 1
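From the status above, ALLOWED DISRUPTIONS can be derived as:

desiredHealthy     = expectedPods - maxUnavailable = 2 - 1 = 1
disruptionsAllowed = currentHealthy - desiredHealthy = 2 - 1 = 1

So with both Karpenter replicas healthy, exactly one pod may be evicted at a time; if one replica were already down, disruptionsAllowed would drop to 0 and further evictions would be blocked.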


PDB and Rolling Update of Node Group

maxUnavailable = 1 is generally the safest and most standard setting for a rolling node group update, especially for high-availability workloads. 

Key Considerations for maxUnavailable: 1


Safety First: This setting ensures only one node is updated at a time. This is ideal for maintaining quorum in stateful applications like databases (e.g., Consul or ZooKeeper) where losing multiple nodes 
simultaneously could cause data loss or service failure.

Default Behavior: In Amazon EKS managed node groups, maxUnavailable defaults to 1 if not specified.

Resource Availability: For this to work, your cluster must have enough spare capacity (CPU/Memory) on the remaining nodes to host the pods evicted from the node being updated.

Update Speed: While safe, updating one node at a time is the slowest method. For very large clusters, you might consider a higher absolute number or a percentage (e.g., 10%) to speed up the process. 
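For EKS managed node groups created with eksctl, this concurrency can be set via updateConfig in the cluster config file (the node group name here is illustrative):

managedNodeGroups:
  - name: workers                      # illustrative node group name
    updateConfig:
      maxUnavailable: 1                # update one node at a time
      # or, for large clusters:
      # maxUnavailablePercentage: 10   # update up to 10% of nodes in parallel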

When 1 is NOT Enough


Blocking Drains: If you have a Pod Disruption Budget (PDB) where minAvailable equals your total replica count, the node drain will be blocked and the update will stall, because evicting any pod would violate the budget.
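Such a fully blocking PDB is easy to spot in the status output; with illustrative values, minAvailable: 3 against 3 replicas leaves zero allowed disruptions:

% kubectl get pdb web-pdb
NAME      MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
web-pdb   3               N/A               0                     10d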

Timeouts: Amazon EKS has a 15-minute timeout for draining pods. If pods take too long to terminate, the update may fail unless you have configured pre-stop hooks or adjusted your PDBs.
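A sketch of a preStop hook in a pod spec that gives the application time to drain in-flight connections before receiving SIGTERM (the sleep duration and container name are assumptions to adapt):

containers:
- name: web                           # illustrative container name
  lifecycle:
    preStop:
      exec:
        command: ["sh", "-c", "sleep 10"]   # delay SIGTERM so in-flight requests finish
terminationGracePeriodSeconds: 30     # must exceed the preStop delay plus shutdown time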

