A Pod Disruption Budget (PDB) in Kubernetes:
- is an API object
- sets the minimum number of pods an application needs to keep functioning during disruptions
- limits how many replicated pods can be down simultaneously during voluntary disruptions (e.g., node upgrades, maintenance, draining)
- ensures high availability by guaranteeing that a minimum number or percentage of pods remains available
Key Aspects of PDBs:
In general, disruptions can be:
- voluntary, such as maintenance operations or node scaling, or
- involuntary, such as hardware failures or system crashes
Voluntary Focus: PDBs only protect against voluntary disruptions, such as kubectl drain or node repairs, not against involuntary, unavoidable failures.
Configuration: You define a PDB using exactly one of two fields:
- minAvailable — the minimum number (or percentage) of pods that must keep running, or
- maxUnavailable — the maximum number (or percentage) of pods that can be voluntarily taken down simultaneously.
Use Case: Ideal for quorum-based applications (e.g., Elasticsearch, Zookeeper) to ensure quorum is never lost during node maintenance.
Mechanism: When a cluster administrator drains a node, the system checks the PDB. If removing a pod violates the budget, the action is delayed until enough replicas are available elsewhere.
Example PDB Configuration:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2  # At least 2 pods must remain running
  selector:
    matchLabels:
      app: web-app
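The same kind of budget can be expressed with maxUnavailable instead. A hypothetical variant for the same web-app Deployment (note that a single PDB may set minAvailable or maxUnavailable, not both):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  maxUnavailable: 1   # at most 1 pod may be voluntarily down at any time
  selector:
    matchLabels:
      app: web-app
```

Percentages such as "50%" are also valid for either field.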
Best Practice:
Use PDBs in conjunction with pod anti-affinity rules to ensure pods are spread across nodes.
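A minimal sketch of such an anti-affinity rule, assuming a hypothetical Deployment whose pods carry the label app: web-app:

```yaml
# Pod template fragment of a hypothetical web-app Deployment
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web-app
            topologyKey: kubernetes.io/hostname  # spread replicas across nodes
```

If strict spreading would make pods unschedulable on small clusters, preferredDuringSchedulingIgnoredDuringExecution is the softer alternative.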
How to check PDB in cluster?
Example:
% kubectl get pdb -A
NAMESPACE     NAME                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
clickhouse    chi-clickhouse-ch       N/A             1                 1                     132d
kube-system   ws-cluster-autoscaler   N/A             1                 1                     133d
kube-system   coredns                 N/A             1                 1                     140d
kube-system   ebs-csi-controller      N/A             1                 1                     140d
kube-system   karpenter               N/A             1                 1                     139d
ALLOWED DISRUPTIONS:
- the real-time status indicator showing how many pods can currently be evicted without violating the minAvailable or maxUnavailable constraint
- a non-zero value means the disruption controller has seen the PDB, counted the matching pods, and updated the PDB's status
To see the number of current and desired healthy pods (and how ALLOWED DISRUPTIONS is actually calculated):
% kubectl get poddisruptionbudgets karpenter -n kube-system -o yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    meta.helm.sh/release-name: karpenter
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2025-10-21T14:05:33Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: karpenter
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: karpenter
    app.kubernetes.io/version: 1.3.2
    helm.sh/chart: karpenter-1.3.2
  name: karpenter
  namespace: kube-system
  resourceVersion: "2664456"
  uid: 2b58340a-fd07-4567-95a9-2a43b5dd4bca
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: karpenter
      app.kubernetes.io/name: karpenter
status:
  conditions:
  - lastTransitionTime: "2025-10-27T10:52:01Z"
    message: ""
    observedGeneration: 1
    reason: SufficientPods
    status: "True"
    type: DisruptionAllowed
  currentHealthy: 2
  desiredHealthy: 1
  disruptionsAllowed: 1
  expectedPods: 2
  observedGeneration: 1
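The status fields make the arithmetic explicit: the controller derives desiredHealthy from the spec, and disruptionsAllowed is currentHealthy minus desiredHealthy, floored at zero. A minimal Python sketch of that formula (an illustration with integer fields only, not the actual controller code):

```python
def desired_healthy(expected_pods, min_available=None, max_unavailable=None):
    """Derive desiredHealthy from a PDB spec (exactly one field is set)."""
    if min_available is not None:
        return min_available
    return expected_pods - max_unavailable

def disruptions_allowed(current_healthy, desired):
    # The controller never reports a negative budget
    return max(0, current_healthy - desired)

# The karpenter PDB above: expectedPods=2, maxUnavailable=1
desired = desired_healthy(2, max_unavailable=1)
print(desired)                          # 1 (matches desiredHealthy in status)
print(disruptions_allowed(2, desired))  # 1 (matches disruptionsAllowed)
```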
PDB and Rolling Update of Node Group
Updating one node at a time (maxUnavailable: 1 on the node group, combined with PDBs that leave ALLOWED DISRUPTIONS = 1) is generally the safest and most standard approach for a rolling node group update, especially for high-availability workloads.
Key Considerations for maxUnavailable: 1
Safety First: This setting ensures only one node is updated at a time. This is ideal for maintaining quorum in stateful applications like databases (e.g., Consul or ZooKeeper) where losing multiple nodes
simultaneously could cause data loss or service failure.
Default Behavior: In Amazon EKS managed node groups, maxUnavailable defaults to 1 if not specified.
Resource Availability: For this to work, your cluster must have enough spare capacity (CPU/Memory) on the remaining nodes to host the pods evicted from the node being updated.
Update Speed: While safe, updating one node at a time is the slowest method. For very large clusters, you might consider a higher absolute number or a percentage (e.g., 10%) to speed up the process.
When 1 is NOT Enough
Blocking Drains: If you have a Pod Disruption Budget (PDB) where minAvailable equals your total replicas, the node drain will be blocked, and the update will stall because no pods can be legally moved.
Timeouts: Amazon EKS has a 15-minute timeout for draining pods. If pods take too long to terminate, the update may fail unless you have configured pre-stop hooks or adjusted your PDBs.
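A common mitigation for slow terminations is a preStop hook that gives the application time to drain connections before it is killed. A hypothetical pod-spec fragment (the image name and sleep duration are assumptions; tune them to your workload):

```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60        # must exceed the preStop delay
      containers:
      - name: web
        image: nginx:1.27                      # hypothetical image
        lifecycle:
          preStop:
            exec:
              command: ["sh", "-c", "sleep 10"]  # let the LB deregister the pod
```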
In-Depth Explanation
In a Pod Disruption Budget (PDB), the setting ALLOWED DISRUPTIONS = 1 (which results from either maxUnavailable: 1 or a minAvailable value that leaves one "slot" free) acts as a safety valve that synchronizes infrastructure changes with application health.
Here is exactly how it makes a rolling node group update safe:
1. It Hooks into the "Eviction API"
When a node group update begins, the automation doesn't just "kill" pods; it calls the Kubernetes Eviction API.
The Check: Before a pod is removed from a node being updated, the Eviction API checks your PDB.
The Logic: If ALLOWED DISRUPTIONS is 1, the API grants exactly one eviction. As soon as that eviction is granted, the "Allowed Disruptions" counter drops to 0.
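Concretely, a drain submits an Eviction subresource request for each pod, and that request is what the PDB check intercepts. A sketch of the equivalent manual request body (the pod name is hypothetical):

```yaml
# POST to /api/v1/namespaces/kube-system/pods/karpenter-abc123/eviction
apiVersion: policy/v1
kind: Eviction
metadata:
  name: karpenter-abc123   # hypothetical pod name
  namespace: kube-system
```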
2. It Forces a Sequential Wait
This is the most critical safety feature. If a second pod eviction is requested while the first one is still being replaced:
- The Block: The API sees ALLOWED DISRUPTIONS = 0 and rejects the eviction request.
- The Wait: The node "drain" process pauses and retries. It will stay paused until the first pod's replacement is scheduled on a different node and passes its Readiness Probe.
- The Reset: Only when the new pod is "Ready" does the ALLOWED DISRUPTIONS count return to 1, allowing the next pod to be safely evicted.
3. It Prevents "Brain-Dead" Automation
Without this PDB setting, a node group update might try to drain a node that holds all replicas of your app.
- Without PDB: The node drains, kills all pods at once, and you have a total outage.
- With ALLOWED DISRUPTIONS = 1: The automation is physically unable to kill the second pod until the first one is safely back online elsewhere, ensuring your app always has at least some capacity.
Summary of the "Safety Loop"
| Step | Action | PDB State (Allowed Disruptions) |
|------|--------|---------------------------------|
| 1 | Node update starts; first pod eviction requested | 1 → Eviction granted |
| 2 | First pod is terminating; replacement is starting | 0 → All further evictions blocked |
| 3 | Replacement pod passes Readiness Probe | 0 → 1 → Block lifted |
| 4 | Next pod eviction requested | 1 → Process repeats |
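The safety loop above can be sketched as a tiny simulation, assuming a toy "cluster" of two ready replicas guarded by a maxUnavailable: 1 budget (illustrative only, not the real eviction controller):

```python
class ToyPdb:
    """Tracks a maxUnavailable-style budget over a set of replicas."""

    def __init__(self, expected_pods, max_unavailable):
        self.desired_healthy = expected_pods - max_unavailable
        self.ready = expected_pods  # pods currently passing readiness

    @property
    def allowed_disruptions(self):
        return max(0, self.ready - self.desired_healthy)

    def try_evict(self):
        """Mimic the Eviction API: grant only if the budget allows it."""
        if self.allowed_disruptions < 1:
            return False          # rejected; the drain must retry later
        self.ready -= 1           # pod terminates
        return True

    def replacement_ready(self):
        self.ready += 1           # new pod passed its readiness probe

pdb = ToyPdb(expected_pods=2, max_unavailable=1)
print(pdb.try_evict())       # True  -> first eviction granted
print(pdb.try_evict())       # False -> blocked until the replacement is Ready
pdb.replacement_ready()
print(pdb.try_evict())       # True  -> next eviction proceeds
```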
Important Note: If your app only has 1 replica total, a PDB with minAvailable: 1 will block the node update forever because it can never safely evict that single pod. You generally need at least 2 replicas for this safety mechanism to work without stalling your updates.