
Wednesday, 7 January 2026

Kubernetes Scheduling

 


Pod scheduling is controlled by the scheduling constraints in the pod template spec, which appears in the Kubernetes manifest (YAML) of resources such as:
  • Deployment
  • StatefulSet
  • Pod
  • DaemonSet
  • Job/CronJob

Kubernetes scheduling mechanisms:
  • Tolerations
  • Node Selectors
  • Node Affinity
  • Pod Affinity/Anti-Affinity
  • Taints (node-side)
  • Priority and Preemption
  • Topology Spread Constraints
  • Resource Requests/Limits
  • Custom Schedulers
  • Runtime Class


Example:

    tolerations:
      - key: "karpenter/elastic"
        operator: "Exists"
        effect: "NoSchedule"
    nodeSelector:
      karpenter-node-pool: elastic
      node.kubernetes.io/instance-type: m7g.large
      karpenter.sh/capacity-type: "on-demand"


Tolerations


Tolerations specify which node taints a pod can tolerate.

tolerations:
  - key: "karpenter/elastic"
    operator: "Exists"
    effect: "NoSchedule"

Allows the pod to be scheduled on nodes with the taint karpenter/elastic:NoSchedule.
Without this toleration, the pod would be repelled from those nodes.
operator: "Exists" means it tolerates the taint regardless of its value.
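Tolerations also support operator: "Equal", which matches a taint only when its value matches exactly. A sketch (the key and value below are illustrative, not from the cluster above):

```yaml
tolerations:
  - key: "workload-tier"   # illustrative taint key
    operator: "Equal"
    value: "batch"         # taint value must match exactly
    effect: "NoSchedule"
```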

Karpenter applies the taint karpenter/elastic:NoSchedule to nodes in the "elastic" pool. This taint acts as a gatekeeping mechanism - it says: "Only pods that explicitly tolerate this taint can schedule here". By default, most pods CANNOT schedule on these nodes (they lack the toleration). Our pod explicitly opts in with the toleration, saying "I'm allowed on elastic nodes".

Why This Pattern?

This is actually a common workload isolation strategy:

Regular pods (no toleration) 
  ↓
  ❌ BLOCKED from elastic nodes
  ✅ Schedule on general-purpose nodes

Elastic workload pods (with toleration)
  ↓  
  ✅ CAN schedule on elastic nodes
  ✅ Can also schedule elsewhere (unless nodeSelector restricts)

Real-World Use Case:

# Elastic nodes are tainted to reserve them for specific workloads
# General traffic shouldn't land here accidentally

# Your pod says: "I'm an elastic workload, let me in"
tolerations:
  - key: "karpenter/elastic"
    operator: "Exists"
    effect: "NoSchedule"

# PLUS you add nodeSelector to say: "And I ONLY want elastic nodes"
nodeSelector:
  karpenter-node-pool: elastic


The Karpenter Perspective

Karpenter knows the node state perfectly. The taint isn't about node health—it's about reserving capacity for specific workloads. This prevents:
  • Accidental scheduling of non-elastic workloads
  • Resource contention
  • Cost inefficiency (elastic nodes might be expensive/specialized)

Think of it like a VIP section: the velvet rope (taint) keeps everyone out except those with a pass (toleration).
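In Karpenter, such a taint is typically declared on the NodePool itself, so every node the pool provisions carries it automatically. A minimal sketch (the pool name, label, and taint key mirror the examples above; other fields, and the required nodeClassRef, are omitted or assumed):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: elastic
spec:
  template:
    metadata:
      labels:
        karpenter-node-pool: elastic     # label targeted by the nodeSelector above
    spec:
      taints:
        - key: karpenter/elastic         # taint tolerated by the pods above
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
```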


Node Selector


nodeSelector:
  karpenter-node-pool: elastic
  node.kubernetes.io/instance-type: m7g.large
  karpenter.sh/capacity-type: "on-demand"

Requires the pod to run only on nodes matching ALL these labels:
  • Must be in the "elastic" Karpenter node pool
  • Must be an AWS m7g.large instance (ARM-based Graviton3)
  • Must be on-demand (not spot instances; karpenter.sh/capacity-type can also have value "spot")

What This Means


This pod is configured to run on dedicated elastic infrastructure managed by Karpenter (a Kubernetes node autoscaler), specifically targeting:
  • ARM-based instances (m7g = Graviton)
  • On-demand capacity (predictable, no interruptions)
  • A specific node pool for workload isolation

This is common for workloads that need consistent performance or have specific architecture requirements.

Node Affinity


More flexible than nodeSelector, with support for both hard requirements and soft preferences:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:  # Hard requirement
      nodeSelectorTerms:
      - matchExpressions:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m7g.large", "m7g.xlarge"]
    preferredDuringSchedulingIgnoredDuringExecution:  # Soft preference
    - weight: 100
      preference:
        matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a"]


Pod Affinity/Anti-Affinity


Schedule pods based on what other pods are running:

affinity:
  podAffinity:  # Schedule NEAR certain pods
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: cache
      topologyKey: kubernetes.io/hostname
      
  podAntiAffinity:  # Schedule AWAY from certain pods
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: my-app
        topologyKey: topology.kubernetes.io/zone


It is possible to patch a deployment manually, as a temporary measure if necessary, during cluster management:

% kubectl patch deployment coredns -n kube-system -p '{"spec":{"template":{"spec":{"affinity":{"podAntiAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchExpressions":[{"key":"k8s-app","operator":"In","values":["kube-dns"]}]},"topologyKey":"kubernetes.io/hostname"}]}}}}}}'
deployment.apps/coredns patched


In Kubernetes pod affinity and anti-affinity, topologyKey defines the scope or boundary of the scheduling rule. It refers to a node label key that the scheduler uses to group nodes into "topology domains". 

How it Works


The scheduler looks at all nodes that share the same value for the specified topologyKey. Nodes with identical values for that label are treated as part of the same domain (e.g., the same rack, node, or availability zone). 

Pod Affinity: The scheduler will only place a new pod in a domain if that domain already contains a pod matching your labelSelector.
Pod Anti-Affinity: The scheduler will avoid placing a new pod in any domain that already contains a pod matching your labelSelector.


Common Examples


topologyKey Value: kubernetes.io/hostname
Domain Scope: Individual Node
Typical Use Case: Ensure two pods never run on the same physical machine.

topologyKey Value: topology.kubernetes.io/zone
Domain Scope: Availability Zone
Typical Use Case: Spread replicas across different data centres for fault tolerance.

topologyKey Value: topology.kubernetes.io/region
Domain Scope: Geographic Region
Typical Use Case: Ensure workloads are distributed across broad regions (e.g., us-east-1 vs us-west-2).

topologyKey Value: Custom Labels (e.g., rack)
Domain Scope: Physical Rack
Typical Use Case: Group nodes by their specific server rack in a private data centre.


Important Constraints

  • Performance: For required pod anti-affinity, some clusters restrict topologyKey to kubernetes.io/hostname (via the LimitPodHardAntiAffinityTopology admission plugin), and required anti-affinity is comparatively expensive for the scheduler to evaluate on large clusters.
  • Consistency: Every node in your cluster should be consistently labeled with the topologyKey you choose. If labels are missing, the scheduler may exhibit unexpected behaviour.
  • Usage: You cannot leave topologyKey empty; it is a required field for both affinity and anti-affinity rules.

To ensure high availability, you can use Pod Anti-Affinity with topology.kubernetes.io/zone as the topologyKey. This prevents the scheduler from placing multiple replicas of the same application in the same availability zone.
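A sketch of such a rule (the app label is illustrative):

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: my-app                          # illustrative label
        topologyKey: topology.kubernetes.io/zone # domain = availability zone
```

Note that with requiredDuringScheduling, once every zone holds a replica, additional replicas become unschedulable; preferredDuringSchedulingIgnoredDuringExecution is the softer alternative.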


Taints (node-side)


The complement to tolerations; taints are applied to nodes rather than pods:

kubectl taint nodes node1 dedicated=gpu:NoSchedule
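Only pods that tolerate this taint can then schedule on node1. The matching toleration would be:

```yaml
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
```

The taint can be removed again with kubectl taint nodes node1 dedicated=gpu:NoSchedule- (note the trailing dash).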


Priority and Preemption


Control which pods get scheduled first and can evict lower-priority pods:

priorityClassName: high-priority
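The priority class itself is a cluster-scoped resource that must exist first; a sketch (the name and value are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000          # higher values are scheduled first and can preempt lower ones
globalDefault: false
description: "For critical workloads that may preempt lower-priority pods."
```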


Topology Spread Constraints


Distribute pods evenly across zones, nodes, or other topology domains:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: my-app


Resource Requests/Limits


Influence scheduling based on available resources. The scheduler considers only requests when placing a pod; limits are enforced at runtime, not at scheduling time:

resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"


Custom Schedulers


You can even specify a completely different scheduler:

schedulerName: my-custom-scheduler


Runtime Class


For specialized container runtimes (like gVisor, Kata Containers):

runtimeClassName: gvisor
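The referenced class must exist as a cluster resource; a sketch for gVisor (the handler name depends on how the runtime is configured in containerd/CRI-O):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc    # CRI handler name; runsc is gVisor's runtime
```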

Each mechanism serves different use cases—nodeSelector is simple but rigid, while affinity rules and topology constraints offer much more flexibility for complex scheduling requirements.

