Kubernetes scheduling constraints

Affinity and anti-affinity rules allow you to fine-tune your Kubernetes deployments, optimizing resource utilization and enhancing reliability.

Pod Affinity

  • Definition: Pod affinity expresses scheduling constraints based on the labels of Pods that are already running in a topology domain (a Node, a zone, and so on), rather than on the labels of the Nodes themselves.
  • Purpose: It colocates Pods in the same topology domain, typically the same Node, when they need to communicate frequently over the network.
  • Example: Imagine a microservices architecture where two Pods, ServiceA and ServiceB, interact frequently. You can set up pod affinity so that both ServiceA and ServiceB prefer to run on the same Node, which reduces network latency between them.
  • Description: A required affinity rule ensures that a Pod is scheduled only onto a Node (or other topology domain) that already hosts a Pod matching the rule's label selector.

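The Deployment below requires every nginx Pod to be scheduled next to an existing nginx Pod, using the kubernetes.io/hostname label as the topology key, so all replicas end up on the same Node.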

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - nginx
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: nginx
          image: nginx

Pod Anti-Affinity

  • Definition: Pod anti-affinity discourages scheduling Pods onto Nodes that already have Pods with certain labels.
  • Purpose: It helps distribute workloads across different Nodes, promoting fault tolerance and resilience.
  • Example: Consider a scenario where you have two Pods, Frontend and Backend, serving a web application. You can set up pod anti-affinity so that Frontend and Backend avoid running on the same Node. This way, a single Node failure does not take down both of them at once.
  • Description: The anti-affinity rule keeps a Pod off any Node that already hosts a Pod matching the rule's label selector. The required form forbids such placement outright; the preferred form only asks the scheduler to avoid it where possible.

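The Deployment below ensures that no two nginx Pods are scheduled on the same Node, using the kubernetes.io/hostname label as the topology key. Because the rule is required, the cluster needs at least three eligible Nodes for all three replicas to be scheduled; otherwise the surplus Pods stay Pending.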

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - nginx
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: nginx
          image: nginx
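
The "prefer not" behavior mentioned in the description corresponds to the soft form of the rule. The fragment below is a minimal sketch of that form for the same nginx label; the weight of 100 is an arbitrary illustrative choice, not a value from the example above.

```yaml
# Fragment of a Pod template spec (spec.template.spec in a Deployment).
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100              # 1-100; higher weights count more when Nodes are scored
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - nginx
          topologyKey: "kubernetes.io/hostname"
```

With this form the scheduler still places a replica on an occupied Node when no empty Node is available, so Pods do not stay Pending on small clusters.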

Node Affinity

  • Definition: Node affinity constrains which Nodes can receive a Pod by matching labels on those Nodes.
  • Purpose: It allows you to specify an affinity toward a group of Nodes based on their labels.
  • Example: Suppose you have a set of high-memory Nodes labeled memory=high and you want to run memory-intensive Pods on them. You can define node affinity on those Pods so that they are scheduled only on Nodes that carry the memory=high label.
  • Description: Node affinity comes in two forms. requiredDuringSchedulingIgnoredDuringExecution is a hard rule that a Node must satisfy, while preferredDuringSchedulingIgnoredDuringExecution is a preference: the scheduler uses a Node with the specified characteristics if one is available.

This ensures that the nginx Pod is scheduled only on a Node with the disktype=ssd label.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
  containers:
    - name: nginx
      image: nginx
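
The memory=high scenario from the list above can be sketched the same way. The fragment below combines a hard requirement on memory=high with a soft preference for disktype=ssd; pairing the two labels, and the weight of 50, are assumptions made only for illustration.

```yaml
# Fragment of a Pod spec (goes under spec:).
affinity:
  nodeAffinity:
    # Hard rule: only Nodes labeled memory=high are eligible.
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: memory
              operator: In
              values:
                - high
    # Soft rule: among eligible Nodes, favor those labeled disktype=ssd.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50               # 1-100; illustrative value
        preference:
          matchExpressions:
            - key: disktype
              operator: In
              values:
                - ssd
```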

Node Anti-Affinity

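  • Definition: Kubernetes has no dedicated nodeAntiAffinity field. Node anti-affinity is expressed through nodeAffinity with the negative operators NotIn and DoesNotExist, which keep Pods away from Nodes that carry specific labels.
  • Purpose: It keeps workloads off Nodes that are unsuitable or reserved for other uses, preventing resource bottlenecks on those Nodes.
  • Example: Imagine a scenario where you have Pods performing CPU-intensive computations that make no use of GPUs. You can set up node anti-affinity to keep these Pods off GPU Nodes, leaving those Nodes free for workloads that need them. (Keeping Pods away from each other, rather than away from labeled Nodes, is the job of pod anti-affinity.)
  • Description: Node anti-affinity acts as a repelling rule. In the required form, Nodes with the specified label are excluded; in the preferred form, they are only less likely to be chosen.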

This ensures that the nginx Pod avoids Nodes with the gpu=true label.

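apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    nodeAffinity:                 # there is no nodeAntiAffinity field; use a negative operator
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: gpu
                operator: NotIn   # matches Nodes whose gpu label is absent or not "true"
                values:
                  - "true"        # quoted, because label values are strings
  containers:
    - name: nginx
      image: nginx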
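
If the goal is to avoid every Node that has a gpu label, whatever its value, the DoesNotExist operator of nodeAffinity is one way to say so. This is a sketch of that variant, not a replacement for the example above.

```yaml
# Fragment of a Pod spec (goes under spec:).
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: gpu
              operator: DoesNotExist   # takes no values list; matches Nodes without a gpu label
```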


requiredDuringSchedulingIgnoredDuringExecution can be broken into two parts:

  1. requiredDuringScheduling:

    • This component implies that a pod should be scheduled on a node only if it satisfies certain criteria. In other words, the node must meet specific conditions for the pod to be placed there during the initial scheduling process.
  2. IgnoredDuringExecution:

    • This part comes into play after a pod is already scheduled and running on a node.
    • If any changes occur in the labels on that node during the pod's execution (for example, due to an update), the existing pod should not be evicted based on these label changes.
    • Instead, only newly scheduled pods should be required to match the updated criteria.

In summary, requiredDuringSchedulingIgnoredDuringExecution ensures that pods are initially placed on suitable nodes and avoids unnecessary evictions during runtime due to label changes on the node. It's a way to maintain stability and predictability in your Kubernetes cluster.


topologyKey represents the key of node labels that the scheduler uses to determine the topology domain for pod placement. For example, when using pod affinity, the scheduler ensures that a pod is scheduled in the same domain (topology) as other pods that match a specific expression.

Common label options for topologyKey include:

  • topology.kubernetes.io/zone: Pods are scheduled in the same zone as other pods with matching labels.
  • kubernetes.io/hostname: Pods are scheduled on the same hostname as other pods with matching labels.

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: security
                operator: In
                values:
                  - S1
          topologyKey: topology.kubernetes.io/zone
  containers:
    - name: with-pod-affinity
      image: k8s.gcr.io/pause:2.0


topologySpreadConstraints allow you to control how Pods are distributed across your cluster among different failure domains such as regions, zones, nodes, and other user-defined topology domains. The goal is to achieve both high availability and efficient resource utilization.

For example, they can keep a workload from depending on a single node. The YAML below spreads matching pods evenly across all nodes.

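apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  labels:
    app: example            # placeholder label, not in the original
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:        # selects the Pods that are counted on each node
        matchLabels:
          app: example
  containers:               # placeholder container; a Pod needs at least one
    - name: example
      image: nginx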

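maxSkew defines the maximum allowed imbalance in the number of matching pods across topology domains: the difference between the pod count in any one domain and the count in the least-populated domain. A small value keeps the spread even, which enhances reliability and performance in your Kubernetes clusters. With maxSkew set to 1, no domain may hold more than one pod above the least-populated domain. With whenUnsatisfiable: DoNotSchedule this limit is enforced; with ScheduleAnyway the scheduler only prefers placements that reduce the skew.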
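
The arithmetic behind maxSkew is easier to follow with numbers. The fragment below is a sketch; the label app: example, the zone names, and the Pod counts in the comments are made up for illustration.

```yaml
# Fragment of a Pod spec (goes under spec:).
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: example           # hypothetical label
# Worked example with three hypothetical zones and their matching Pods:
#   zone-a: 2   zone-b: 2   zone-c: 1     (least-populated domain has 1)
# Skew if the new Pod lands in a zone = (Pods in that zone + 1) - 1
#   zone-a: (2 + 1) - 1 = 2   exceeds maxSkew, so zone-a is rejected
#   zone-b: (2 + 1) - 1 = 2   exceeds maxSkew, so zone-b is rejected
#   zone-c: (1 + 1) - 1 = 1   within maxSkew, so zone-c is allowed
```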

topologySpreadConstraints are well suited to hierarchical topologies, where nodes are grouped into nested domains such as regions and zones, while pod and node affinity are suitable for flat topologies, where all nodes are on the same level. Pod anti-affinity can only allow or forbid co-location within a domain, whereas a spread constraint bounds how uneven the distribution may become. topologySpreadConstraints therefore provide more expressive control over pod scheduling across broader topology domains, and combining them with affinity rules allows you to fine-tune your workload placement.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - my-app
              topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: my-app
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
      containers:               # placeholder; the original omits the container list
        - name: my-app
          image: nginx

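In this example, the scheduler tries to spread the pods of my-app evenly across zones (based on the topology.kubernetes.io/zone label), and the anti-affinity rule additionally keeps any two of them off the same node.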
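
A hard anti-affinity rule on kubernetes.io/hostname allows at most one my-app pod per node, so five replicas need at least five eligible nodes. Where that is too strict, one possible variation is to express both levels as spread constraints. The fragment below is a sketch of that idea under the same app: my-app label, not a drop-in equivalent of the Deployment above.

```yaml
# Fragment of a Pod template spec (spec.template.spec in a Deployment).
topologySpreadConstraints:
  # Hard limit on imbalance between zones.
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app
  # Soft preference for an even spread across nodes.
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-app
```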

You may notice the labelSelector inside topologySpreadConstraints. A constraint behaves differently with and without it.

1. With labelSelector:

  • When you define a topologySpreadConstraints entry with a labelSelector, it allows you to select specific Pods based on their labels. These selected Pods are then counted to determine the number of Pods in their corresponding topology domain (such as nodes, zones, or other user-defined domains).
  • The labelSelector helps you control the spreading behavior of your Pods across different failure domains. You can ensure that Pods with specific labels are distributed evenly or according to your desired criteria.
  • For example, if you want to avoid running multiple Pods with the same label on a single node, you can use a labelSelector to enforce this constraint.

2. Without labelSelector:

  • A constraint written in a Pod spec counts only the Pods that its labelSelector matches, so leaving the selector out means that effectively no Pods are counted and the constraint has little practical effect.
  • The automatic behavior applies to cluster-level default constraints, which are configured in the scheduler and used for Pods that define no constraints of their own. These defaults must leave the labelSelector empty; the selector is calculated from the services, replication controllers, replica sets, or stateful sets that the Pod belongs to.
  • It's a more automatic approach, but it does not provide fine-grained control over which Pods are counted.

Taints and Tolerations

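Taints are applied to nodes to mark them as "tainted" with a key, an optional value, and an effect. With the NoSchedule effect, the scheduler will not place a pod on a tainted node unless the pod has a matching toleration.

Tolerations are set on pods to allow them to tolerate specific taints. A toleration permits scheduling onto a tainted node but does not force it. For NoExecute taints, the optional tolerationSeconds field defines how long an already running pod may stay on the node after the taint is added.

Add a taint with the NoSchedule effect to a node: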

kubectl taint nodes node1 key1=value1:NoSchedule
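
A pod that should remain schedulable on node1 needs a toleration matching this taint. The fragment below sketches the two usual ways to write one for the key1=value1:NoSchedule taint from the command above.

```yaml
# Fragment of a Pod spec (goes under spec:). Either entry alone tolerates the taint.
tolerations:
  - key: "key1"
    operator: "Equal"      # key, value, and effect must all match the taint
    value: "value1"
    effect: "NoSchedule"
  - key: "key1"
    operator: "Exists"     # matches any value of key1; no value field is given
    effect: "NoSchedule"
```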

The allowed values for the effect field are:

  • NoExecute: This affects pods that are already running on the node as follows:
    • Pods that do not tolerate the taint are evicted immediately.
    • Pods that tolerate the taint without specifying tolerationSeconds in their toleration specification remain bound forever.
    • Pods that tolerate the taint with a specified tolerationSeconds remain bound for the specified amount of time. After that time elapses, the node lifecycle controller evicts the Pods from the node.
  • NoSchedule: No new Pods will be scheduled on the tainted node unless they have a matching toleration. Pods currently running on the node are not evicted.
  • PreferNoSchedule: PreferNoSchedule is a "preference" or "soft" version of NoSchedule. The control plane will try to avoid placing a Pod that does not tolerate the taint on the node, but it is not guaranteed.
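
The tolerationSeconds behavior described for NoExecute can be sketched as follows. The key and value reuse key1=value1 from the taint command; the 3600-second window is an illustrative choice.

```yaml
# Fragment of a Pod spec (goes under spec:).
tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoExecute"
    tolerationSeconds: 3600   # stay bound for up to one hour after the taint is added
```

If the taint is removed before the hour is up, the pod is not evicted.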

Remove taint from a node.

kubectl taint nodes node1 key1=value1:NoSchedule-

Get the node’s taint info

kubectl get node/node1 -o json | jq .spec.taints

Tolerations are usually declared in a pod or deployment spec. In the YAML below, the pods tolerate a taint with key "hardware" and value "gpu", so they can be scheduled on nodes that carry that taint.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 3
  selector:                # required for apps/v1; must match the template labels
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: ai
          image: skynet:1997-08-29
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values: ["big-gpu", "expensive-gpu"]
      tolerations:
        - key: "hardware"
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"
          # tolerationSeconds: 3600 is valid only with effect NoExecute, so it is left out here
