Taints and Tolerations: The Kubernetes Bouncer System
A DevOps Engineer's Guide to Node Scheduling and Workload Placement
What You'll Learn Today
Master taints and tolerations - two of the most misunderstood concepts in Kubernetes. Learn how to control pod placement, create dedicated nodes, and implement the advanced scheduling strategies many DevOps engineers struggle with.
The Problem: Uncontrolled Pod Placement
Your GPU nodes are running regular web applications. Your database pods are scheduled on spot instances. Your high-priority workloads are mixed with batch jobs. Your development pods are consuming production node resources.
Kubernetes needs a way to say "this node is special" and "this pod is allowed on special nodes." That's exactly what taints and tolerations do.
The Simple Mental Model
Think of taints and tolerations like a nightclub bouncer system:
Taints = "No Entry" signs on nodes (like "VIP Only", "Members Only")
Tolerations = Special passes that pods carry (like "VIP Pass", "Member Card")
Default behavior = Pods without the right pass get rejected
The Basic Rule:
Node has a taint → only pods with a matching toleration can be scheduled there
Node has no taint → any pod can be scheduled
A toleration lets a pod onto a tainted node - it does not force the pod to run there (that takes node selectors or affinity, covered later)
How Taints and Tolerations Work
Taints (Applied to Nodes):
# Syntax: key=value:effect
kubectl taint nodes node1 gpu=true:NoSchedule
kubectl taint nodes node2 environment=production:NoExecute
kubectl taint nodes node3 dedicated=database:NoSchedule
Tolerations (Applied to Pods):
tolerations:
- key: "gpu"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
The Three Effects:
NoSchedule - New pods won't be scheduled (existing pods stay)
PreferNoSchedule - Avoid scheduling if possible (soft constraint)
NoExecute - Evict existing pods that lack a matching toleration AND prevent new ones
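To see the difference between NoSchedule and NoExecute in practice, here is a minimal sketch against a scratch node (the node name, taint key, and the already-running demo pod are placeholders):
# Assume a pod named demo is already running on worker-1
# NoSchedule: new pods without a toleration avoid worker-1, but the running demo pod stays
kubectl taint nodes worker-1 demo=true:NoSchedule
kubectl get pod demo -o wide   # still Running on worker-1
# NoExecute: the running demo pod is evicted because it has no matching toleration
kubectl taint nodes worker-1 demo=true:NoSchedule-
kubectl taint nodes worker-1 demo=true:NoExecute
kubectl get pod demo           # Terminating / gone
# Clean up
kubectl taint nodes worker-1 demo=true:NoExecute-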
Real-World Examples
Example 1: Dedicated GPU Nodes
Problem: Expensive GPU nodes running non-GPU workloads
# Taint GPU nodes
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
kubectl taint nodes gpu-node-2 gpu=true:NoSchedule
# GPU workload with toleration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ml-training
  template:
    metadata:
      labels:
        app: ml-training
    spec:
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: training
        image: tensorflow/tensorflow:latest-gpu
        resources:
          limits:
            nvidia.com/gpu: 1
Result: Only ML workloads with GPU tolerations can use GPU nodes.
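Note that the toleration only allows the ML pods onto GPU nodes - it does not force them there. To actually pin the workload, pair the toleration with a nodeSelector; a sketch, assuming you also label the GPU nodes yourself (the node-type=gpu label is an arbitrary choice):
# Label the GPU nodes
kubectl label nodes gpu-node-1 gpu-node-2 node-type=gpu
# Pod template addition, alongside the tolerations
spec:
  nodeSelector:
    node-type: gpu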
Example 2: Production Environment Isolation
Problem: Development workloads accidentally running in production
# Taint production nodes
kubectl taint nodes prod-node-1 environment=production:NoSchedule
kubectl taint nodes prod-node-2 environment=production:NoSchedule
# Production workload
apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: production-api
  template:
    metadata:
      labels:
        app: production-api
    spec:
      tolerations:
      - key: "environment"
        operator: "Equal"
        value: "production"
        effect: "NoSchedule"
      containers:
      - name: api
        image: my-api:v1.0.0
Result: Only production workloads run on production nodes.
Example 3: Spot Instance Management
Problem: Critical workloads on unreliable spot instances
# Taint spot instances
kubectl taint nodes spot-node-1 node-type=spot:NoSchedule
kubectl taint nodes spot-node-2 node-type=spot:NoSchedule
# Batch job that tolerates spot instances
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processing
spec:
  template:
    spec:
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "spot"
        effect: "NoSchedule"
      containers:
      - name: processor
        image: data-processor:latest
      restartPolicy: OnFailure
Result: Only fault-tolerant workloads run on spot instances.
Advanced Taint and Toleration Patterns
Pattern 1: Multi-Tier Node Architecture
# Tier 1: High-performance nodes (SSD, high CPU)
kubectl taint nodes tier1-node-1 tier=high-performance:NoSchedule
kubectl taint nodes tier1-node-2 tier=high-performance:NoSchedule
# Tier 2: Standard nodes (no taint needed)
# Tier 3: Low-cost nodes (slower storage, lower CPU)
kubectl taint nodes tier3-node-1 tier=low-cost:NoSchedule
kubectl taint nodes tier3-node-2 tier=low-cost:NoSchedule
# Critical application on high-performance tier
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-database
spec:
  replicas: 3
  selector:
    matchLabels:
      app: critical-database
  template:
    metadata:
      labels:
        app: critical-database
    spec:
      tolerations:
      - key: "tier"
        operator: "Equal"
        value: "high-performance"
        effect: "NoSchedule"
      containers:
      - name: database
        image: postgres:13
        resources:
          requests:
            memory: "4Gi"
            cpu: "2000m"
          limits:
            memory: "8Gi"
            cpu: "4000m"
Pattern 2: Maintenance and Draining
# Drain node for maintenance
kubectl taint nodes worker-node-1 maintenance=true:NoExecute
# This will:
# 1. Evict all pods without a matching toleration
# 2. Prevent new pods from being scheduled
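For planned maintenance, the built-in cordon/drain workflow is usually the better fit - drain marks the node unschedulable (surfaced as the node.kubernetes.io/unschedulable taint) and evicts pods while respecting PodDisruptionBudgets:
# Stop new pods from landing on the node
kubectl cordon worker-node-1
# Evict running pods gracefully, respecting PodDisruptionBudgets
kubectl drain worker-node-1 --ignore-daemonsets --delete-emptydir-data
# After maintenance, make the node schedulable again
kubectl uncordon worker-node-1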
# Critical system pod that survives maintenance
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: system-monitor
spec:
  selector:
    matchLabels:
      app: system-monitor
  template:
    metadata:
      labels:
        app: system-monitor
    spec:
      tolerations:
      - key: "maintenance"
        operator: "Equal"
        value: "true"
        effect: "NoExecute"
      - key: "node.kubernetes.io/unschedulable"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: monitor
        image: system-monitor:latest
Pattern 3: Dedicated Database Nodes
# Create dedicated database nodes
kubectl taint nodes db-node-1 dedicated=database:NoSchedule
kubectl taint nodes db-node-2 dedicated=database:NoSchedule
kubectl taint nodes db-node-3 dedicated=database:NoSchedule
# Database with anti-affinity and tolerations
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-cluster
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "database"
        effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - postgres
            topologyKey: kubernetes.io/hostname
      containers:
      - name: postgres
        image: postgres:13
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
Built-in Kubernetes Taints
Kubernetes automatically applies certain taints:
Node Condition Taints:
# Automatically applied by Kubernetes
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/unschedulable:NoSchedule
Control Plane Node Taints:
# Automatically applied to control plane nodes
node-role.kubernetes.io/control-plane:NoSchedule
# Legacy taint, still found on clusters created before Kubernetes 1.25
node-role.kubernetes.io/master:NoSchedule
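On single-node or lab clusters you usually want regular workloads on the control plane, so this taint is often removed (the command reports a "not found" error for nodes that never had the taint, which is safe to ignore):
# Allow normal pods to schedule on control-plane nodes
kubectl taint nodes --all node-role.kubernetes.io/control-plane:NoSchedule-
# Older clusters may still carry the legacy master taint
kubectl taint nodes --all node-role.kubernetes.io/master:NoSchedule-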
System Pod Tolerations:
# System pods typically have these tolerations
tolerations:
- operator: "Exists"
effect: "NoExecute"
- operator: "Exists"
effect: "NoSchedule"
Common Toleration Operators
1. Equal Operator (Exact Match)
tolerations:
- key: "gpu"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
2. Exists Operator (Key Exists)
tolerations:
- key: "gpu"
  operator: "Exists"
  effect: "NoSchedule"
3. Wildcard Toleration (Tolerate Everything)
tolerations:
- operator: "Exists"
4. Effect-Specific Tolerations (same key, different effects)
tolerations:
- key: "node-type"
  operator: "Equal"
  value: "spot"
  effect: "NoSchedule"
- key: "node-type"
  operator: "Equal"
  value: "spot"
  effect: "NoExecute"
  tolerationSeconds: 3600 # stay bound for up to 1 hour after the taint appears, then get evicted
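Kubernetes itself relies on tolerationSeconds: unless a pod specifies otherwise, the DefaultTolerationSeconds admission plugin adds NoExecute tolerations for the not-ready and unreachable node taints with a 300-second window, which is why pods normally survive about five minutes on a failed node before being rescheduled. In the pod spec they look like this:
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300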
Practical Management Commands
Viewing Taints:
# Show all node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# Describe specific node
kubectl describe node <node-name>
# Show taints in JSON format
kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, taints: .spec.taints}'
Adding Taints:
# Add taint to node
kubectl taint nodes <node-name> key=value:effect
# Examples
kubectl taint nodes worker-1 environment=production:NoSchedule
kubectl taint nodes worker-2 gpu=true:NoSchedule
kubectl taint nodes worker-3 dedicated=database:NoExecute
Removing Taints:
# Remove specific taint
kubectl taint nodes <node-name> key=value:effect-
# Remove all taints for a key
kubectl taint nodes <node-name> key-
# Examples
kubectl taint nodes worker-1 environment=production:NoSchedule-
kubectl taint nodes worker-1 environment-
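To confirm a taint was actually removed, check the node's Taints field:
# Verify the node's current taints
kubectl describe node worker-1 | grep -i taints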
Checking Pod Tolerations:
# Show pod tolerations
kubectl get pod <pod-name> -o yaml | grep -A 10 tolerations
# Show all pods with tolerations
kubectl get pods -o json | jq '.items[] | select(.spec.tolerations != null) | {name: .metadata.name, tolerations: .spec.tolerations}'
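If you only need one pod's toleration list, a jsonpath query is a lighter-weight alternative:
# Print just the tolerations array for a single pod
kubectl get pod <pod-name> -o jsonpath='{.spec.tolerations}'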
Common Pitfalls and Solutions
Pitfall 1: Forgetting System Pods
Problem: System pods can't be scheduled after tainting nodes
# Wrong: This breaks system pods
kubectl taint nodes worker-1 dedicated=app:NoSchedule
Solution: Give system DaemonSets (CNI agents, log collectors, monitoring) tolerations for your taints, or keep system workloads on untainted nodes
# System pods need tolerations
tolerations:
- operator: "Exists"
  effect: "NoSchedule"
Pitfall 2: Inconsistent Taint Management
Problem: Manual taint management leads to inconsistencies
Solution: Use labels and automation
# Label nodes first
kubectl label nodes worker-1 node-type=gpu
kubectl label nodes worker-2 node-type=gpu
# Apply taints to every node matching the label
kubectl taint nodes -l node-type=gpu gpu=true:NoSchedule
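If your nodes come from cloud node pools, it is usually more reliable to declare the taint at the pool level so every new node arrives pre-tainted. A sketch for GKE (cluster and pool names are placeholders; EKS managed node groups and AKS node pools have equivalent options):
# GKE: every node in this pool is created with the taint already applied
gcloud container node-pools create gpu-pool \
  --cluster my-cluster \
  --node-taints=gpu=true:NoSchedule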
Pitfall 3: Not Understanding NoExecute
Problem: Existing pods get evicted unexpectedly
# This will evict existing pods immediately
kubectl taint nodes worker-1 maintenance=true:NoExecute
Solution: Use tolerationSeconds for graceful eviction
tolerations:
- key: "maintenance"
  operator: "Equal"
  value: "true"
  effect: "NoExecute"
  tolerationSeconds: 300 # pods stay bound for 5 minutes after the taint is applied, then are evicted
Advanced Use Cases
Use Case 1: Canary Deployment Infrastructure
# Create canary nodes
kubectl taint nodes canary-node-1 deployment=canary:NoSchedule
kubectl taint nodes canary-node-2 deployment=canary:NoSchedule
# Canary deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      version: canary
  template:
    metadata:
      labels:
        app: myapp
        version: canary
    spec:
      tolerations:
      - key: "deployment"
        operator: "Equal"
        value: "canary"
        effect: "NoSchedule"
      containers:
      - name: app
        image: myapp:canary
Use Case 2: Compliance and Security Zones
# Create PCI-compliant nodes
kubectl taint nodes pci-node-1 compliance=pci:NoSchedule
kubectl taint nodes pci-node-2 compliance=pci:NoSchedule
# PCI-compliant workload
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
    spec:
      tolerations:
      - key: "compliance"
        operator: "Equal"
        value: "pci"
        effect: "NoSchedule"
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
      containers:
      - name: processor
        image: payment-processor:secure
Use Case 3: Multi-Tenant Resource Isolation
# Create tenant-specific nodes
kubectl taint nodes tenant-a-node-1 tenant=tenant-a:NoSchedule
kubectl taint nodes tenant-a-node-2 tenant=tenant-a:NoSchedule
kubectl taint nodes tenant-b-node-1 tenant=tenant-b:NoSchedule
kubectl taint nodes tenant-b-node-2 tenant=tenant-b:NoSchedule
# Tenant A workload
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tenant-a-app
  namespace: tenant-a
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tenant-a-app
  template:
    metadata:
      labels:
        app: tenant-a-app
    spec:
      tolerations:
      - key: "tenant"
        operator: "Equal"
        value: "tenant-a"
        effect: "NoSchedule"
      containers:
      - name: app
        image: tenant-app:latest
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
Combining with Other Kubernetes Features
Taints + Node Selectors + Affinity:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: high-performance-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: high-performance-app
  template:
    metadata:
      labels:
        app: high-performance-app
    spec:
      # Must tolerate high-performance taint
      tolerations:
      - key: "performance"
        operator: "Equal"
        value: "high"
        effect: "NoSchedule"
      # Must run on SSD nodes
      nodeSelector:
        storage: "ssd"
      # Prefer nodes labeled with high CPU
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: cpu
                operator: In
                values:
                - "high"
      containers:
      - name: app
        image: high-performance-app:latest
Monitoring and Troubleshooting
Monitoring Taints:
# Check node status and taint keys side by side (custom-columns does not support JSONPath filters like [?(...)])
kubectl get nodes
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
# Watch evictions triggered by NoExecute taints
kubectl get events -A --field-selector reason=TaintManagerEviction
# Check pod scheduling failures
kubectl get events --field-selector reason=FailedScheduling
Troubleshooting Common Issues:
# Why isn't my pod scheduling?
kubectl describe pod <pod-name>
# Check node capacity and taints
kubectl describe node <node-name>
# List all pods with tolerations
kubectl get pods -o json | jq '.items[] | select(.spec.tolerations != null) | {name: .metadata.name, node: .spec.nodeName, tolerations: .spec.tolerations}'
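Pods that cannot tolerate any schedulable node stay in Pending with a FailedScheduling event (typically worded like "node(s) had untolerated taint ..."), so listing Pending pods cluster-wide is a quick health check:
# Pods stuck in Pending across all namespaces
kubectl get pods -A --field-selector status.phase=Pending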
Best Practices
1. Use Meaningful Taint Keys
# Good: Descriptive keys
kubectl taint nodes worker-1 workload-type=database:NoSchedule
kubectl taint nodes worker-2 environment=production:NoSchedule
# Bad: Generic keys
kubectl taint nodes worker-1 special=true:NoSchedule
2. Document Your Taints
# Add labels to document taints
kubectl label nodes worker-1 taint-purpose="dedicated-database-node"
kubectl label nodes worker-1 taint-key="workload-type"
kubectl label nodes worker-1 taint-value="database"
3. Use Automation
# Terraform example (kubernetes_node_taint resource from the hashicorp/kubernetes provider)
resource "kubernetes_node_taint" "gpu_nodes" {
  for_each = var.gpu_node_names

  metadata {
    name = each.value
  }

  taint {
    key    = "gpu"
    value  = "true"
    effect = "NoSchedule"
  }
}
4. Plan for System Workloads
# System DaemonSet with comprehensive tolerations
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: system-monitor
spec:
  selector:
    matchLabels:
      app: system-monitor
  template:
    metadata:
      labels:
        app: system-monitor
    spec:
      tolerations:
      - operator: "Exists"
        effect: "NoSchedule"
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "PreferNoSchedule"
      containers:
      - name: monitor
        image: system-monitor:latest
Action Items for This Week
Audit Current Cluster: Check existing taints and understand their purpose
Identify Use Cases: Find nodes that should be dedicated (GPU, production, etc.)
Implement Basic Taints: Start with environment separation (prod/dev)
Create Monitoring: Set up alerts for scheduling failures
Document Strategy: Create runbooks for taint management
Key Takeaways
Taints are "No Entry" signs on nodes; tolerations are "passes" for pods
Use NoSchedule to keep new pods off a node; NoExecute also evicts pods already running there
System pods need tolerations to survive node taints
Combine with node selectors and affinity for precise placement
Always plan for system workloads when implementing taints
Document and automate taint management for consistency
Next Week Preview
Next week, we'll explore Pod Priority and Preemption – how to ensure critical workloads get scheduled even when resources are scarce, and how priority classes work with taints and tolerations.