DevOps Coding Challenge #6: Kubernetes Pod Restarter
Level: Intermediate
Languages: Python or Go
Focus: Kubernetes API, Automation, Production Troubleshooting
Problem Statement
You’re on-call. Pods are stuck in a CrashLoopBackOff
or Error
state.
Instead of manually restarting them, automate it.
Your task:
Write a script (in Python or Go) that connects to a Kubernetes cluster and restarts pods in a given namespace, but only if:
The pod has been in a bad state (
CrashLoopBackOff
orError
)The condition has persisted for more than 10 minutes
The pod name matches a given pattern (e.g.,
"api"
)
Core Requirements
Accept these inputs via CLI:
--namespace
--pattern
--threshold
(in minutes)
Restart the pod using the Kubernetes API (delete it; let the ReplicaSet recreate it)
Print/log:
Which pods were deleted
Why they were deleted
Bonus Features
Add a
--dry-run
flag to simulate deletion without taking actionAdd a summary output like:
Checked 18 pods
Restarted 5 pods
Skipped 13 pods
Tech Suggestions
If using Python:
Use the
kubernetes
client (pip install kubernetes
)Use
argparse
for CLIUse
datetime
to calculate age
If using Go:
Use
client-go
or controller-runtimeUse Cobra or the
flag
package for CLIWatch for
containerStatuses
under.status.containerStatuses[*].state
Evaluation Criteria
Correct filtering and logic
Clean code structure
Proper use of Kubernetes API
Bonus: dry-run logic, helpful logging, CLI flags