DevOps Coding Challenge #6: Kubernetes Pod Restarter
Level: Intermediate
Languages: Python or Go
Focus: Kubernetes API, Automation, Production Troubleshooting
Problem Statement
You’re on-call. Pods are stuck in a CrashLoopBackOff or Error state.
Instead of manually restarting them, automate it.
Your task:
Write a script (in Python or Go) that connects to a Kubernetes cluster and restarts pods in a given namespace, but only if:
The pod has been in a bad state (
CrashLoopBackOfforError)The condition has persisted for more than 10 minutes
The pod name matches a given pattern (e.g.,
"api")
Core Requirements
Accept these inputs via CLI:
--namespace--pattern--threshold(in minutes)
Restart the pod using the Kubernetes API (delete it; let the ReplicaSet recreate it)
Print/log:
Which pods were deleted
Why they were deleted
Bonus Features
Add a
--dry-runflag to simulate deletion without taking actionAdd a summary output like:
Checked 18 pods
Restarted 5 pods
Skipped 13 pods
Tech Suggestions
If using Python:
Use the
kubernetesclient (pip install kubernetes)Use
argparsefor CLIUse
datetimeto calculate age
If using Go:
Use
client-goor controller-runtimeUse Cobra or the
flagpackage for CLIWatch for
containerStatusesunder.status.containerStatuses[*].state
Evaluation Criteria
Correct filtering and logic
Clean code structure
Proper use of Kubernetes API
Bonus: dry-run logic, helpful logging, CLI flags


