Kubenatives
Network Policies in Practice: When Your Pods Cannot Talk to Each Other
You implemented network policies for security. Then DNS broke. Then inter-service communication broke. Here is how to do it without breaking everything.
Jun 5 • Sharon Sahadevan
May 2026
Architecture Template: GPU Node Pool Setup
Complete YAML for a multi-tier GPU cluster with taints, tolerations, affinity, quotas, and priority classes. Copy, configure, deploy.
May 29 • Sharon Sahadevan
GPU Node Pools: Taints, Tolerations, and Cost Isolation
Stop CPU workloads from landing on GPU nodes. Taints, tolerations, node affinity, resource quotas, and priority classes for multi-tier GPU clusters.
May 29 • Sharon Sahadevan
LLMOps on Kubernetes: Patterns for Running LLMs in Production
Deploying the model is the easy part. Operating it in production is where most teams get stuck.
May 22 • Sharon Sahadevan
Architecture Template: CoreDNS Debug ConfigMap
A production-ready CoreDNS configuration with logging, caching, and health checks for debugging DNS issues.
May 15 • Sharon Sahadevan
Kubernetes DNS Troubleshooting: CoreDNS, ndots, and the 5-Second Timeout
Every DNS issue in Kubernetes traces back to one of 5 causes. Here is how to find which one in under 3 minutes.
May 15 • Sharon Sahadevan
The Course Platform I Wish Existed When I Was Interviewing for DevOps Roles
GPU infrastructure, Kubernetes security, LLM operations, performance tuning, and identity systems, taught through real interview scenarios.
May 9 • Sharon Sahadevan
Why Your GPU Pods Are Pending: Debugging Kubernetes GPU Scheduling
Every reason a GPU pod gets stuck in Pending. Every debug command. Root cause in under 5 minutes.
May 8 • Sharon Sahadevan
3-Node HA Setup: Quorum, Split-Brain, and Why the Math Matters
The number 3 is not arbitrary. It is the minimum that makes distributed consensus work.
May 1 • Sharon Sahadevan
April 2026
Production Case Study: The vLLM Pod That Only OOMed at 3 AM
A 5-week investigation into a memory failure that ignored every rule we knew about LLM inference. The root cause changed how we think about KV cache…
Apr 29 • Sharon Sahadevan
Production Kubernetes Debugging: A Systematic Framework
A systematic framework for debugging Kubernetes in production. Five layers from application to hardware, with the exact commands for each layer.
Apr 24 • Sharon Sahadevan
Production Runbook: vLLM OOMKilled Recovery
When your inference pod dies mid-request with exit code 137. What to check, what to fix, and how to stop it from happening again.
Apr 22 • Sharon Sahadevan