Kubenatives
etcd Debugging Guide: When Your Cluster Starts Losing Its Memory
The 5 ways etcd breaks in production Kubernetes, the metrics that predict each failure, and the commands to fix them before your cluster goes read-only.
10 hrs ago
•
Sharon Sahadevan
vLLM vs Triton vs KServe: Choosing Your Model Serving Stack on Kubernetes
vLLM, Triton, and KServe operate at different layers. Here's what each one does, when to use it, and how to combine them for production model serving on…
Apr 3
•
Sharon Sahadevan
March 2026
Production Runbook: vLLM OOM Debugging
Your vLLM pod just crashed with OOMKilled. Here is how to find the cause and prevent it from happening again.
Mar 27
•
Sharon Sahadevan
How vLLM Serves Models on Kubernetes
PagedAttention, continuous batching, and why your first deployment will probably OOM.
Mar 27
•
Sharon Sahadevan
Production Runbook: etcd Backup and Restore
The step-by-step procedure for backing up and restoring etcd. Every command, every validation check, every gotcha.
Mar 22
•
Sharon Sahadevan
NVIDIA GPU Operator on Kubernetes: What It Actually Does Under the Hood
It is not one component. It is eight. Most engineers only know about one of them.
Mar 20
•
Sharon Sahadevan
Architecture Template: vLLM Production Deployment on Kubernetes
Copy, configure, deploy. Every YAML file you need to run vLLM in production with monitoring, autoscaling, and model caching.
Mar 14
•
Sharon Sahadevan
Stacked vs External etcd: The Production Decision Nobody Explains
Why kubeadm’s default isn’t what you’ll find in production — and when it actually matters.
Mar 13
•
Sharon Sahadevan
Production Runbook: GPU Pod Stuck in Pending
Debug runbook for GPU pods stuck in Pending on Kubernetes. GPU Operator failures, scheduling filters, MIG config, capacity planning, and prevention…
Mar 7
•
Sharon Sahadevan
How Kubernetes Schedules GPUs: Device Plugins, MIG, and Time-Slicing
Kubernetes treats a $30K A100 like a CPU core: as a simple integer. Here’s what actually happens when you request nvidia.com/gpu: 1, and how to stop…
Mar 6
•
Sharon Sahadevan
February 2026
What Actually Happens Inside the Kubernetes Control Plane
What every production engineer should understand about the API server, etcd, scheduler, and controller manager, and why it matters when things break at…
Feb 27
•
Sharon Sahadevan
GPU Infrastructure Explained
Everything You Need to Know as a DevOps Engineer Moving into AI
Feb 12
•
Sharon Sahadevan