Kubenatives
Production Kubernetes Debugging: A Systematic Framework
A systematic framework for debugging Kubernetes in production. Five layers from application to hardware, with the exact commands for each layer.
Apr 24 • Sharon Sahadevan
Production Runbook: vLLM OOMKilled Recovery
When your inference pod dies mid-request with exit code 137. What to check, what to fix, and how to stop it from happening again.
Apr 22 • Sharon Sahadevan
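Exit code 137 is SIGKILL (128 + 9), which the kernel OOM killer sends when a container exceeds its memory limit. A minimal sketch of the pod spec fields involved — image tag, flag value, and sizes are illustrative assumptions, not taken from the runbook:

```yaml
# Hypothetical vLLM container spec. The memory limit must leave headroom
# above model weights plus the KV cache, or the kernel kills the process
# with SIGKILL and the pod reports exit code 137.
containers:
- name: vllm
  image: vllm/vllm-openai:latest
  args:
  - --gpu-memory-utilization=0.90   # leave GPU headroom for the KV cache
  resources:
    requests:
      memory: "24Gi"
      nvidia.com/gpu: "1"
    limits:
      memory: "24Gi"                # the bound the OOM killer enforces
      nvidia.com/gpu: "1"
```

Setting requests equal to limits gives the pod the Guaranteed QoS class, which makes it the last candidate for eviction under node memory pressure.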
Ajay on why most IDPs fail (workshop this Saturday)
A short Q&A with Ajay Chankramath on when teams are ready for an IDP, how AI workloads break the standard patterns, and a workshop worth your Saturday.
Apr 21 • Sharon Sahadevan
Service Mesh Debugging: When Istio Breaks Your Inference Pipeline
You installed Istio for mTLS and traffic management. Now your vLLM pods take 30 seconds to respond. Here is what went wrong and how to fix it.
Apr 20 • Sharon Sahadevan
MIG vs Time-Slicing vs MPS: Which GPU Sharing Strategy and When
MIG partitions GPUs physically. Time-Slicing takes turns. MPS runs kernels in parallel. When to use each GPU sharing strategy on Kubernetes.
Apr 17 • Sharon Sahadevan
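Of the three strategies, time-slicing is the one configured purely in software, via the NVIDIA device plugin's sharing config. An illustrative sketch — the replica count is an example value, and the ConfigMap wiring to the GPU Operator is assumed:

```yaml
# Illustrative NVIDIA device plugin time-slicing config: each physical
# GPU is advertised to the scheduler as 4 nvidia.com/gpu resources.
# Pods take turns on the GPU; there is no memory isolation between them.
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
```

Because time-slicing provides no fault or memory isolation, one misbehaving pod can still exhaust GPU memory for its neighbors — the isolation guarantees are what MIG's physical partitioning buys you.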
Most Popular
How I Solved a $50K Certificate Outage in 15 Minutes Using OSI Layers
Jul 22, 2025 • Sharon Sahadevan
The OSI Model: Not Academic BS - Here's Why It Matters in Production
Jul 17, 2025 • Sharon Sahadevan
Architecture Template: vLLM Production Deployment on Kubernetes
Mar 14 • Sharon Sahadevan
DevOps to MLOps
Dec 16, 2025 • Sharon Sahadevan
Latest
I Built the GPU Infrastructure Course I Wished Existed
What most engineers miss below the application layer
Apr 15 • Sharon Sahadevan
etcd Debugging Guide: When Your Cluster Starts Losing Its Memory
The 5 ways etcd breaks in production Kubernetes, the metrics that predict each failure, and the commands to fix them before your cluster goes read-only.
Apr 10 • Sharon Sahadevan
vLLM vs Triton vs KServe: Choosing Your Model Serving Stack on Kubernetes
vLLM, Triton, and KServe operate at different layers. Here's what each one does, when to use it, and how to combine them for production model serving on…
Apr 3 • Sharon Sahadevan
Production Runbook: vLLM OOM Debugging
Your vLLM pod just crashed with OOMKilled. Here is how to find the cause and prevent it from happening again.
Mar 27 • Sharon Sahadevan
How vLLM Serves Models on Kubernetes
PagedAttention, continuous batching, and why your first deployment will probably OOM.
Mar 27 • Sharon Sahadevan
Production Runbook: etcd Backup and Restore
The step-by-step procedure for backing up and restoring etcd. Every command, every validation check, every gotcha.
Mar 22 • Sharon Sahadevan
NVIDIA GPU Operator on Kubernetes: What It Actually Does Under the Hood
It is not one component. It is eight. Most engineers only know about one of them.
Mar 20 • Sharon Sahadevan
Architecture Template: vLLM Production Deployment on Kubernetes
Copy, configure, deploy. Every YAML file you need to run vLLM in production with monitoring, autoscaling, and model caching.
Mar 14 • Sharon Sahadevan
Stacked vs External etcd: The Production Decision Nobody Explains
Why kubeadm’s default isn’t what you’ll find in production — and when it actually matters.
Mar 13 • Sharon Sahadevan
Kubenatives
Production Kubernetes for ML/AI workloads: GPU infrastructure, control plane internals, and model serving patterns for engineers running inference at scale.