Kubenatives
Production Runbook: GPU Pod Stuck in Pending
Debug runbook for GPU pods stuck in Pending on Kubernetes. GPU Operator failures, scheduling filters, MIG config, capacity planning, and prevention…
Mar 7
•
Sharon Sahadevan
How Kubernetes Schedules GPUs: Device Plugins, MIG, and Time-Slicing
Kubernetes treats a $30K A100 like a CPU core: as a simple integer. Here’s what actually happens when you request nvidia.com/gpu: 1 — and how to stop…
Mar 6
•
Sharon Sahadevan
What Actually Happens Inside the Kubernetes Control Plane
What every production engineer should understand about the API server, etcd, scheduler, and controller manager, and why it matters when things break at…
Feb 27
•
Sharon Sahadevan
GPU Infrastructure Explained
Everything You Need to Know as a DevOps Engineer Moving into AI
Feb 12
•
Sharon Sahadevan
What is MCP?
The Universal Adapter for AI Tools
Jan 9
•
Sharon Sahadevan
Most Popular
How I Solved a $50K Certificate Outage in 15 Minutes Using OSI Layers
Jul 22, 2025
•
Sharon Sahadevan
The OSI Model: Not Academic BS - Here's Why It Matters in Production
Jul 17, 2025
•
Sharon Sahadevan
DevOps to MLOps
Dec 16, 2025
•
Sharon Sahadevan
SSL/TLS in Kubernetes: The Complete Mental Model
Jul 16, 2025
•
Sharon Sahadevan
Latest
What the API Server Actually Does
Auth, admission control, watch streams — the request lifecycle that runs your entire cluster
Dec 28, 2025
•
Sharon Sahadevan
How Kubernetes Uses etcd: The Cluster's Source of Truth
The Brain Behind Your Cluster
Dec 26, 2025
•
Sharon Sahadevan
Kubernetes Networking: The Production Reality
What they don't teach you in tutorials (but you'll learn at 3am in production)
Dec 20, 2025
•
Sharon Sahadevan
DevOps to MLOps
The Essential Tools and Concepts You Must Master
Dec 16, 2025
•
Sharon Sahadevan
The Concepts Every DevOps Engineer Must Master
And Why Tools Don’t Matter
Dec 13, 2025
•
Sharon Sahadevan
Encoding vs. Encryption vs. Tokenization: What Every Engineer Should Know
Encoding: Translation, Not Protection
Dec 11, 2025
•
Sharon Sahadevan
MCP: The Protocol That Lets AI Talk to Your Kubernetes Cluster
Model Context Protocol (MCP) is an open protocol developed by Anthropic
Nov 26, 2025
•
Sharon Sahadevan
Data Lake vs Data Warehouse vs Lakehouse: The Definitive, Practical Guide
Data architecture breakdown: when each model wins (and when it doesn’t)
Oct 20, 2025
•
Sharon Sahadevan
Scaling Patterns: Your Guide to Vertical, Horizontal & Functional Sharding
How to Scale Your Database and Infrastructure: Master Vertical, Horizontal, and Functional Sharding Patterns.
Oct 12, 2025
•
Sharon Sahadevan
Kubenatives
Production Kubernetes for ML/AI workloads: GPU infrastructure, control plane internals, and model serving patterns for engineers running inference at scale.
Recommendations
AlgoMaster Newsletter
Ashish Pratap Singh
The System Design Newsletter
Neo Kim
ByteByteGo Newsletter
Alex Xu