MCP: The Protocol That Lets AI Talk to Your Kubernetes Cluster
Model Context Protocol (MCP) is an open protocol developed by Anthropic.
If you’ve been seeing “MCP” everywhere lately and wondering what the hype is about — this issue breaks it down.
We’ll cover what MCP actually is, why it matters for Kubernetes operators, and how it changes the way you interact with your clusters.
The Problem
Every AI coding assistant has the same limitation: it’s disconnected from your actual infrastructure.
Ask ChatGPT or Claude “why is my pod failing?” and you get a generic troubleshooting checklist:
Check if the image exists
Verify resource limits
Look at the events
Check node capacity
Useful? Sure. But it’s not looking at YOUR pod. YOUR cluster. YOUR actual problem.
You still have to run the commands yourself, copy-paste the output, and go back and forth until the AI has enough context to help.
MCP changes this.
What is MCP?
Model Context Protocol (MCP) is an open standard that connects AI assistants directly to external systems.
The simplest analogy: USB-C for AI.
Remember when every phone had a different charging cable? USB-C standardized that. One cable works with everything.
MCP does the same thing for AI integrations. One protocol that lets any AI assistant talk to any external system.
Here’s how it works:
You ask: “Why is my nginx pod failing?”
↓
AI Assistant (Claude, Cursor, Copilot)
↓
MCP Server (connected to your cluster)
↓
Kubernetes API Server
↓
AI Response: “Your nginx pod is failing because it’s trying to
pull from a private registry and the imagePullSecret is missing
in the production namespace. Here’s the error from the events...”
The AI isn’t guessing. It’s looking at your actual cluster and telling you what’s actually wrong.
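Under the wrapper, MCP messages travel as JSON-RPC 2.0. Here is a minimal sketch of the tool call the assistant might send for that question; the tool name pods_list and its arguments are illustrative, since each MCP server publishes its own tool catalog.

```python
import json

# MCP's wire format is JSON-RPC 2.0. Tool and argument names below are
# illustrative; real servers advertise their own tools via tools/list.
def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a client would send it."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# The assistant asks the server which pods exist in one namespace.
request = make_tool_call(1, "pods_list", {"namespace": "production"})
print(request)
```

The response comes back over the same channel, and the assistant folds the raw API data into the plain-language answer you see.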
What Does This Look Like in Practice?
Let’s compare the old way vs. the MCP way.
Scenario 1: Debugging a failing deployment
Without MCP:
You: “My deployment isn’t working”
AI: “Here are 10 things that could be wrong. Run kubectl describe deployment and kubectl get events and paste the output.”
You run commands, paste output, wait for response, run more commands, paste more output...
With MCP:
You: “My deployment isn’t working”
AI: “I checked your deployment. The nginx-app deployment in the production namespace has 0/3 replicas ready. The pods are failing because they’re requesting 2Gi of memory but your nodes only have 1.5Gi available. Either reduce the memory request or scale up your nodes.”
One question. Actual answer. No copy-paste dance.
Scenario 2: Cluster health check
Without MCP:
You: “Is anything wrong with my cluster?”
AI: “Run these 15 kubectl commands and tell me what you see.”
With MCP:
You: “Is anything wrong with my cluster?”
AI: “I found 3 issues:
Two pods in CrashLoopBackOff in the payments namespace
A PersistentVolumeClaim has been pending for 6 hours
One node is showing memory pressure
Want me to dig into any of these?”
Scenario 3: Quick questions
Without MCP:
You: “How many replicas does my web app have?”
AI: “Run kubectl get deployment web-app -o jsonpath='{.spec.replicas}'”
With MCP:
You: “How many replicas does my web app have?”
AI: “Your web-app deployment has 3 replicas, all running.”
Real Examples You Can Try
Once MCP is set up, these are actual questions you can ask:
Troubleshooting:
“Why are my pods pending?”
“What’s consuming the most memory in my cluster?”
“Show me all pods that restarted in the last hour”
“Why can’t my frontend talk to my backend service?”
Quick lookups:
“What version of nginx am I running in production?”
“Which namespaces exist in my cluster?”
“List all services using LoadBalancer type”
“What environment variables are set in the auth deployment?”
Comparisons:
“What’s different between my staging and production deployments?”
“Which deployments don’t have resource limits set?”
“Are all my pods running the same image tag?”
Operations:
“Scale the api deployment to 5 replicas”
“Show me what would change if I apply this manifest”
“What would happen if I drain this node?”
The key difference: these aren’t hypothetical answers. The AI is querying your actual cluster and giving you real data.
Available Kubernetes MCP Servers
Three main options have emerged this year:
Red Hat’s kubernetes-mcp-server
Built in Go, talks directly to the Kubernetes API. No dependencies — you don’t need kubectl or Helm installed. Works with any Kubernetes cluster including OpenShift.
Azure’s mcp-kubernetes
Microsoft’s version with additional tooling support. Includes Helm operations and network policy visibility through Cilium/Hubble integration.
AWS EKS MCP Server
Purpose-built for EKS. Deep integration with AWS services and available as a managed preview directly in the AWS console.
All three accomplish the same core goal: letting your AI assistant query and manage your clusters through natural language.
How It Actually Works
Under the hood, the MCP server is doing what you’d do manually — just faster.
When you ask “why is my pod failing?”, the MCP server:
Lists pods to find which ones are failing
Gets the pod details and status
Pulls recent events for that pod
Checks the container logs
Looks at the node status if relevant
Then it synthesizes all of that into an answer.
It’s not magic. It’s automation of the investigative work you’d do anyway.
The AI still needs to understand Kubernetes to ask the right questions. MCP just gives it the ability to get answers from your actual infrastructure.
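Those investigative steps can be sketched as a short pipeline. The snippet below runs the same logic over canned data standing in for API responses; the field names mirror the Kubernetes API (phase, reason, events), but the cluster state here is invented for illustration.

```python
# Canned stand-ins for what the MCP server would fetch from the API.
PODS = [
    {"name": "nginx-7d4b", "namespace": "production",
     "phase": "Pending", "reason": "ImagePullBackOff"},
    {"name": "api-66f9", "namespace": "production",
     "phase": "Running", "reason": None},
]
EVENTS = {
    "nginx-7d4b": ["Failed to pull image: pull access denied",
                   "Back-off pulling image"],
}

def diagnose(pods, events):
    """Find non-running pods and attach their recent events."""
    findings = []
    for pod in pods:
        if pod["phase"] != "Running":
            findings.append({
                "pod": pod["name"],
                "reason": pod["reason"],
                "events": events.get(pod["name"], []),
            })
    return findings

for finding in diagnose(PODS, EVENTS):
    print(f"{finding['pod']}: {finding['reason']}")
    for event in finding["events"]:
        print(f"  event: {event}")
```

The real server does the same filtering and correlation against live API responses, then hands the findings to the model to phrase the answer.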
Production Considerations
A few things to think about before connecting AI to your production clusters:
Start with read-only access
All the MCP servers support a read-only mode. The AI can query anything but can’t modify resources. This is the safest way to start.
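Conceptually, read-only mode is a gate in front of the tool dispatcher: non-mutating tools pass, everything else is refused. A sketch, with hypothetical tool names standing in for whatever catalog your server ships:

```python
# App-level read-only enforcement: check each tool call against an
# allowlist of non-mutating tools before dispatching. Tool names are
# hypothetical; real servers define their own.
READ_ONLY_TOOLS = {"pods_list", "pods_get", "pods_log", "events_list"}

def dispatch(tool: str, read_only: bool) -> str:
    """Refuse mutating tools when the server runs in read-only mode."""
    if read_only and tool not in READ_ONLY_TOOLS:
        return f"denied: '{tool}' may mutate cluster state"
    return f"ok: running '{tool}'"

print(dispatch("pods_list", read_only=True))
print(dispatch("pods_delete", read_only=True))
```

This gate belongs in addition to, not instead of, RBAC: even if the AI asks for something destructive, the server never forwards the call.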
Use scoped permissions
Instead of giving the MCP server admin access, create a dedicated ServiceAccount with only the permissions it needs. Standard Kubernetes RBAC applies.
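A minimal sketch of what that scoped role could look like, built as a plain dict and emitted as JSON (which kubectl apply accepts just like YAML). The resource list is a starting point, not exhaustive; widen or narrow it to match what you want the AI to see.

```python
import json

# Read-only ClusterRole sketch for a dedicated MCP ServiceAccount.
# Only get/list/watch: the AI can inspect these resources but never
# create, patch, or delete anything.
cluster_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "mcp-readonly"},
    "rules": [{
        "apiGroups": ["", "apps"],
        "resources": ["pods", "pods/log", "events", "services",
                      "deployments", "replicasets", "nodes"],
        "verbs": ["get", "list", "watch"],
    }],
}
print(json.dumps(cluster_role, indent=2))
```

Bind it to the ServiceAccount the MCP server runs as, and the Kubernetes API enforces the boundary regardless of what the AI asks for.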
Audit logging still works
Every action through the MCP server goes through the standard Kubernetes API. Your existing audit logs capture all of it. You can see exactly what the AI queried or modified.
Consider environment separation
Run read-only for production, full access for dev. You can configure multiple MCP servers pointing to different clusters with different permission levels.
The Bigger Picture
MCP isn’t just for Kubernetes. The same protocol works for:
Databases (query your actual data)
Cloud providers (AWS, Azure, GCP resources)
Monitoring systems (Prometheus metrics, Grafana dashboards)
Git repositories
Internal APIs
The real power comes from combining them. An AI with access to your Kubernetes cluster AND your monitoring stack AND your cloud provider can correlate information across systems.
“Why is this pod slow?” becomes answerable with actual metrics, actual resource allocation, and actual infrastructure context — all in one conversation.
Should You Try This?
Good fit if you:
Spend time debugging across multiple kubectl commands
Manage clusters you didn’t set up (inherited infrastructure)
Want faster answers during incident response
Are onboarding engineers who are still learning kubectl
Maybe not ideal if you:
Need deterministic, repeatable automation (use standard pipelines)
Work in heavily regulated environments where AI access needs approval
Prefer the precision of explicit commands
The mental model: MCP adds a conversational layer on top of kubectl. It doesn’t replace the CLI — it gives you another interface to the same underlying APIs.
Getting Started
The setup is straightforward:
Download an MCP server (single binary, no dependencies)
Point it at your kubeconfig
Configure your AI client (Claude Desktop, Cursor, etc.) to use it
Start asking questions
Most people are up and running in under 10 minutes.
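The client-side wiring is typically a small JSON config telling your AI client how to launch the server. Here's the shape of it, built as a Python dict for clarity; the binary path, flag name, and kubeconfig path are illustrative, so check your chosen server's README for the real ones.

```python
import json

# Illustrative MCP client config: launch a Kubernetes MCP server in
# read-only mode against a dev cluster. Paths and flags are examples,
# not the documented values for any specific server.
config = {
    "mcpServers": {
        "kubernetes": {
            "command": "/usr/local/bin/kubernetes-mcp-server",
            "args": ["--read-only"],
            "env": {"KUBECONFIG": "/home/me/.kube/dev-config"},
        }
    }
}
print(json.dumps(config, indent=2))
```

Restart the AI client after saving the config, and the Kubernetes tools should appear in its tool list.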
I’d recommend starting with a dev cluster in read-only mode. Get comfortable with the workflow before expanding access.
The Shift
This is part of a broader trend: AI moving from “advisor” to “operator.”
Today’s AI assistants give you suggestions based on training data. Tomorrow’s AI assistants will take actions based on real-time context from your actual systems.
MCP is the protocol making that possible.
The engineers who understand both the fundamentals AND how to leverage AI tooling effectively will have an advantage. Not because AI replaces expertise — but because it amplifies it.
That’s it for this issue. If you try MCP with your clusters, reply and let me know how it goes — I’m collecting real-world experiences for a follow-up.