Kubenatives

Architecture Template: vLLM Production Deployment on Kubernetes

Copy, configure, deploy. Every YAML file you need to run vLLM in production with monitoring, autoscaling, and model caching.

Sharon Sahadevan
Mar 14, 2026

This template gives you a complete production-ready vLLM deployment on Kubernetes. Not a tutorial. Not a demo. A set of YAML files that you can copy into your cluster and configure for your model.

Every file includes comments explaining why each setting exists and what to change for your workload.

What you get:

  • Namespace and RBAC
  • Hugging Face token Secret
  • Model cache PVC
  • vLLM Deployment with production settings
  • Service
  • HPA based on custom metrics
  • ServiceMonitor for Prometheus
  • PodDisruptionBudget


File 1: Namespace and RBAC

# namespace.yaml
# Separate namespace for inference workloads.
# Keeps GPU resource quotas and RBAC isolated from other workloads.
apiVersion: v1
kind: Namespace
metadata:
  name: inference
  labels:
    purpose: model-serving
---
# Optional: ResourceQuota to cap total GPU usage in this namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: inference
spec:
  hard:
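    requests.nvidia.com/gpu: "8"    # Max 8 GPUs in this namespace
    # No limits.nvidia.com/gpu entry: for extended resources such as GPUs,
    # quotas support only the "requests." prefix. GPU requests and limits
    # must be equal anyway, so this one line caps both.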
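
The manifest above creates the namespace and the quota but no RBAC objects. One possible starting point, assuming the vLLM pods never need to call the Kubernetes API, is a dedicated ServiceAccount with token automounting turned off. The name `vllm` is a placeholder chosen for this sketch, not something the template defines, and the Deployment would need `serviceAccountName: vllm` to use it.

```yaml
# rbac.yaml (hypothetical file name)
# Sketch only: a dedicated identity for the inference pods.
# Assumption: the pods do not talk to the Kubernetes API, so no Role or
# RoleBinding is attached and no API token is mounted into the containers.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vllm              # placeholder name
  namespace: inference
automountServiceAccountToken: false
```

If a workload later needs API access (for example, to read a ConfigMap), a namespaced Role and RoleBinding scoped to this ServiceAccount keeps those permissions inside `inference`.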

File 2: Hugging Face Token Secret

# hf-secret.yaml
# Your Hugging Face token for downloading gated models (Llama, Mistral, etc.)
# Generate at: https://huggingface.co/settings/tokens
#
# Create with:
#   kubectl create secret generic hf-token \
#     --from-literal=token=hf_YOUR_TOKEN_HERE \
#     -n inference
#
# Or apply this file after base64 encoding your token:
apiVersion: v1
kind: Secret
metadata:
  name: hf-token
  namespace: inference
type: Opaque
data:
  token: BASE64_ENCODED_TOKEN_HERE    # echo -n "hf_YOUR_TOKEN" | base64
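
To confirm the Secret is wired correctly before deploying a GPU workload, a throwaway pod can load it the same way an inference container would. The sketch below is not part of the template: the pod name and the `busybox` image are assumptions for illustration. It exposes the `token` key as `HF_TOKEN`, the environment variable the `huggingface_hub` library reads, and only reports whether the value is non-empty, so the token itself never reaches the logs.

```yaml
# hf-token-check.yaml (hypothetical file name)
# Sketch only: verifies that the hf-token Secret can be injected as HF_TOKEN.
apiVersion: v1
kind: Pod
metadata:
  name: hf-token-check    # placeholder name
  namespace: inference
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: busybox:1.36
      command:
        - sh
        - -c
        # Prints a confirmation only; the token value is never echoed.
        - 'test -n "$HF_TOKEN" && echo "HF_TOKEN is set"'
      env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token    # Secret defined above
              key: token        # key inside that Secret
```

Apply it, read the result with `kubectl logs hf-token-check -n inference`, then delete the pod. The same `env` entry can be placed in any container that downloads gated models.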
