Kubernetes Architecture Simplified
Introduction
Kubernetes is a container orchestrator that has quickly become the industry standard for managing large-scale containerized applications. At its core, Kubernetes is built on a distributed architecture that enables it to run applications consistently and reliably across large clusters of machines.
However, the sheer complexity of Kubernetes can be intimidating for newcomers, especially those unfamiliar with the underlying concepts and technologies. This article will provide a simplified overview of Kubernetes architecture and explain the key components that make it a powerful tool for modern application deployment and management.
Kubernetes architecture is designed to provide a scalable and resilient platform for deploying, managing, and orchestrating containerized applications. At a high level, Kubernetes consists of a control plane and worker nodes.
The Kubernetes control plane is responsible for managing the state of the cluster and scheduling workloads onto the worker nodes, while the worker nodes run the actual application workloads. Kubernetes also provides various services and tools for managing and monitoring the health of the cluster and its applications.
Nodes
Nodes are the machines that run containerized applications in a Kubernetes cluster. They are the cluster's primary building blocks.
A node is a physical or virtual server capable of running the Kubernetes components and workloads. There are two types of nodes in a Kubernetes cluster: control plane nodes and worker nodes.
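As a quick illustration, the following sketch uses the official Kubernetes Python client (pip install kubernetes) to list a cluster's nodes along with their role labels. It assumes a reachable cluster and a local kubeconfig; the node-role.kubernetes.io/* label convention shown is the one used by kubeadm-style clusters.

```python
from kubernetes import client, config

config.load_kube_config()  # reads ~/.kube/config by default
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    # Node roles are conventionally exposed as labels such as
    # node-role.kubernetes.io/control-plane or .../worker.
    roles = [
        label.split("/", 1)[1]
        for label in (node.metadata.labels or {})
        if label.startswith("node-role.kubernetes.io/")
    ] or ["<none>"]
    print(f"{node.metadata.name}: roles={roles}")
```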
Control Plane Nodes
Control plane nodes are the machines that run the Kubernetes control plane components, such as:
Kube API server
Kube Scheduler
Kube Controller Manager
Cloud Controller Manager
etcd
These components are essential for maintaining the state of the cluster and are responsible for scheduling workloads onto the worker nodes.
Control plane nodes are typically not used to run containerized applications and are dedicated to running Kubernetes control plane components.
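This dedication is usually enforced with a taint rather than anything structural: kubeadm-style clusters, for example, taint control plane nodes with node-role.kubernetes.io/control-plane:NoSchedule so ordinary pods are not scheduled there. A minimal sketch that prints each node's taints, assuming the same Python client setup as above:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# On a kubeadm-style cluster, control plane nodes typically carry a
# taint such as node-role.kubernetes.io/control-plane:NoSchedule.
for node in v1.list_node().items:
    for taint in node.spec.taints or []:
        print(f"{node.metadata.name}: {taint.key}={taint.value}:{taint.effect}")
```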
Kube API server
The Kubernetes API server is a core component of the Kubernetes control plane. It exposes the Kubernetes API used by other control plane components and users to interact with the Kubernetes cluster.
In addition, the API server is responsible for:
validating and processing API requests,
storing data about the state of the cluster in etcd, and
propagating updates to the other components of the control plane.
The Kubernetes API is a RESTful API that allows users to manage Kubernetes resources such as Pods, Services, Deployments, and more. It provides a standardized way to interact with the Kubernetes cluster, regardless of the underlying infrastructure or implementation details.
The API is versioned, and each version defines a set of resources and operations that can be performed on those resources.
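For example, core resources such as Pods live in the v1 API version, while Deployments live in the apps/v1 group. In the official Python client this versioning surfaces directly on the resource objects; a minimal sketch (the names and image tag are hypothetical):

```python
from kubernetes import client

# Core resources such as Pods belong to the "v1" API version...
pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="demo-pod"),
    spec=client.V1PodSpec(
        containers=[client.V1Container(name="app", image="nginx:1.25")]
    ),
)

# ...while Deployments belong to the "apps/v1" group/version.
deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="demo-deployment"),
)

print(pod.api_version, deployment.api_version)
```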
The API server receives API requests from clients such as kubectl, the Kubernetes Dashboard, or other control plane components. When a request is received, the API server first validates it to ensure that it conforms to the Kubernetes API schema and that the user has all the necessary permissions to perform the requested operation.
If the request is valid, the API server then processes the request by updating the state of the cluster in etcd.
The API server uses etcd, a distributed key-value store, to persist data about the state of the cluster, including information about all Kubernetes resources and their current state.
etcd provides a highly available, fault-tolerant storage layer for Kubernetes. The API server communicates with etcd through the etcd client API to store and retrieve data about the state of the cluster.
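Clients never read etcd directly; they observe this etcd-backed state through the API server, most visibly via its watch mechanism. A minimal sketch using the official Python client, assuming a local kubeconfig and a default namespace with some pod activity:

```python
from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

# Stream ADDED/MODIFIED/DELETED events for pods in the "default"
# namespace; the API server serves these from its etcd-backed state.
w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod, namespace="default", timeout_seconds=30):
    pod = event["object"]
    print(f"{event['type']}: {pod.metadata.name} ({pod.status.phase})")
```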
Summary
Kubernetes API server: core component of the control plane
Exposes Kubernetes API: used for interaction with the cluster
Validates and processes API requests
Stores cluster state data in etcd: a distributed key-value store
Propagates updates to other control plane components (e.g., Scheduler, Controller Manager)
Supports standardized, versioned RESTful API for resource management (e.g., Pods, Services, Deployments)
Kube Scheduler
The kube-scheduler is responsible for assigning newly created pods to suitable nodes in the cluster. It plays a crucial role in ensuring optimal resource utilization and maintaining the cluster's desired state.
When a pod is created, it enters the "Pending" state, meaning it has not yet been assigned to a node. The kube-scheduler takes these pending pods and assigns them to nodes based on various factors, such as the following (a Pod spec combining several of them is sketched after the list):
Resource requirements: CPU, memory, etc.
Node affinity/anti-affinity: user-defined constraints for pod placement
Taints and tolerations: keep pods off tainted nodes unless the pods tolerate those taints
Pod affinity/anti-affinity: pod placement relative to other pods
Custom scheduler policies or plugins
Node availability and capacity
Data locality: placing pods near data sources (e.g., in a distributed storage system)
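As referenced above, the sketch below (using the official Kubernetes Python client) builds a Pod spec that exercises three of these factors at once: resource requests, node affinity, and a toleration. The label key disktype, the taint key dedicated, and the image tag are hypothetical examples.

```python
from kubernetes import client

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="sched-demo"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="app",
                image="nginx:1.25",
                # Resource requirements: consulted during filtering.
                resources=client.V1ResourceRequirements(
                    requests={"cpu": "500m", "memory": "256Mi"}
                ),
            )
        ],
        # Node affinity: only schedule onto nodes labeled disktype=ssd.
        affinity=client.V1Affinity(
            node_affinity=client.V1NodeAffinity(
                required_during_scheduling_ignored_during_execution=client.V1NodeSelector(
                    node_selector_terms=[
                        client.V1NodeSelectorTerm(
                            match_expressions=[
                                client.V1NodeSelectorRequirement(
                                    key="disktype", operator="In", values=["ssd"]
                                )
                            ]
                        )
                    ]
                )
            )
        ),
        # Toleration: allows (but does not force) placement on nodes
        # tainted dedicated=batch:NoSchedule.
        tolerations=[
            client.V1Toleration(
                key="dedicated", operator="Equal", value="batch", effect="NoSchedule"
            )
        ],
    ),
)
```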
The scheduler's primary goal is to ensure that pods are placed on nodes where they can efficiently run and meet their requirements without overloading the nodes.
The kube-scheduler uses a two-step process for pod placement:
Filtering: In this step, the scheduler filters out nodes that do not meet the requirements of the pod, such as insufficient resources or conflicts with existing pods. The remaining nodes are considered feasible for pod placement.
Scoring: After filtering, the scheduler assigns a score to each feasible node based on multiple criteria, including resource utilization, affinity rules, and custom priority functions. The node with the highest score is chosen for pod placement.
Once the scheduler has selected the most suitable node for the Pod, it updates the Pod's information in the API server with the node assignment. The kubelet running on the assigned node will then detect the new Pod and start the containers within the Pod.
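The following toy sketch illustrates the filter-then-score flow in plain Python; the node data, resource figures, and scoring function are all hypothetical, and the real kube-scheduler implements these steps through a plugin framework rather than logic this simple.

```python
# Hypothetical nodes with their remaining allocatable resources.
nodes = [
    {"name": "node-1", "free_cpu_m": 2000, "free_mem_mi": 4096},
    {"name": "node-2", "free_cpu_m": 250,  "free_mem_mi": 8192},
    {"name": "node-3", "free_cpu_m": 1500, "free_mem_mi": 1024},
]
pod_request = {"cpu_m": 500, "mem_mi": 2048}

# Step 1 - Filtering: drop nodes that cannot fit the pod's requests.
feasible = [
    n for n in nodes
    if n["free_cpu_m"] >= pod_request["cpu_m"]
    and n["free_mem_mi"] >= pod_request["mem_mi"]
]

# Step 2 - Scoring: prefer the node with the most headroom left after
# placement (one of many possible scoring strategies).
def score(n):
    return (n["free_cpu_m"] - pod_request["cpu_m"]) + (n["free_mem_mi"] - pod_request["mem_mi"])

best = max(feasible, key=score)
print(f"Schedule pod on {best['name']}")
# The real scheduler then records its decision by creating a Binding
# for the pod through the API server.
```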
Kube Controller Manager
The kube-controller-manager is a critical component of the Kubernetes control plane responsible for managing various controllers that ensure the cluster's desired state is maintained. It runs multiple controller processes within a single binary, streamlining the management and operation of the control plane.
Some of the core controllers managed by the kube-controller-manager include:
Node controller: Monitors the health of nodes in the cluster and takes appropriate actions, such as marking a node as "NotReady" or evicting pods from a failed node.
Replication controller: Ensures the correct number of replicas for a ReplicationController resource is always running.
Endpoint controller: Populates the "Endpoints" object for services, providing a way to route traffic to pods.
Service Account and Token controllers: Manage default service accounts and API access tokens for namespaces.
Deployment controller: Ensures that the desired number of replicas for a Deployment resource is maintained, and handles rolling updates or rollbacks as needed.
DaemonSet controller: Ensures that a copy of the pod defined by a DaemonSet resource is running on every eligible node.
Job controller: Manages the lifecycle of Job resources, which represent one-off tasks that run to completion.
The kube-controller-manager watches the Kubernetes API server for resource changes and continuously works to reconcile the observed state with the desired state specified in the resource definitions. Each controller runs in its own dedicated reconciliation loop, detecting changes and taking appropriate actions to maintain the desired state.
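To make the reconciliation pattern concrete, here is a deliberately simplified sketch in the spirit of the Deployment controller, using the official Kubernetes Python client. Real controllers use informers, work queues, and retries rather than a bare watch, and they would act on the mismatch instead of just printing it.

```python
from kubernetes import client, config, watch

config.load_kube_config()
apps = client.AppsV1Api()

# Observe Deployment events and compare the desired replica count
# against the observed ready replicas -- the essence of a reconcile loop.
w = watch.Watch()
for event in w.stream(apps.list_deployment_for_all_namespaces, timeout_seconds=60):
    dep = event["object"]
    desired = dep.spec.replicas or 0
    observed = dep.status.ready_replicas or 0
    if desired != observed:
        print(f"{dep.metadata.namespace}/{dep.metadata.name}: "
              f"desired={desired} observed={observed} -> reconcile")
```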
Cloud Controller Manager
The Cloud Controller Manager (CCM) is a component in Kubernetes that is responsible for managing the interactions between the cluster and the underlying cloud infrastructure. CCM aims to provide better separation of concerns, allowing cloud-specific features to be developed and maintained independently from the core Kubernetes components. This separation enables cloud providers to integrate their services more effectively and eases the development of cloud-agnostic Kubernetes features.
The Cloud Controller Manager runs controllers specific to a particular cloud provider. Some of these controllers include:
Node controller: Ensures that nodes in the cluster have corresponding cloud provider resources, sets provider-specific metadata for nodes, and deletes orphaned cloud provider resources when nodes are deleted from the cluster.
Route controller: Configures routes in the cloud provider's infrastructure for proper pod networking across nodes.
Service controller: Manages cloud provider load balancers and ensures that services of type LoadBalancer are correctly configured with the cloud provider's resources.
Volume controller: Manages cloud provider storage resources and ensures that Persistent Volumes are correctly provisioned, attached, and detached.
The Cloud Controller Manager is an optional component, and its use depends on the cloud provider's implementation. For example, some cloud providers have specific cloud-controller-manager binaries that need to be deployed separately, while others integrate their controllers directly into the Kubernetes control plane.
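As an illustration of the service controller's job, the sketch below creates a Service of type LoadBalancer with the official Python client; on a cluster running a cloud-controller-manager, the provider's service controller would then provision the external load balancer. The Service name, selector, and ports are hypothetical.

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# A Service of type LoadBalancer; the CCM's service controller reacts
# to its creation by provisioning a cloud load balancer.
service = client.V1Service(
    api_version="v1",
    kind="Service",
    metadata=client.V1ObjectMeta(name="web-lb"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",
        selector={"app": "web"},
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)
v1.create_namespaced_service(namespace="default", body=service)
```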
etcd
etcd is a distributed, consistent key-value store used as the primary datastore for Kubernetes to store and manage its configuration data, state information, and metadata. It is a crucial component of the Kubernetes control plane, providing a reliable and fault-tolerant source of truth for the state of the entire cluster.
etcd was developed by CoreOS and is now a Cloud Native Computing Foundation (CNCF) project. It is designed to be simple, secure, and easily scalable, focusing on providing a distributed system resilient to failures.
Some key features of etcd include:
Consistency: etcd ensures that all reads and writes are consistent across the cluster using the Raft consensus algorithm, providing strong guarantees that every node has the same view of the data.
High availability: etcd is designed to be deployed as a cluster, providing fault tolerance and enabling the system to continue functioning even if some nodes fail.
Watch functionality: etcd allows clients to watch for changes to specific keys or directories, enabling real-time notifications for updates.
Transactions: etcd supports multi-key transactions with conditional updates, allowing clients to perform complex atomic operations.
Access control: etcd provides role-based access control (RBAC) to secure access to the key-value store.
In a Kubernetes cluster, the API server is the component that talks to etcd directly, storing and retrieving data on behalf of the controller manager, scheduler, and other clients, including the configuration of resources such as pods, services, and deployments, as well as the current state of the system.
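For a feel of the primitives involved, here is a minimal sketch using the third-party python-etcd3 package (pip install etcd3) against a hypothetical local endpoint. In a real cluster only the API server talks to etcd; application code should always go through the Kubernetes API instead.

```python
import etcd3

etcd = etcd3.client(host="localhost", port=2379)  # hypothetical endpoint

etcd.put("/demo/config", "v1")            # write a key
value, _metadata = etcd.get("/demo/config")
print(value)                               # b'v1'

# Watch a key for changes: the iterator yields events as writes happen,
# which is the mechanism behind real-time notifications.
events_iterator, cancel = etcd.watch("/demo/config")
```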
Worker Nodes
Worker nodes, on the other hand, are the machines where containerized applications are deployed and run. These nodes run pods, the smallest deployable units in Kubernetes.
Worker nodes are where the majority of the application workload is executed. Each worker node runs a container runtime, which is responsible for running the containers, and a kubelet, which is responsible for managing the pods.
Kubelet
The kubelet runs on each node and is responsible for ensuring that the containers in the Pod are running and healthy. It communicates with the API server to receive instructions on which containers to run and how to run them.
The container runtime (e.g., containerd or CRI-O) is responsible for pulling container images from the registry, creating containers, and managing the container lifecycle.
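One way to see a kubelet's responsibilities from the outside is to list the pods assigned to its node along with the container readiness it reports back to the API server. A minimal sketch with the official Python client; the node name worker-1 is hypothetical.

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# List the pods a particular kubelet is responsible for by filtering
# on spec.nodeName.
pods = v1.list_pod_for_all_namespaces(field_selector="spec.nodeName=worker-1")
for pod in pods.items:
    # container_statuses is what the kubelet reports to the API server.
    ready = sum(1 for c in (pod.status.container_statuses or []) if c.ready)
    total = len(pod.spec.containers)
    print(f"{pod.metadata.namespace}/{pod.metadata.name}: {ready}/{total} ready")
```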
Kube Proxy
Kube-proxy is a key component of the Kubernetes networking model. It runs on each node in a Kubernetes cluster and is responsible for maintaining network rules and enabling service abstraction.
Kube-proxy ensures that the services defined in the cluster are accessible and properly load-balanced, both from within the cluster and external clients.
Kube-proxy has three operating modes:
Userspace mode (deprecated): In this mode, kube-proxy listens on a unique port for each Service and forwards incoming traffic to one of the Service's backend pods using a round-robin algorithm. Proxying traffic through userspace adds significant overhead, so this mode has been deprecated in favor of the more efficient modes below.
iptables mode: In this mode, kube-proxy uses Linux iptables to manage network rules. When a service is created or updated, kube-proxy updates the iptables rules on each node to ensure that traffic is properly routed to the backend pods. This mode is more efficient than the userspace mode because it operates within the kernel space, reducing overhead.
IPVS mode: IP Virtual Server (IPVS) mode is another kernel-space mode that uses the Linux kernel's IPVS feature for load balancing. It offers better performance and scalability than iptables mode, especially in large clusters. IPVS mode supports multiple load-balancing algorithms, such as round-robin, least connections, and shortest expected delay.
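Whichever mode is in use, the observable contract is the same: a Service's virtual ClusterIP is mapped to its backend pod IPs. The sketch below uses the official Python client to show both sides of that mapping for a hypothetical Service named web in the default namespace.

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# kube-proxy programs rules that map the Service's virtual ClusterIP
# to the backend pod IPs listed in its Endpoints object.
svc = v1.read_namespaced_service(name="web", namespace="default")
endpoints = v1.read_namespaced_endpoints(name="web", namespace="default")

print(f"ClusterIP: {svc.spec.cluster_ip}")
for subset in endpoints.subsets or []:
    for addr in subset.addresses or []:
        print(f"backend pod IP: {addr.ip}")
```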
Addons
Kubernetes add-ons are extensions or plugins that provide additional functionality to a Kubernetes cluster. These add-ons help with various tasks like monitoring, logging, networking, and authentication.