Introduction & Overview
What is Kubernetes?

Kubernetes, often abbreviated as K8s, is an open-source platform for automating the deployment, scaling, and management of containerized applications. It orchestrates containers across a cluster of machines, ensuring high availability, scalability, and efficient resource utilization.
- Core Purpose: Manages containerized workloads and services, abstracting infrastructure complexities.
- Key Features: Automated scaling, self-healing, load balancing, and service discovery.
- Open-Source: Maintained by the Cloud Native Computing Foundation (CNCF).
History and Background
Kubernetes was originally developed by Google, inspired by its internal Borg system, which managed containerized workloads at scale. It was open-sourced in 2014 and donated to the CNCF in 2015.
- Timeline:
  - 2014: Google releases Kubernetes as an open-source project.
  - 2015: CNCF is formed, with Kubernetes as its first project.
  - 2020s: Kubernetes becomes the de facto standard for container orchestration, adopted by major cloud providers (AWS, Azure, GCP).
- Evolution: From managing simple stateless apps to complex stateful applications and machine learning workloads.
Why is Kubernetes Relevant to Site Reliability Engineering?
Site Reliability Engineering (SRE) focuses on ensuring systems are reliable, scalable, and efficient. Kubernetes aligns with SRE principles by:
- Reliability: Self-healing mechanisms like auto-restarting failed containers or rescheduling pods.
- Scalability: Horizontal pod autoscaling and cluster autoscaling to handle traffic spikes.
- Observability: Integrates with monitoring tools like Prometheus and Grafana for metrics and logging.
- Automation: Reduces toil through automated deployments, rollbacks, and resource management.
Kubernetes enables SREs to maintain service-level objectives (SLOs) and service-level indicators (SLIs) by providing robust tools for managing distributed systems.
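As a concrete illustration of the self-healing bullet above, here is a minimal pod sketch (the name and image are illustrative) with a liveness probe: if the HTTP check fails repeatedly, the kubelet restarts the container without operator intervention. Readiness probes work the same way but remove the pod from service endpoints instead of restarting it.

# Hypothetical pod spec: the kubelet restarts the container
# automatically when the liveness check keeps failing.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: nginx:1.25
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10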
Core Concepts & Terminology
Key Terms and Definitions
Term | Definition |
---|---|
Pod | Smallest deployable unit in Kubernetes, containing one or more containers. |
Node | A single machine (physical or virtual) in the Kubernetes cluster. |
Cluster | A set of nodes that run containerized applications managed by Kubernetes. |
Deployment | Manages stateless applications, ensuring desired pod replicas are running. |
Service | Abstracts a set of pods and provides stable networking (e.g., load balancing). |
ConfigMap | Stores configuration data as key-value pairs for applications. |
Secret | Stores sensitive data, such as passwords or API keys, securely. |
Namespace | Logical partitioning of resources within a cluster for isolation. |
Ingress | Manages external access to services, typically via HTTP/HTTPS. |
Kubelet | Agent running on each node, communicating with the control plane. |
Kube-apiserver | The primary management component exposing the Kubernetes API. |
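To see several of these terms working together, here is a minimal, hypothetical Service manifest: it selects Pods by label and gives them one stable, load-balanced address inside the cluster.

# Hypothetical example tying terms together: a Service selects Pods
# by label and exposes them behind one stable cluster-internal address.
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: default      # Namespace: logical partition of the cluster
spec:
  selector:
    app: web              # matches Pods carrying this label
  ports:
  - port: 80              # port the Service listens on
    targetPort: 8080      # container port traffic is forwarded to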
How Kubernetes Fits into the SRE Lifecycle
Kubernetes supports the SRE lifecycle (design, deploy, monitor, maintain) in the following ways:
- Design: Declarative configuration (YAML/JSON) for infrastructure-as-code.
- Deploy: Rolling updates and canary deployments minimize downtime (see the rolling-update sketch after this list).
- Monitor: Integrates with observability tools to track SLIs (e.g., latency, error rates).
- Maintain: Automates scaling, self-healing, and resource optimization, reducing manual intervention.
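For the Deploy stage, the rolling-update behavior is configured declaratively. The fragment below is a sketch of the relevant Deployment fields; the values are illustrative.

# Deployment spec fragment: replace pods one at a time while
# never dropping below the desired replica count.
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod during the rollout
      maxUnavailable: 0    # keep all desired replicas serving traffic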
Architecture & How It Works
Components
Kubernetes uses a control-plane/worker architecture: a central control plane makes global decisions, while worker nodes run the application workloads.
- Control Plane Components:
  - Kube-apiserver: Handles API requests and serves as the cluster's front end.
  - etcd: Distributed key-value store for cluster state and configuration.
  - Kube-scheduler: Assigns pods to nodes based on resource requirements and constraints.
  - Kube-controller-manager: Runs controllers (e.g., ReplicaSet, Node Controller) to maintain the desired state.
  - Cloud-controller-manager: Integrates with cloud provider APIs (e.g., AWS, GCP).
- Node Components:
  - Kubelet: Ensures containers in pods are running as expected.
  - Kube-proxy: Manages networking rules for pod communication.
  - Container Runtime: Software (e.g., containerd, CRI-O) that runs containers.
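On a running cluster, most of these components can be inspected directly. On many distributions the control-plane pieces run as pods in the kube-system namespace (managed services such as EKS or GKE hide them); replace <node-name> with a node from the second command.

kubectl get pods -n kube-system      # control-plane and add-on pods
kubectl get nodes -o wide            # nodes, kubelet version, container runtime
kubectl describe node <node-name>    # capacity, conditions, allocated resources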
Internal Workflow
- User Interaction: Users submit YAML manifests to the kube-apiserver via kubectl.
- State Management: The desired state is stored in etcd.
- Scheduling: Kube-scheduler assigns pods to nodes based on resource availability and policies.
- Reconciliation: Controllers monitor the cluster and reconcile actual state with desired state.
- Networking: Kube-proxy and services handle load balancing and pod-to-pod communication.
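The reconciliation loop is easy to observe with kubectl. Assuming a Deployment named nginx-deployment (like the one created in the setup guide later in this article), changing the desired state triggers the controllers to converge the actual state:

kubectl scale deployment nginx-deployment --replicas=5    # change desired state
kubectl get pods -w                                       # watch controllers create new pods
kubectl get events --sort-by=.metadata.creationTimestamp  # scheduler and kubelet activity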
Architecture Diagram (Text Description)
The Kubernetes architecture consists of a control plane and worker nodes:
- Control Plane: A central management layer with kube-apiserver, etcd, kube-scheduler, and kube-controller-manager. These components communicate to maintain cluster state.
- Worker Nodes: Each node contains kubelet, kube-proxy, and a container runtime. Pods (groups of containers) run on nodes.
- Networking: Services connect pods internally, and Ingress manages external traffic. A CNI (Container Network Interface) plugin (e.g., Calico, Flannel) enables pod-to-pod communication.
- Visualization: Imagine a central control plane (like a brain) coordinating multiple worker nodes (like muscles) via API calls, with etcd as the memory storing the cluster’s state.
          +----------------------+
          |   kubectl / CI/CD    |
          +----------+-----------+
                     |
          +----------v-----------+
          |      API Server      |
          +---+------+-------+---+
              |      |       |
      +-------v----+ | +-----v--------+
      |    etcd    | | |   Scheduler  |
      +------------+ | +--------------+
                     |
          +----------v-----------+
          |  Controller Manager  |
          +----------+-----------+
                     |
+--------------------v-----------------------+
|                Worker Nodes                |
|                                            |
|  +----------+  +----------+  +----------+  |
|  | Kubelet  |  | Kubelet  |  | Kubelet  |  |
|  | KubeProxy|  | KubeProxy|  | KubeProxy|  |
|  | Runtime  |  | Runtime  |  | Runtime  |  |
|  +----------+  +----------+  +----------+  |
|       |             |             |        |
|     +---+         +---+         +---+      |
|     |Pod|         |Pod|         |Pod|      |
|     +---+         +---+         +---+      |
+--------------------------------------------+
Integration Points with CI/CD or Cloud Tools
- CI/CD: Kubernetes integrates with tools like Jenkins, GitLab CI, or ArgoCD for automated deployments.
  - Example: ArgoCD uses GitOps to sync Kubernetes manifests from a Git repository.
- Cloud Tools: Managed Kubernetes services (e.g., AWS EKS, Azure AKS, Google GKE) handle control plane management.
- Monitoring: Prometheus for metrics, Grafana for visualization, and Fluentd for logging.
- Storage: Integrates with cloud storage (e.g., AWS EBS, GCP Persistent Disk) via Persistent Volumes (PVs).
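As a sketch of the storage integration, a PersistentVolumeClaim requests capacity, and on most managed clusters a default StorageClass provisions the backing disk (for example an EBS volume) automatically. The name and size here are illustrative.

# Hypothetical PersistentVolumeClaim: the default StorageClass
# provisions a backing disk to satisfy the claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
  - ReadWriteOnce        # mounted read-write by a single node
  resources:
    requests:
      storage: 10Gi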
Installation & Getting Started
Basic Setup or Prerequisites
- Hardware: A machine (local or cloud) with at least 2 CPUs, 4GB RAM, and 20GB storage.
- Software:
- Docker or containerd for container runtime.
- kubectl: Command-line tool for interacting with Kubernetes.
- Minikube or kind for local clusters; alternatively, a cloud provider (AWS, GCP, Azure).
- OS: Linux, macOS, or Windows with a compatible container runtime.
- Networking: Ensure ports 6443 (API server) and 10250 (kubelet) are open.
Hands-on: Step-by-Step Beginner-Friendly Setup Guide
This guide sets up a local Kubernetes cluster using Minikube.
1. Install Minikube:
   - On Linux:
     curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
     sudo install minikube-linux-amd64 /usr/local/bin/minikube
   - On macOS:
     brew install minikube
   - On Windows: Use Chocolatey or download the executable from Minikube’s website.
2. Install kubectl:
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
3. Start Minikube:
minikube start --driver=docker
This creates a single-node cluster with Docker as the driver.
4. Verify Cluster:
kubectl get nodes
The output should show a node with STATUS: Ready.
5. Deploy a Sample Application:
Create a file named nginx-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
Apply the deployment:
kubectl apply -f nginx-deployment.yaml
6. Expose the Application:
kubectl expose deployment nginx-deployment --type=NodePort --port=80
7. Access the Application:
minikube service nginx-deployment
This opens a browser with the Nginx welcome page.
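8. Clean Up (Optional):
When you are done experimenting, remove the resources and stop the cluster:
kubectl delete service nginx-deployment
kubectl delete deployment nginx-deployment
minikube stop      # halt the cluster
minikube delete    # remove it entirely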
Real-World Use Cases
Scenario 1: Microservices Deployment
- Context: A fintech company deploys a microservices-based payment platform.
- Application: Kubernetes manages multiple services (e.g., authentication, payment processing, notifications) as separate pods.
- SRE Benefit: Horizontal scaling ensures low latency during peak transaction times, and rolling updates enable zero-downtime deployments.
- Example: PayPal uses Kubernetes to handle millions of transactions with high availability.
Scenario 2: Disaster Recovery
- Context: An e-commerce platform needs to ensure uptime during outages.
- Application: Kubernetes’ self-healing restarts failed pods, and multi-region clusters (e.g., via federation) ensure failover.
- SRE Benefit: Maintains SLOs for availability (e.g., 99.99% uptime).
- Example: Shopify uses Kubernetes to manage traffic spikes during Black Friday sales.
Scenario 3: Machine Learning Workflows
- Context: A healthcare company runs ML models for diagnostics.
- Application: Kubernetes orchestrates ML pipelines (e.g., Kubeflow) for training and inference.
- SRE Benefit: Resource isolation and GPU scheduling optimize compute-intensive tasks.
- Example: NVIDIA uses Kubernetes for ML workloads in healthcare imaging.
Scenario 4: CI/CD Automation
- Context: A SaaS provider automates application deployments.
- Application: Kubernetes integrates with ArgoCD for GitOps-based continuous deployment.
- SRE Benefit: Reduces toil by automating rollouts and rollbacks.
- Example: GitLab uses Kubernetes for its CI/CD pipelines.
Benefits & Limitations
Key Advantages
Advantage | Description |
---|---|
Scalability | Automatically scales pods and nodes based on demand. |
Portability | Runs consistently across on-premises, hybrid, and cloud environments. |
Self-Healing | Restarts failed containers, reschedules pods, and replaces unhealthy nodes. |
Ecosystem | Rich ecosystem with tools like Helm, Prometheus, and Istio. |
Common Challenges or Limitations
Challenge | Description |
---|---|
Complexity | Steep learning curve for managing clusters, networking, and storage. |
Resource Overhead | Control plane and networking components consume significant resources. |
Security | Misconfigurations (e.g., exposed API servers) can lead to vulnerabilities. |
Stateful Applications | Managing stateful workloads (e.g., databases) requires additional tools. |
Best Practices & Recommendations
Security Tips
- RBAC: Use Role-Based Access Control to limit permissions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- Network Policies: Restrict pod-to-pod communication (see the sketch after this list).
- Secrets Management: Use tools like HashiCorp Vault or AWS Secrets Manager.
- Image Scanning: Scan container images for vulnerabilities using tools like Trivy.
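The sketch below shows what such a network policy can look like; the labels and port are illustrative, and enforcement requires a CNI plugin that supports NetworkPolicy (e.g., Calico).

# Hypothetical NetworkPolicy: only pods labeled app=frontend may reach
# pods labeled app=backend on port 8080; all other ingress is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080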
Performance
- Use resource limits and requests to prevent resource contention:
spec:
  containers:
  - name: app
    resources:
      limits:
        cpu: "500m"
        memory: "512Mi"
      requests:
        cpu: "200m"
        memory: "256Mi"
- Enable Horizontal Pod Autoscaler (HPA) for dynamic scaling.
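A minimal HPA sketch (requires metrics-server; the names and thresholds are illustrative):

# Hypothetical HorizontalPodAutoscaler: scale between 2 and 10 replicas,
# targeting 70% average CPU utilization across the pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70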
Maintenance
- Regularly update Kubernetes to the latest stable version.
- Use tools like kubeadm for cluster upgrades (a typical flow is sketched after this list).
- Monitor cluster health with Prometheus and Grafana.
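On a self-managed cluster, a typical kubeadm upgrade flow looks like this; the version and node name are placeholders.

kubeadm upgrade plan                             # list available target versions
kubeadm upgrade apply v1.30.0                    # upgrade control-plane components
kubectl drain <node-name> --ignore-daemonsets    # evict workloads before upgrading a node
kubectl uncordon <node-name>                     # return the node to service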
Compliance Alignment
- Align with standards like GDPR or HIPAA by encrypting data at rest (etcd) and in transit (TLS).
- Use audit logging to track API server access.
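As a minimal sketch, an audit policy file (passed to the kube-apiserver via the --audit-policy-file flag) that records request metadata for every API call:

# Minimal audit policy: log who did what and when,
# without capturing request or response bodies.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata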
Automation Ideas
- Implement GitOps with ArgoCD or Flux for automated deployments.
- Use Helm charts to package and manage applications.
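For example, a typical Helm workflow (the bitnami/nginx chart is just an illustration; any chart works the same way):

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install my-web bitnami/nginx                  # install a templated, versioned release
helm upgrade my-web bitnami/nginx --set replicaCount=3
helm rollback my-web 1                             # revert to a previous revision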
Comparison with Alternatives
Feature/Tool | Kubernetes | Docker Swarm | Nomad |
---|---|---|---|
Orchestration | Advanced (pods, services, etc.) | Basic (services, tasks) | Flexible (jobs, tasks) |
Scalability | Highly scalable | Limited scalability | Moderately scalable |
Ecosystem | Rich (Helm, Istio, etc.) | Limited ecosystem | Moderate ecosystem |
Learning Curve | Steep | Moderate | Moderate |
Use Case | Complex, large-scale apps | Simple container apps | Mixed workloads |
When to Choose Kubernetes
- Choose Kubernetes: For large-scale, cloud-native applications requiring high availability, complex networking, and a rich ecosystem.
- Choose Alternatives: Docker Swarm for simpler setups or Nomad for mixed workloads (containers, VMs).
Conclusion
Kubernetes is a powerful platform for SREs, enabling reliable, scalable, and automated management of containerized applications. Its robust ecosystem and integration capabilities make it ideal for modern cloud-native environments. However, its complexity requires careful planning and expertise.
Future Trends
- Serverless Kubernetes: Tools like Knative enable serverless workloads.
- AI/ML Workloads: Increased adoption for ML pipelines (e.g., Kubeflow).
- Edge Computing: Kubernetes for edge devices with projects like KubeEdge.
Next Steps
- Explore hands-on labs on platforms like Killercoda (the successor to Katacoda) or Play with Kubernetes.
- Join communities: CNCF Slack, Kubernetes Reddit, or Stack Overflow.
- Official Docs: Kubernetes Documentation (https://kubernetes.io/docs/)