{"id":1961,"date":"2026-02-15T11:17:45","date_gmt":"2026-02-15T11:17:45","guid":{"rendered":"https:\/\/sreschool.com\/blog\/containerd\/"},"modified":"2026-02-15T11:17:45","modified_gmt":"2026-02-15T11:17:45","slug":"containerd","status":"publish","type":"post","link":"https:\/\/sreschool.com\/blog\/containerd\/","title":{"rendered":"What is containerd? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide)"},"content":{"rendered":"\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Quick Definition<\/h2>\n\n\n\n<p>containerd is a lightweight container runtime that manages container lifecycle, images, and storage on a host, delegating networking to CNI plugins. Analogy: containerd is the ship&#8217;s engine room powering container execution while orchestration is the captain. Formal: It is an industry-standard daemon implementing OCI runtime and image specs for running containers.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">What is containerd?<\/h2>\n\n\n\n<p>containerd is an open-source container runtime daemon originally spun out of Docker. It provides core primitives to pull, store, and run container images, manage snapshots and storage, and interface with OCI-compliant runtimes like runc or runsc. 
It is NOT a container orchestrator, not a full developer tooling suite, and not responsible for cluster-level scheduling.<\/p>\n\n\n\n<p>Key properties and constraints:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Focused on host-level container lifecycle management.<\/li>\n<li>Exposes a client-server gRPC API for extensibility and automation.<\/li>\n<li>Integrates with OCI runtimes for low-level process isolation.<\/li>\n<li>Designed for embedded use inside higher-level systems like Kubernetes.<\/li>\n<li>Deliberately minimal and unopinionated about orchestration; assumes an external controller.<\/li>\n<li>Usually runs as root, so access to its socket and plugin interfaces must be tightly restricted.<\/li>\n<\/ul>\n\n\n\n<p>Where it fits in modern cloud\/SRE workflows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Underlying runtime for the Kubernetes kubelet (CRI integration).<\/li>\n<li>Embedded in PaaS and serverless platforms to run short-lived containers.<\/li>\n<li>Used in CI runners to spawn build\/test containers efficiently.<\/li>\n<li>Automation and observability tools interact via gRPC, metrics, and logs.<\/li>\n<li>Security tooling enforces policies at image and runtime layers.<\/li>\n<\/ul>\n\n\n\n<p>Diagram description (text-only):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Host Node<\/li>\n<li>systemd (supervises)<\/li>\n<li>containerd daemon<ul>\n<li>image service (pull, store)<\/li>\n<li>snapshotter (overlayfs, zfs, btrfs)<\/li>\n<li>content store<\/li>\n<li>container service (create, delete)<\/li>\n<li>task service (start, stop)<\/li>\n<li>runtime shim processes per container<\/li>\n<\/ul>\n<\/li>\n<li>OCI runtime (runc or alternate)<\/li>\n<li>Network stack and CNI plugins<\/li>\n<li>Orchestrator (Kubernetes): the kubelet on each node calls containerd over CRI<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">containerd in one sentence<\/h3>\n\n\n\n<p>containerd is a production-grade container runtime daemon that manages image lifecycle, storage, and process execution on a host while 
providing a stable API for orchestration and automation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">containerd vs related terms<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>ID<\/th>\n<th>Term<\/th>\n<th>How it differs from containerd<\/th>\n<th>Common confusion<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>T1<\/td>\n<td>Docker Engine<\/td>\n<td>Adds a developer CLI and higher-level features on top of containerd<\/td>\n<td>People say Docker when they mean containerd<\/td>\n<\/tr>\n<tr>\n<td>T2<\/td>\n<td>runc<\/td>\n<td>Low-level OCI runtime that executes a container process<\/td>\n<td>runc is invoked by containerd<\/td>\n<\/tr>\n<tr>\n<td>T3<\/td>\n<td>CRI<\/td>\n<td>API spec used by kubelet to talk to runtimes<\/td>\n<td>CRI is a protocol, not an implementation<\/td>\n<\/tr>\n<tr>\n<td>T4<\/td>\n<td>Kubernetes<\/td>\n<td>Orchestrator that schedules containers at cluster level<\/td>\n<td>Kubernetes does not execute containers directly<\/td>\n<\/tr>\n<tr>\n<td>T5<\/td>\n<td>containerd-shim<\/td>\n<td>Per-container helper process used by containerd<\/td>\n<td>Shim is part of containerd runtime path<\/td>\n<\/tr>\n<tr>\n<td>T6<\/td>\n<td>Podman<\/td>\n<td>Alternative container engine focused on daemonless UX<\/td>\n<td>Podman is not a runtime daemon by default<\/td>\n<\/tr>\n<tr>\n<td>T7<\/td>\n<td>OCI image spec<\/td>\n<td>Image format spec that containerd implements<\/td>\n<td>The spec is a format, not a runtime<\/td>\n<\/tr>\n<tr>\n<td>T8<\/td>\n<td>containerd plugin<\/td>\n<td>Extends containerd features via gRPC plugins<\/td>\n<td>Plugins are optional components<\/td>\n<\/tr>\n<tr>\n<td>T9<\/td>\n<td>runsc<\/td>\n<td>gVisor OCI runtime for sandboxing<\/td>\n<td>runsc provides additional isolation<\/td>\n<\/tr>\n<tr>\n<td>T10<\/td>\n<td>cri-o<\/td>\n<td>CRI implementation alternative to containerd<\/td>\n<td>Used in some Kubernetes distros<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Why does containerd matter?<\/h2>\n\n\n\n<p>Business impact:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Revenue: Reliable container execution reduces downtime, preventing revenue loss from failed releases.<\/li>\n<li>Trust: Consistent host behavior lowers customer-facing incidents.<\/li>\n<li>Risk: A smaller attack surface than a full container engine lowers breach and compliance risk when configured properly.<\/li>\n<\/ul>\n\n\n\n<p>Engineering impact:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Incident reduction: Stable, well-instrumented runtime reduces noisy failures and unknown host-level crashes.<\/li>\n<li>Velocity: Fast image pull, layering, and snapshotting speed CI\/CD pipelines and iterative development.<\/li>\n<li>Cost: Efficient image caching and snapshotters reduce compute and I\/O costs.<\/li>\n<\/ul>\n\n\n\n<p>SRE framing:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>SLIs\/SLOs: Uptime of container runtime, container start latency, image pull success rate.<\/li>\n<li>Error budgets: Use runtime-level SLOs to limit releases that may increase runtime failures.<\/li>\n<li>Toil: Automate patching and configuration; encapsulate common operations in runbooks.<\/li>\n<li>On-call: Define clear escalation paths for node-level containerd issues.<\/li>\n<\/ul>\n\n\n\n<p>Realistic \u201cwhat breaks in production\u201d examples:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Node disk filled with unused images, causing disk pressure and kubelet evictions.<\/li>\n<li>Snapshotter failure on a host blocking container start for pods on that node.<\/li>\n<li>containerd upgrade causing shim incompatibility and mass container restarts.<\/li>\n<li>High image pull failure rates during a deploy, flooding the control plane with retries.<\/li>\n<li>Orphaned shims leaking silently until the node runs out of PIDs.<\/li>\n<\/ol>\n\n\n\n<hr 
class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Where is containerd used?<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>ID<\/th>\n<th>Layer\/Area<\/th>\n<th>How containerd appears<\/th>\n<th>Typical telemetry<\/th>\n<th>Common tools<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>L1<\/td>\n<td>Node runtime<\/td>\n<td>daemon running on host to manage containers<\/td>\n<td>process memory, CPU, restarts<\/td>\n<td>systemd journal, Prometheus<\/td>\n<\/tr>\n<tr>\n<td>L2<\/td>\n<td>Kubernetes<\/td>\n<td>kubelet uses CRI to call containerd<\/td>\n<td>container start latency, image pull success<\/td>\n<td>kubectl, kube-proxy, CNI<\/td>\n<\/tr>\n<tr>\n<td>L3<\/td>\n<td>CI\/CD runners<\/td>\n<td>spawn ephemeral containers for jobs<\/td>\n<td>job runtime, timeouts, image cache hits<\/td>\n<td>GitLab Runner, Jenkins runner<\/td>\n<\/tr>\n<tr>\n<td>L4<\/td>\n<td>Serverless<\/td>\n<td>short-lived container sandboxes for functions<\/td>\n<td>cold start time, concurrency failures<\/td>\n<td>FaaS platform controller<\/td>\n<\/tr>\n<tr>\n<td>L5<\/td>\n<td>Edge devices<\/td>\n<td>lightweight runtime on constrained hardware<\/td>\n<td>disk pressure, I\/O latency<\/td>\n<td>device managers, custom agents<\/td>\n<\/tr>\n<tr>\n<td>L6<\/td>\n<td>PaaS layers<\/td>\n<td>managed container pools backing apps<\/td>\n<td>scaling events, container churn<\/td>\n<td>platform controllers, autoscaler<\/td>\n<\/tr>\n<tr>\n<td>L7<\/td>\n<td>Security tooling<\/td>\n<td>image scanning and runtime policy enforcement<\/td>\n<td>policy violations, exploit detections<\/td>\n<td>scanners and seccomp tools<\/td>\n<\/tr>\n<tr>\n<td>L8<\/td>\n<td>Observability<\/td>\n<td>exports metrics and logs for host telemetry<\/td>\n<td>errors, restarts, open file counts<\/td>\n<td>Prometheus, Grafana, Fluentd<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">When should you use containerd?<\/h2>\n\n\n\n<p>When it\u2019s necessary:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You run Kubernetes or another orchestrator that requires a CRI runtime.<\/li>\n<li>You need a lightweight, production-grade runtime with a stable API.<\/li>\n<li>You require pluggable snapshotters or alternative OCI runtimes.<\/li>\n<\/ul>\n\n\n\n<p>When it\u2019s optional:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Small dev machines where Podman or Docker Desktop suffice.<\/li>\n<li>Simple single-host apps without orchestrator needs.<\/li>\n<\/ul>\n\n\n\n<p>When NOT to use \/ overuse it:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you want daemonless workflows on dev machines and want to avoid a long-running background process.<\/li>\n<li>If your platform enforces a different runtime and you cannot change it.<\/li>\n<\/ul>\n\n\n\n<p>Decision checklist:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you need CRI compatibility and cluster orchestration -&gt; Use containerd.<\/li>\n<li>If you need daemonless development on a local laptop -&gt; Consider Podman.<\/li>\n<li>If you need sandboxed isolation with kernel mediation -&gt; Use containerd with gVisor runsc.<\/li>\n<li>If you need embedded runtime in a custom platform -&gt; containerd is a strong choice.<\/li>\n<\/ul>\n\n\n\n<p>Maturity ladder:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Beginner: Use packaged containerd from your distro or cloud node image.<\/li>\n<li>Intermediate: Configure snapshotters, enable Prometheus metrics, integrate with CI.<\/li>\n<li>Advanced: Custom plugins, multiple runtimes, runtime sandboxing, automated upgrades.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">How does containerd work?<\/h2>\n\n\n\n<p>Components and workflow:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>containerd daemon: 
central process exposing the gRPC API.<\/li>\n<li>Content store: stores raw blobs and manifests.<\/li>\n<li>Image service: pulls, pushes, and manages image metadata.<\/li>\n<li>Snapshotter: creates filesystem snapshots for containers using overlayfs, zfs, or btrfs.<\/li>\n<li>Container service: tracks container metadata.<\/li>\n<li>Task service and shim: creates and supervises the running process via shim.<\/li>\n<li>OCI runtime (runc\/runsc): performs the low-level container create\/start.<\/li>\n<\/ul>\n\n\n\n<p>Data flow and lifecycle:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Orchestrator requests image via CRI or client API.<\/li>\n<li>Image service checks content store, initiates pull if missing.<\/li>\n<li>Snapshotter unpacks layers from the content store and prepares a filesystem snapshot.<\/li>\n<li>containerd creates container metadata.<\/li>\n<li>Task service spawns shim which invokes OCI runtime to start process.<\/li>\n<li>Shim proxies stdio and signals and reports exit to containerd.<\/li>\n<li>On stop, task service tears down processes and snapshotter cleans up.<\/li>\n<\/ol>\n\n\n\n<p>Edge cases and failure modes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Interrupted image pull leaves incomplete layers that must be re-fetched or cleaned up.<\/li>\n<li>Snapshotter unavailable due to kernel module issues.<\/li>\n<li>Orphaned shim processes if containerd crashes mid-lifecycle.<\/li>\n<li>Stale mounts preventing filesystem cleanup.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Typical architecture patterns for containerd<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Single-node runtime: containerd as a host daemon for development or edge devices.<\/li>\n<li>Kubernetes worker: containerd serving the kubelet through its CRI plugin.<\/li>\n<li>Multitenant platform: containerd with additional sandbox runtimes and strict seccomp\/AppArmor profiles.<\/li>\n<li>CI runner pool: containerd with aggressive image caching and pre-warmed snapshots.<\/li>\n<li>Serverless sandboxing: containerd + runsc for gVisor isolation for untrusted 
code.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Failure modes &amp; mitigation<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>ID<\/th>\n<th>Failure mode<\/th>\n<th>Symptom<\/th>\n<th>Likely cause<\/th>\n<th>Mitigation<\/th>\n<th>Observability signal<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>F1<\/td>\n<td>Image pull failures<\/td>\n<td>Pods Pending with ImagePullBackOff<\/td>\n<td>Registry auth or network issues<\/td>\n<td>Retry, rotate credentials, fall back to a mirror registry<\/td>\n<td>pull error rate metric<\/td>\n<\/tr>\n<tr>\n<td>F2<\/td>\n<td>Snapshotter errors<\/td>\n<td>Container start fails with overlay mount errors<\/td>\n<td>Kernel or filesystem misconfiguration<\/td>\n<td>Switch snapshotter, clean up stale snapshots, restart containerd<\/td>\n<td>failed snapshot ops<\/td>\n<\/tr>\n<tr>\n<td>F3<\/td>\n<td>Shim leaks<\/td>\n<td>High PID count from orphaned shims<\/td>\n<td>containerd crash before cleanup<\/td>\n<td>Reap orphaned shims, restart containerd<\/td>\n<td>unexpected PID growth<\/td>\n<\/tr>\n<tr>\n<td>F4<\/td>\n<td>Content store corruption<\/td>\n<td>Image validation errors<\/td>\n<td>Disk corruption or abrupt shutdown<\/td>\n<td>Reconcile or rebuild the content store<\/td>\n<td>content integrity errors<\/td>\n<\/tr>\n<tr>\n<td>F5<\/td>\n<td>Resource exhaustion<\/td>\n<td>OOM or slow nodes<\/td>\n<td>Too many containers or leaky processes<\/td>\n<td>Limit container count, enforce cgroup quotas<\/td>\n<td>node memory and swap metrics<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Key Concepts, Keywords &amp; Terminology for containerd<\/h2>\n\n\n\n<p>Glossary of 40+ terms. Each entry gives the term, a short definition, why it matters, and a common pitfall.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>containerd \u2014 Host-level daemon managing container lifecycle 
\u2014 Core runtime service \u2014 Confused with Docker.<\/li>\n<li>OCI runtime \u2014 Low-level executor for containers like runc \u2014 Provides process isolation \u2014 Runtime mismatch causes failures.<\/li>\n<li>runc \u2014 Reference OCI runtime \u2014 Widely used default \u2014 Requires kernel features.<\/li>\n<li>runsc \u2014 gVisor runtime providing sandboxing \u2014 Adds isolation via user-space kernel \u2014 Higher latencies.<\/li>\n<li>CRI \u2014 Container Runtime Interface used by kubelet \u2014 Standard protocol \u2014 Version mismatches break kubelet.<\/li>\n<li>shim \u2014 Lightweight per-container helper \u2014 Keeps stdio alive after containerd restart \u2014 Leaked shims consume PIDs.<\/li>\n<li>snapshotter \u2014 Creates container filesystem layers \u2014 Enables fast container startup \u2014 Incompatible snapshotters can fail.<\/li>\n<li>content store \u2014 Blob storage for images \u2014 Deduplicates image layers \u2014 Corruption impacts many images.<\/li>\n<li>image manifest \u2014 Metadata describing image layers \u2014 Used to assemble FS \u2014 Wrong manifests prevent pulls.<\/li>\n<li>layer \u2014 Filesystem delta in images \u2014 Allows sharing across images \u2014 Large layers increase pull time.<\/li>\n<li>overlayfs \u2014 Common snapshotter backend \u2014 Fast union filesystem \u2014 Kernel support required.<\/li>\n<li>zfs \u2014 Snapshot-capable FS for snapshotter \u2014 Good for performance in some workloads \u2014 Complexity in configuration.<\/li>\n<li>namespace \u2014 containerd isolation for images and containers \u2014 Useful for multi-tenant separation \u2014 Misuse causes resource leaks.<\/li>\n<li>plugin \u2014 Extends containerd functionality via gRPC \u2014 Enables custom behavior \u2014 Incompatible plugins are a risk during upgrades.<\/li>\n<li>gRPC API \u2014 Programmatic interface to containerd \u2014 Enables automation and observability \u2014 Schema changes require client updates.<\/li>\n<li>tasks \u2014 Running container processes tracked by 
containerd \u2014 Primary execution concept \u2014 Task lifecycle mismatches cause restarts.<\/li>\n<li>pause container \u2014 Minimal sandbox container that holds the shared namespaces of a Kubernetes pod \u2014 Started first for every pod \u2014 An unpullable pause image blocks pod starts on the node.<\/li>\n<li>introspection API \u2014 Internal APIs for debugging \u2014 Helpful for triage \u2014 Some endpoints may be disabled.<\/li>\n<li>health check \u2014 Runtime-level liveness and readiness probes \u2014 Guards against deadlocks \u2014 Poor checks cause false restarts.<\/li>\n<li>metrics \u2014 containerd exposes Prometheus metrics \u2014 Essential for SRE monitoring \u2014 Missing metrics hide issues.<\/li>\n<li>image pull policy \u2014 Rules for pulling images \u2014 Affects latency and cache usage \u2014 Aggressive policies increase bandwidth.<\/li>\n<li>garbage collection \u2014 Removes unused images and snapshots \u2014 Prevents disk exhaustion \u2014 Overaggressive GC can remove active layers.<\/li>\n<li>concurrency limits \u2014 Limits on creates and pulls \u2014 Protects node stability \u2014 Too strict limits slow deploys.<\/li>\n<li>container lifecycle \u2014 Create start stop delete sequence \u2014 Fundamental operational model \u2014 Partial states can persist.<\/li>\n<li>content verification \u2014 Validates image integrity \u2014 Prevents tampered images \u2014 Not always enabled by default.<\/li>\n<li>seccomp \u2014 Kernel syscall filtering \u2014 Reduces attack surface \u2014 Complex filters may break apps.<\/li>\n<li>AppArmor \u2014 LSM for process isolation \u2014 Enhances security \u2014 Profile misconfiguration blocks behavior.<\/li>\n<li>cgroups \u2014 Resource control for containers \u2014 Enforces CPU and memory limits \u2014 Misconfiguration causes noisy neighbors.<\/li>\n<li>PID namespace \u2014 Process isolation per container \u2014 Affects tooling expectations \u2014 Tools expecting host PIDs will fail.<\/li>\n<li>rootless \u2014 Running containerd without root \u2014 Improves security \u2014 Not all features 
supported.<\/li>\n<li>runtime class \u2014 Kubernetes feature to select a runtime \u2014 Enables multiple runtimes \u2014 Unavailable runtime class breaks scheduling.<\/li>\n<li>pre-pulled image \u2014 Image cached on node before use \u2014 Reduces cold start times \u2014 Stale images cause drift.<\/li>\n<li>cold start \u2014 Time to start first instance of a container \u2014 Critical for serverless \u2014 Image size is often the dominant factor.<\/li>\n<li>warm pool \u2014 Pre-warmed snapshots ready to start \u2014 Reduces latency \u2014 Requires resource planning.<\/li>\n<li>ephemeral containers \u2014 Short-lived containers for tasks \u2014 Useful in CI \u2014 May create churn.<\/li>\n<li>image signing \u2014 Verifies publisher identity \u2014 Prevents supply chain attacks \u2014 Key management required.<\/li>\n<li>attestations \u2014 Proofs about images or runtime state \u2014 Supports compliance \u2014 Integration complexity.<\/li>\n<li>multi-arch images \u2014 Images with multiple platform variants \u2014 Supports heterogeneous nodes \u2014 Mis-tagging causes mismatch.<\/li>\n<li>ORAS \u2014 OCI Registry As Storage, tooling for storing non-image artifacts in OCI registries \u2014 Broader artifact support \u2014 Different toolchains required.<\/li>\n<li>containerd upgrade \u2014 Replacing containerd binary with newer version \u2014 Must consider shim compatibility \u2014 Upgrade window can cause restarts.<\/li>\n<li>runtime security agent \u2014 Software to enforce runtime policies \u2014 Monitors syscalls and process behavior \u2014 Can add latency.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">How to Measure containerd (Metrics, SLIs, SLOs)<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>ID<\/th>\n<th>Metric\/SLI<\/th>\n<th>What it tells you<\/th>\n<th>How to measure<\/th>\n<th>Starting target<\/th>\n<th>Gotchas<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>M1<\/td>\n<td>Runtime uptime<\/td>\n<td>daemon 
availability<\/td>\n<td>Prometheus up series for the containerd scrape target<\/td>\n<td>99.9% monthly<\/td>\n<td>false negatives during maintenance<\/td>\n<\/tr>\n<tr>\n<td>M2<\/td>\n<td>Container start latency<\/td>\n<td>Time to start container task<\/td>\n<td>measure from create to running<\/td>\n<td>p95 &lt; 2s for small images<\/td>\n<td>large images skew percentiles<\/td>\n<\/tr>\n<tr>\n<td>M3<\/td>\n<td>Image pull success rate<\/td>\n<td>Fraction of successful pulls<\/td>\n<td>successful pulls divided by attempts<\/td>\n<td>99.9%<\/td>\n<td>transient registry outages<\/td>\n<\/tr>\n<tr>\n<td>M4<\/td>\n<td>Active shim count<\/td>\n<td>Number of shim processes<\/td>\n<td>process count filtered by name<\/td>\n<td>stable baseline per node<\/td>\n<td>zombie shims inflate numbers<\/td>\n<\/tr>\n<tr>\n<td>M5<\/td>\n<td>Snapshotter ops errors<\/td>\n<td>Failed snapshot operations<\/td>\n<td>error counter per snapshotter<\/td>\n<td>near zero<\/td>\n<td>file system incompatibilities<\/td>\n<\/tr>\n<tr>\n<td>M6<\/td>\n<td>Content store integrity<\/td>\n<td>Corruption incidents<\/td>\n<td>integrity checks or validation jobs<\/td>\n<td>0 incidents<\/td>\n<td>expensive to run frequently<\/td>\n<\/tr>\n<tr>\n<td>M7<\/td>\n<td>Disk usage by images<\/td>\n<td>Disk consumed by images<\/td>\n<td>du on the snapshot directory<\/td>\n<td>varies by node size<\/td>\n<td>overlay counting can double count<\/td>\n<\/tr>\n<tr>\n<td>M8<\/td>\n<td>OOM kill count<\/td>\n<td>Out of memory container kills<\/td>\n<td>kernel OOM events per container<\/td>\n<td>0 desired<\/td>\n<td>noisy with flaky apps<\/td>\n<\/tr>\n<tr>\n<td>M9<\/td>\n<td>Image pull bandwidth<\/td>\n<td>Network bytes during pulls<\/td>\n<td>network egress counters<\/td>\n<td>depends on infra<\/td>\n<td>shared links may bias<\/td>\n<\/tr>\n<tr>\n<td>M10<\/td>\n<td>Restart rate<\/td>\n<td>containerd restarts or crashes<\/td>\n<td>service restart counter<\/td>\n<td>0 expected<\/td>\n<td>auto-restarts mask root 
cause<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Best tools to measure containerd<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Tool \u2014 Prometheus<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What it measures for containerd: Scrapes containerd metrics such as image pulls, operation errors, and task counts from its metrics endpoint.<\/li>\n<li>Best-fit environment: Kubernetes clusters and systems with Prometheus stack.<\/li>\n<li>Setup outline:<\/li>\n<li>Enable the containerd metrics endpoint in its configuration.<\/li>\n<li>Configure scrape job in Prometheus.<\/li>\n<li>Add relabeling to separate node metric namespaces.<\/li>\n<li>Strengths:<\/li>\n<li>Flexible query language and alerting.<\/li>\n<li>Wide ecosystem integrations.<\/li>\n<li>Limitations:<\/li>\n<li>Requires metric instrumentation enabled.<\/li>\n<li>Storage and cardinality management needed.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Tool \u2014 Grafana<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What it measures for containerd: Visualizes Prometheus metrics, logs, and traces related to containerd.<\/li>\n<li>Best-fit environment: Teams needing dashboards and alerts.<\/li>\n<li>Setup outline:<\/li>\n<li>Connect Grafana to Prometheus datasource.<\/li>\n<li>Import or build dashboards for container start latency and failures.<\/li>\n<li>Configure panel alerts or link to Alertmanager.<\/li>\n<li>Strengths:<\/li>\n<li>Rich visualizations and templating.<\/li>\n<li>Limitations:<\/li>\n<li>Alerting is often better handled by a dedicated backend such as Alertmanager.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Tool \u2014 Fluentd \/ Fluent Bit<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What it measures for containerd: Collects containerd logs and shim logs for 
aggregation.<\/li>\n<li>Best-fit environment: Centralized logging platform.<\/li>\n<li>Setup outline:<\/li>\n<li>Deploy logging agent on nodes.<\/li>\n<li>Parse containerd journal entries and JSON logs.<\/li>\n<li>Forward to central storage or SIEM.<\/li>\n<li>Strengths:<\/li>\n<li>Lightweight and flexible parsing.<\/li>\n<li>Limitations:<\/li>\n<li>Log volume can be high, and parsing rules add complexity.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Tool \u2014 eBPF tracing (e.g., bpftrace)<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What it measures for containerd: Traces syscalls, network, and I\/O for containerd processes and shims.<\/li>\n<li>Best-fit environment: Deep debugging in staging or forensic analysis.<\/li>\n<li>Setup outline:<\/li>\n<li>Install eBPF tooling on hosts whose kernel supports it.<\/li>\n<li>Attach scripts to containerd and shim events.<\/li>\n<li>Collect traces and analyze latency hotspots.<\/li>\n<li>Strengths:<\/li>\n<li>Low-overhead deep visibility.<\/li>\n<li>Limitations:<\/li>\n<li>Requires kernel support and privileges.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Tool \u2014 node-exporter<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What it measures for containerd: Host-level metrics including disk and process stats that affect containerd.<\/li>\n<li>Best-fit environment: Node-level baseline telemetry.<\/li>\n<li>Setup outline:<\/li>\n<li>Deploy node-exporter on nodes.<\/li>\n<li>Ensure Prometheus scrapes its metrics.<\/li>\n<li>Correlate node metrics with containerd metrics.<\/li>\n<li>Strengths:<\/li>\n<li>Easy to deploy and lightweight.<\/li>\n<li>Limitations:<\/li>\n<li>Not containerd specific; needs correlation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Recommended dashboards &amp; alerts for containerd<\/h3>\n\n\n\n<p>Executive dashboard:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Panels: Cluster-level containerd uptime, image pull success rate, overall container start latency p95, total disk used by 
images.<\/li>\n<li>Why: High-level health and business impact indicators for leadership.<\/li>\n<\/ul>\n\n\n\n<p>On-call dashboard:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Panels: Node-specific containerd process status, recent containerd restarts, active shim counts, failed snapshot ops, top nodes by image pull errors.<\/li>\n<li>Why: Quick triage for paged incidents.<\/li>\n<\/ul>\n\n\n\n<p>Debug dashboard:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Panels: Per-node image pull latency histograms, snapshotter ops logs, content store integrity errors, recent shim exit codes, per-container resource usage.<\/li>\n<li>Why: Deep troubleshooting and root cause analysis.<\/li>\n<\/ul>\n\n\n\n<p>Alerting guidance:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Page vs ticket: Page for containerd daemon down, snapshotter errors blocking many pods, or a burst of node-level OOM kills. Ticket for single container pull failures or intermittent pulls that do not affect SLOs.<\/li>\n<li>Burn-rate guidance: If the container start latency SLO burns more than 50% of its error budget in 1 hour, escalate to on-call.<\/li>\n<li>Noise reduction tactics: Deduplicate alerts by node group, group by failure type, suppress alerts during planned upgrade windows.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Implementation Guide (Step-by-step)<\/h2>\n\n\n\n<p>1) Prerequisites<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Kernel features required (overlayfs, namespaces).<\/li>\n<li>Access to node images or package repo.<\/li>\n<li>Prometheus\/Grafana and logging stack planned.<\/li>\n<li>Backup of existing containerd config.<\/li>\n<\/ul>\n\n\n\n<p>2) Instrumentation plan<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable containerd Prometheus metrics.<\/li>\n<li>Collect shim and containerd logs.<\/li>\n<li>Integrate node-exporter and eBPF tracing for deep diagnostics.<\/li>\n<\/ul>\n\n\n\n<p>3) Data collection<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Configure scraping intervals for critical metrics.<\/li>\n<li>Retain logs for a minimum of 30 days for postmortems.<\/li>\n<li>Run periodic content store integrity checks.<\/li>\n<\/ul>\n\n\n\n<p>4) SLO design<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Define SLIs such as container start latency and image pull success.<\/li>\n<li>Set SLOs with realistic error budgets based on business needs.<\/li>\n<\/ul>\n\n\n\n<p>5) Dashboards<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build executive, on-call, and debug dashboards.<\/li>\n<li>Include historical trends and heatmaps for regressions.<\/li>\n<\/ul>\n\n\n\n<p>6) Alerts &amp; routing<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Implement paging rules for severe runtime failures.<\/li>\n<li>Group noisy alerts and use suppression during maintenance.<\/li>\n<\/ul>\n\n\n\n<p>7) Runbooks &amp; automation<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create runbooks for common failures: image pull, snapshotter, shim leaks.<\/li>\n<li>Automate cleanup tasks and safe restarts using orchestration tooling.<\/li>\n<\/ul>\n\n\n\n<p>8) Validation (load\/chaos\/game days)<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Simulate heavy pulls, disk pressure, and shim leaks in staging.<\/li>\n<li>Run game days focusing on containerd upgrade and rollback.<\/li>\n<\/ul>\n\n\n\n<p>9) Continuous improvement<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Track incidents, update SLOs, refine dashboards and runbooks.<\/li>\n<\/ul>\n\n\n\n<p>Checklists:<\/p>\n\n\n\n<p>Pre-production checklist<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Verify kernel snapshotter support.<\/li>\n<li>Confirm metrics and logging endpoints reachable.<\/li>\n<li>Pre-pull critical images to reduce cold start test variability.<\/li>\n<li>Validate health checks for containerd.<\/li>\n<\/ul>\n\n\n\n<p>Production readiness checklist<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Define SLOs and alert thresholds.<\/li>\n<li>Automate config deployment and rollback.<\/li>\n<li>Ensure backup for node images and content store.<\/li>\n<li>Confirm runbooks and on-call rotation.<\/li>\n<\/ul>\n\n\n\n<p>Incident checklist specific to containerd<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Check containerd daemon status and recent restarts.<\/li>\n<li>Inspect containerd logs and shim exit codes.<\/li>\n<li>Query image pull metrics and snapshot errors.<\/li>\n<li>If necessary, drain node and restart containerd 
gracefully.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Use Cases of containerd<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p>Kubernetes worker runtime<\/p>\n<ul class=\"wp-block-list\">\n<li>Context: Production K8s cluster nodes.<\/li>\n<li>Problem: Need stable runtime for pods.<\/li>\n<li>Why containerd helps: CRI support and performance.<\/li>\n<li>What to measure: start latency, pull success, restarts.<\/li>\n<li>Typical tools: kubelet, Prometheus, Grafana.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>CI\/CD runner sandbox<\/p>\n<ul class=\"wp-block-list\">\n<li>Context: Shared CI runners executing builds.<\/li>\n<li>Problem: Isolation and fast startup.<\/li>\n<li>Why containerd helps: Snapshotters and image caching.<\/li>\n<li>What to measure: job duration, cache hit rate, disk usage.<\/li>\n<li>Typical tools: GitLab Runner, node-exporter, Fluentd.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>Serverless function runtime<\/p>\n<ul class=\"wp-block-list\">\n<li>Context: Platform running many short-lived functions.<\/li>\n<li>Problem: Cold start and scaling cost.<\/li>\n<li>Why containerd helps: fast snapshot restore, multiple runtimes.<\/li>\n<li>What to measure: cold start p95, concurrency failures.<\/li>\n<li>Typical tools: custom controller, Prometheus, runsc.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>Edge device deployment<\/p>\n<ul class=\"wp-block-list\">\n<li>Context: Fleet of devices running containers.<\/li>\n<li>Problem: Resource constraints and reliability.<\/li>\n<li>Why containerd helps: lightweight daemon and rootless options.<\/li>\n<li>What to measure: disk pressure, process restarts, agent heartbeats.<\/li>\n<li>Typical tools: device managers, remote logging.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>Secure multi-tenant PaaS<\/p>\n<ul class=\"wp-block-list\">\n<li>Context: Shared platform hosting customer workloads.<\/li>\n<li>Problem: Isolation and auditability.<\/li>\n<li>Why containerd helps: runtime classes, gVisor integration.<\/li>\n<li>What to measure: policy violations, seccomp hits, runtime errors.<\/li>\n<li>Typical tools: policy engine, SIEM, attestation services.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>Hybrid cloud runtime<\/p>\n<ul class=\"wp-block-list\">\n<li>Context: Nodes across cloud and on-prem.<\/li>\n<li>Problem: Consistent runtime behavior across environments.<\/li>\n<li>Why containerd helps: stable API and plugin architecture.<\/li>\n<li>What to measure: cross-location latency, image pull variance.<\/li>\n<li>Typical tools: registry caching, CDN, Prometheus federation.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>High-density microservices<\/p>\n<ul class=\"wp-block-list\">\n<li>Context: Many small containers per node.<\/li>\n<li>Problem: Efficient storage and fast scheduling.<\/li>\n<li>Why containerd helps: deduplicated content store and overlayfs.<\/li>\n<li>What to measure: image layer reuse, disk I\/O, container churn.<\/li>\n<li>Typical tools: snapshotter tuning, cgroups.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>Artifact distribution and compliance<\/p>\n<ul class=\"wp-block-list\">\n<li>Context: Ensuring signed images are trusted.<\/li>\n<li>Problem: Preventing supply chain attacks.<\/li>\n<li>Why containerd helps: image signing validation hooks.<\/li>\n<li>What to measure: signing verification failures, blocked pulls.<\/li>\n<li>Typical tools: signing systems, policy engines.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Scenario Examples (Realistic, End-to-End)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Scenario #1 \u2014 Kubernetes rolling deploy with large images<\/h3>\n\n\n\n<p><strong>Context:<\/strong> Cluster with many services using 400 MB images.\n<strong>Goal:<\/strong> Reduce deployment disruption and start latency.\n<strong>Why containerd matters here:<\/strong> Fast local snapshot reuse and image caching reduce node strain.\n<strong>Architecture \/ workflow:<\/strong> Image caching per node, pre-pull pipeline, containerd overlayfs snapshotter.\n<strong>Step-by-step implementation:<\/strong> Pre-pull images on nodes during low traffic; enable metrics; run controlled rollouts.\n<strong>What to measure:<\/strong> image pull success rate, container start latency, disk usage.\n<strong>Tools to use and why:<\/strong> Prometheus for metrics, Grafana 
for dashboards, CI pre-pull job.\n<strong>Common pitfalls:<\/strong> Stale pre-pulled images cause drift; disk exhaustion.\n<strong>Validation:<\/strong> Simulate rolling deploy in staging and measure p95 start latency.\n<strong>Outcome:<\/strong> Reduced rollout time and fewer failed pods.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Scenario #2 \u2014 Serverless platform cold-start optimization<\/h3>\n\n\n\n<p><strong>Context:<\/strong> FaaS platform serving bursty traffic with latency SLAs.\n<strong>Goal:<\/strong> Reduce cold start p95 by 50%.\n<strong>Why containerd matters here:<\/strong> Pre-warmed snapshot pools and fast task spawning.\n<strong>Architecture \/ workflow:<\/strong> containerd with hotwarm snapshot pool and runsc for untrusted code.\n<strong>Step-by-step implementation:<\/strong> Build pool manager that pre-creates snapshots, monitor pool size, scale pool with load.\n<strong>What to measure:<\/strong> cold start p95, pool hit rate, memory usage.\n<strong>Tools to use and why:<\/strong> containerd metrics, custom pool controller, Prometheus.\n<strong>Common pitfalls:<\/strong> Pool size misconfigured causing memory pressure.\n<strong>Validation:<\/strong> Load test with synthetic bursts.\n<strong>Outcome:<\/strong> Cold start p95 reduced and better SLA compliance.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Scenario #3 \u2014 Incident response: snapshotter failure post-upgrade<\/h3>\n\n\n\n<p><strong>Context:<\/strong> After a maintenance window, nodes fail to create overlays.\n<strong>Goal:<\/strong> Restore node capacity without data loss.\n<strong>Why containerd matters here:<\/strong> Snapshotter is blocking container start.\n<strong>Architecture \/ workflow:<\/strong> containerd snapshotter overlayfs; kubelet shows pods Pending.\n<strong>Step-by-step implementation:<\/strong> Check kernel messages, roll back kernel or snapshotter plugin, drain node, restart containerd.\n<strong>What to measure:<\/strong> snapshotter error rate, pending pods 
count, node drain time.\n<strong>Tools to use and why:<\/strong> journalctl for logs, Prometheus for alerting, Grafana for trend analysis.\n<strong>Common pitfalls:<\/strong> An immediate forced restart may leave mounts behind; stale mounts must be cleaned up.\n<strong>Validation:<\/strong> Postmortem and runbook updates.\n<strong>Outcome:<\/strong> Node restored and future upgrades tested in staging.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Scenario #4 \u2014 Cost vs performance trade-off for image storage<\/h3>\n\n\n\n<p><strong>Context:<\/strong> Large fleet with high egress cost due to repeated pulls.\n<strong>Goal:<\/strong> Reduce network egress cost while keeping start latency acceptable.\n<strong>Why containerd matters here:<\/strong> Content store caching and registry mirrors reduce pulls.\n<strong>Architecture \/ workflow:<\/strong> Central registry mirror with local cache and proper TTLs.\n<strong>Step-by-step implementation:<\/strong> Deploy registry cache, configure containerd to prefer mirror, measure hit rate.\n<strong>What to measure:<\/strong> egress bytes, image pull time, cache hit ratio.\n<strong>Tools to use and why:<\/strong> Network telemetry, Prometheus, registry metrics.\n<strong>Common pitfalls:<\/strong> Mirror misconfiguration causing stale images.\n<strong>Validation:<\/strong> Compare costs before and after over 30 days.\n<strong>Outcome:<\/strong> Lower egress cost and acceptable start latency.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Common Mistakes, Anti-patterns, and Troubleshooting<\/h2>\n\n\n\n<p>Each entry follows the pattern Symptom -&gt; Root cause -&gt; Fix:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Symptom: Frequent ImagePullBackOff -&gt; Root cause: Misconfigured registry auth -&gt; Fix: Rotate credentials and test a pull.<\/li>\n<li>Symptom: Slow container start -&gt; Root cause: Large unoptimized image layers -&gt; Fix: Rebuild image with smaller layers.<\/li>\n<li>Symptom: Disk 
full on \/var\/lib\/containerd -&gt; Root cause: No garbage collection -&gt; Fix: Implement GC policy and cleanup scripts.<\/li>\n<li>Symptom: High PID count of shims -&gt; Root cause: containerd crash left orphaned shims -&gt; Fix: Restart containerd and reclaim shims.<\/li>\n<li>Symptom: Snapshotter mount errors -&gt; Root cause: Kernel overlayfs bug -&gt; Fix: Use alternate snapshotter or kernel update.<\/li>\n<li>Symptom: Intermittent pulls fail during deploy -&gt; Root cause: Registry rate limits -&gt; Fix: Add cache\/mirror and retry backoff.<\/li>\n<li>Symptom: Prometheus missing metrics -&gt; Root cause: metrics endpoint disabled -&gt; Fix: Enable metrics in containerd config.<\/li>\n<li>Symptom: Unexpected container termination -&gt; Root cause: OOM or cgroup limits -&gt; Fix: Adjust resource limits and monitor OOM events.<\/li>\n<li>Symptom: Slow disk I\/O -&gt; Root cause: Too many image layers, inefficient snapshotter -&gt; Fix: Use zfs or tune overlayfs.<\/li>\n<li>Symptom: Containers stuck terminating -&gt; Root cause: Stale mounts or zombie processes -&gt; Fix: Force unmount safely and kill stuck processes.<\/li>\n<li>Symptom: Security agent flags many false positives -&gt; Root cause: Overly strict seccomp profiles -&gt; Fix: Refine profiles and allow necessary syscalls.<\/li>\n<li>Symptom: Runtime class not applied -&gt; Root cause: Misconfigured CRDs or missing runtime binary -&gt; Fix: Deploy runtime and update node labels.<\/li>\n<li>Symptom: High image pull bandwidth -&gt; Root cause: No local cache and frequent redeploys -&gt; Fix: Pre-cache images and use registry caching.<\/li>\n<li>Symptom: Content store corruption -&gt; Root cause: Abrupt host power loss -&gt; Fix: Restore from backup and run integrity checks.<\/li>\n<li>Symptom: Upgrades cause mass restarts -&gt; Root cause: Breaking changes in shim or API -&gt; Fix: Validate compatibility and stagger upgrades.<\/li>\n<li>Symptom: Flaky metrics during upgrades -&gt; Root cause: Missing 
scrape relabel rules -&gt; Fix: Update Prometheus config to handle restart labels.<\/li>\n<li>Symptom: Unclear root cause in postmortem -&gt; Root cause: Missing logs or short retention -&gt; Fix: Increase log retention and centralize logs.<\/li>\n<li>Symptom: Too many alerts -&gt; Root cause: Low thresholds and noisy signals -&gt; Fix: Raise thresholds, group alerts, add deduplication.<\/li>\n<li>Symptom: Slow CI pipeline -&gt; Root cause: Cold image pulls and cache misses -&gt; Fix: Use pre-warmed runners and local caches.<\/li>\n<li>Symptom: Failure to use sandbox runtimes -&gt; Root cause: Incompatible runtime class config -&gt; Fix: Validate runtime class and install runtime binaries.<\/li>\n<li>Symptom: Observability blind spots -&gt; Root cause: Not collecting shim logs -&gt; Fix: Ensure shim logs are forwarded and parsed.<\/li>\n<li>Symptom: Time drift in metrics -&gt; Root cause: Unsynced node clocks -&gt; Fix: Ensure NTP and time sync across nodes.<\/li>\n<li>Symptom: Image signature validation failures -&gt; Root cause: Missing key store or wrong keys -&gt; Fix: Configure signing keys and test validations.<\/li>\n<li>Symptom: Unexpected permission errors -&gt; Root cause: Rootless configuration missing capabilities -&gt; Fix: Reconfigure rootless prerequisites or run as root if required.<\/li>\n<\/ol>\n\n\n\n<p>Observability pitfalls covered in the list above:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Missing metrics endpoint, short log retention, not collecting shim logs, unsynced clocks, noisy alert thresholds.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Best Practices &amp; Operating Model<\/h2>\n\n\n\n<p>Ownership and on-call:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ownership: Platform team owns containerd at node level; application teams own container images.<\/li>\n<li>On-call: Platform on-call handles node\/runtime incidents; app owners notified for image-level 
issues.<\/li>\n<\/ul>\n\n\n\n<p>Runbooks vs playbooks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Runbook: Step-by-step instructions for routine failures (restart containerd, reclaim shims).<\/li>\n<li>Playbook: Higher-level decision guides for upgrades and multi-node incidents.<\/li>\n<\/ul>\n\n\n\n<p>Safe deployments:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use canary and staged rollouts for containerd upgrades across node pools.<\/li>\n<li>Test rollback path and automate rollback triggers based on SLO degradation.<\/li>\n<\/ul>\n\n\n\n<p>Toil reduction and automation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Automate GC, image pruning, and pre-pull jobs.<\/li>\n<li>Use infra-as-code to manage containerd configs and plugins.<\/li>\n<\/ul>\n\n\n\n<p>Security basics:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable seccomp and AppArmor profiles.<\/li>\n<li>Enforce image signing and vulnerability scanning.<\/li>\n<li>Run containerd with least privileges where possible and use rootless mode if supported.<\/li>\n<\/ul>\n\n\n\n<p>Weekly\/monthly routines:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Weekly: Check image pull error trends and disk usage.<\/li>\n<li>Monthly: Validate metrics retention, run integrity checks, rotate keys.<\/li>\n<li>Quarterly: Chaos test upgrades and run capacity planning.<\/li>\n<\/ul>\n\n\n\n<p>What to review in postmortems related to containerd:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Timeline of containerd events and restarts.<\/li>\n<li>Image pull metrics and registry responses.<\/li>\n<li>Node-level resource usage and snapshotter logs.<\/li>\n<li>Actions to prevent recurrence and test plan.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Tooling &amp; Integration Map for containerd<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>ID<\/th>\n<th>Category<\/th>\n<th>What it does<\/th>\n<th>Key 
integrations<\/th>\n<th>Notes<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>I1<\/td>\n<td>Metrics<\/td>\n<td>Exposes containerd metrics for Prometheus<\/td>\n<td>Prometheus Grafana<\/td>\n<td>Ensure metrics endpoint enabled<\/td>\n<\/tr>\n<tr>\n<td>I2<\/td>\n<td>Logging<\/td>\n<td>Collects containerd and shim logs<\/td>\n<td>Fluentd Fluent Bit ELK<\/td>\n<td>Parse JSON and journal entries<\/td>\n<\/tr>\n<tr>\n<td>I3<\/td>\n<td>Tracing<\/td>\n<td>Traces containerd operations and syscalls<\/td>\n<td>eBPF Jaeger<\/td>\n<td>Use for deep latency analysis<\/td>\n<\/tr>\n<tr>\n<td>I4<\/td>\n<td>Security<\/td>\n<td>Image scanning and runtime policy<\/td>\n<td>Notary Clair<\/td>\n<td>Integrate at CI and pull time<\/td>\n<\/tr>\n<tr>\n<td>I5<\/td>\n<td>Registry<\/td>\n<td>Stores and serves OCI images<\/td>\n<td>containerd registries mirror<\/td>\n<td>Cache popular images close to nodes<\/td>\n<\/tr>\n<tr>\n<td>I6<\/td>\n<td>Snapshotter<\/td>\n<td>Manages filesystem layers<\/td>\n<td>overlayfs zfs btrfs<\/td>\n<td>Choose based on workload and kernel<\/td>\n<\/tr>\n<tr>\n<td>I7<\/td>\n<td>Orchestrator<\/td>\n<td>Schedules workloads<\/td>\n<td>Kubernetes CRI<\/td>\n<td>CRI plugin required for kubelet<\/td>\n<\/tr>\n<tr>\n<td>I8<\/td>\n<td>Runtime<\/td>\n<td>OCI runtime implementations<\/td>\n<td>runc runsc kata<\/td>\n<td>Select for isolation needs<\/td>\n<\/tr>\n<tr>\n<td>I9<\/td>\n<td>CI\/CD<\/td>\n<td>Automates builds and image pushes<\/td>\n<td>GitLab Jenkins<\/td>\n<td>Pre-pull images for runners<\/td>\n<\/tr>\n<tr>\n<td>I10<\/td>\n<td>Monitoring<\/td>\n<td>Alerting and incident management<\/td>\n<td>Alertmanager PagerDuty<\/td>\n<td>Configure dedupe and grouping<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Frequently Asked Questions (FAQs)<\/h2>\n\n\n\n<h3 
class=\"wp-block-heading\">What is the difference between containerd and Docker?<\/h3>\n\n\n\n<p>containerd is the runtime component; Docker Engine bundles containerd with higher-level developer UX.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Does Kubernetes require containerd?<\/h3>\n\n\n\n<p>Kubernetes requires a CRI-compatible runtime; containerd is a common choice, and CRI-O is an alternative.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How do I monitor containerd?<\/h3>\n\n\n\n<p>Enable Prometheus metrics, collect daemon and shim logs, and instrument snapshotter operations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Can containerd run rootless?<\/h3>\n\n\n\n<p>Rootless modes exist but features and performance vary by platform and kernel.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to handle containerd upgrades safely?<\/h3>\n\n\n\n<p>Canary upgrade node pools, monitor SLOs, and have rollback automation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What snapshotter should I use?<\/h3>\n\n\n\n<p>It depends on the workload and kernel support: overlayfs is the common choice, while zfs targets more advanced performance needs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to prevent disk exhaustion from images?<\/h3>\n\n\n\n<p>Implement garbage collection, pre-pull policies, and storage quotas.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Is containerd secure by default?<\/h3>\n\n\n\n<p>Not fully; you must enable seccomp, AppArmor, image signing, and least privilege.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Can I use multiple runtimes with containerd?<\/h3>\n\n\n\n<p>Yes, via runtime classes and per-workload runtime configuration.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to debug shim leaks?<\/h3>\n\n\n\n<p>Collect process lists, check containerd logs, and run cleanup scripts carefully.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What metrics are most important?<\/h3>\n\n\n\n<p>Runtime uptime, container start latency, image pull success rate, snapshot errors.<\/p>\n\n\n\n<h3 
class=\"wp-block-heading\">How does containerd handle image layers?<\/h3>\n\n\n\n<p>It stores blobs in the content store, and the snapshotter composes layers into filesystem snapshots.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Can I run containerd on edge devices?<\/h3>\n\n\n\n<p>Yes; it is lightweight and can be configured for constrained environments.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to integrate image signing?<\/h3>\n\n\n\n<p>Use image signing tools at CI and enforce validation at pull time via containerd hooks.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Does containerd support Windows?<\/h3>\n\n\n\n<p>Yes, via Windows-specific builds; details vary by OS version.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to reduce cold start for serverless?<\/h3>\n\n\n\n<p>Use pre-warmed snapshots, smaller images, and local mirrors.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What causes frequent containerd restarts?<\/h3>\n\n\n\n<p>OOM, incompatible plugins, or crashing shims; diagnose via logs and core dumps.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Are there managed containerd offerings in cloud providers?<\/h3>\n\n\n\n<p>It varies by provider; containerd usually ships as the node runtime inside managed Kubernetes services rather than as a standalone offering.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>containerd is a focused, pragmatic container runtime that powers modern cloud-native workloads. Its stability and extensibility make it central to Kubernetes, CI\/CD, serverless, and edge deployments. 
Proper instrumentation, SLO-driven operations, and careful upgrade strategies are essential for reliable production operations.<\/p>\n\n\n\n<p>Plan for the first week:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Day 1: Enable containerd metrics and centralize logs.<\/li>\n<li>Day 2: Define SLIs and draft SLOs for start latency and pull success.<\/li>\n<li>Day 3: Build on-call dashboard and basic alerts.<\/li>\n<li>Day 4: Run content store and snapshotter health checks in staging.<\/li>\n<li>Day 5: Pre-pull critical images on a small node pool and measure impact.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Appendix \u2014 containerd Keyword Cluster (SEO)<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Primary keywords<\/li>\n<li>containerd<\/li>\n<li>container runtime<\/li>\n<li>OCI runtime<\/li>\n<li>containerd architecture<\/li>\n<li>containerd tutorial<\/li>\n<li>\n<p>containerd metrics<\/p>\n<\/li>\n<li>\n<p>Secondary keywords<\/p>\n<\/li>\n<li>containerd vs docker<\/li>\n<li>containerd vs cri-o<\/li>\n<li>containerd snapshotter<\/li>\n<li>containerd shim<\/li>\n<li>containerd prometheus<\/li>\n<li>containerd security<\/li>\n<li>containerd performance<\/li>\n<li>containerd best practices<\/li>\n<li>\n<p>containerd upgrade<\/p>\n<\/li>\n<li>\n<p>Long-tail questions<\/p>\n<\/li>\n<li>what is containerd used for<\/li>\n<li>how does containerd work in kubernetes<\/li>\n<li>how to monitor containerd metrics<\/li>\n<li>containerd snapshotter overlayfs vs zfs<\/li>\n<li>how to debug containerd shim leaks<\/li>\n<li>containerd image pull error troubleshooting<\/li>\n<li>containerd for serverless cold start reduction<\/li>\n<li>can containerd run rootless on edge devices<\/li>\n<li>what metrics should i monitor for containerd<\/li>\n<li>\n<p>how to pre-pull images with containerd<\/p>\n<\/li>\n<li>\n<p>Related terminology<\/p>\n<\/li>\n<li>OCI image spec<\/li>\n<li>runc runtime<\/li>\n<li>runsc gVisor<\/li>\n<li>CRI 
kubelet<\/li>\n<li>snapshotter overlayfs<\/li>\n<li>content store<\/li>\n<li>image manifest<\/li>\n<li>image signing<\/li>\n<li>seccomp profile<\/li>\n<li>AppArmor<\/li>\n<li>cgroups<\/li>\n<li>shim process<\/li>\n<li>node exporter<\/li>\n<li>kubelet CRI<\/li>\n<li>registry mirror<\/li>\n<li>pre-warmed snapshots<\/li>\n<li>hotwarm pool<\/li>\n<li>GC policy<\/li>\n<li>daemonless container<\/li>\n<li>rootless containers<\/li>\n<li>runtime class<\/li>\n<li>eBPF tracing<\/li>\n<li>Prometheus metrics<\/li>\n<li>Grafana dashboards<\/li>\n<li>containerd plugin<\/li>\n<li>filesystem snapshot<\/li>\n<li>image layer caching<\/li>\n<li>image pull success rate<\/li>\n<li>container start latency<\/li>\n<li>content integrity<\/li>\n<li>runtime sandboxing<\/li>\n<li>registry caching<\/li>\n<li>CI runner caching<\/li>\n<li>orchestration runtime<\/li>\n<li>containerd upgrade plan<\/li>\n<li>runbook containerd<\/li>\n<li>containerd observability<\/li>\n<li>containerd failure modes<\/li>\n<li>containerd troubleshooting<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>&#8212;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[149],"tags":[],"class_list":["post-1961","post","type-post","status-publish","format-standard","hentry","category-terminology"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>What is containerd? 
Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/sreschool.com\/blog\/containerd\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"What is containerd? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School\" \/>\n<meta property=\"og:description\" content=\"---\" \/>\n<meta property=\"og:url\" content=\"https:\/\/sreschool.com\/blog\/containerd\/\" \/>\n<meta property=\"og:site_name\" content=\"SRE School\" \/>\n<meta property=\"article:published_time\" content=\"2026-02-15T11:17:45+00:00\" \/>\n<meta name=\"author\" content=\"Rajesh Kumar\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Rajesh Kumar\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"26 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/sreschool.com\/blog\/containerd\/\",\"url\":\"https:\/\/sreschool.com\/blog\/containerd\/\",\"name\":\"What is containerd? 
Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School\",\"isPartOf\":{\"@id\":\"https:\/\/sreschool.com\/blog\/#website\"},\"datePublished\":\"2026-02-15T11:17:45+00:00\",\"author\":{\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201\"},\"breadcrumb\":{\"@id\":\"https:\/\/sreschool.com\/blog\/containerd\/#breadcrumb\"},\"inLanguage\":\"en\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/sreschool.com\/blog\/containerd\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/sreschool.com\/blog\/containerd\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/sreschool.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"What is containerd? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/sreschool.com\/blog\/#website\",\"url\":\"https:\/\/sreschool.com\/blog\/\",\"name\":\"SRESchool\",\"description\":\"Master SRE. Build Resilient Systems. 
Lead the Future of Reliability\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/sreschool.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201\",\"name\":\"Rajesh Kumar\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en\",\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g\",\"caption\":\"Rajesh Kumar\"},\"sameAs\":[\"http:\/\/sreschool.com\/blog\"],\"url\":\"https:\/\/sreschool.com\/blog\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"What is containerd? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/sreschool.com\/blog\/containerd\/","og_locale":"en_US","og_type":"article","og_title":"What is containerd? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School","og_description":"---","og_url":"https:\/\/sreschool.com\/blog\/containerd\/","og_site_name":"SRE School","article_published_time":"2026-02-15T11:17:45+00:00","author":"Rajesh Kumar","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Rajesh Kumar","Est. 
reading time":"26 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/sreschool.com\/blog\/containerd\/","url":"https:\/\/sreschool.com\/blog\/containerd\/","name":"What is containerd? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School","isPartOf":{"@id":"https:\/\/sreschool.com\/blog\/#website"},"datePublished":"2026-02-15T11:17:45+00:00","author":{"@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201"},"breadcrumb":{"@id":"https:\/\/sreschool.com\/blog\/containerd\/#breadcrumb"},"inLanguage":"en","potentialAction":[{"@type":"ReadAction","target":["https:\/\/sreschool.com\/blog\/containerd\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/sreschool.com\/blog\/containerd\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/sreschool.com\/blog\/"},{"@type":"ListItem","position":2,"name":"What is containerd? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide)"}]},{"@type":"WebSite","@id":"https:\/\/sreschool.com\/blog\/#website","url":"https:\/\/sreschool.com\/blog\/","name":"SRESchool","description":"Master SRE. Build Resilient Systems. 
Lead the Future of Reliability","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/sreschool.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en"},{"@type":"Person","@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201","name":"Rajesh Kumar","image":{"@type":"ImageObject","inLanguage":"en","@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g","caption":"Rajesh Kumar"},"sameAs":["http:\/\/sreschool.com\/blog"],"url":"https:\/\/sreschool.com\/blog\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts\/1961","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/comments?post=1961"}],"version-history":[{"count":0,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts\/1961\/revisions"}],"wp:attachment":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/media?parent=1961"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/categories?post=1961"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/tags?post=1961"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}