{"id":1807,"date":"2026-02-15T08:11:04","date_gmt":"2026-02-15T08:11:04","guid":{"rendered":"https:\/\/sreschool.com\/blog\/duration-red\/"},"modified":"2026-02-15T08:11:04","modified_gmt":"2026-02-15T08:11:04","slug":"duration-red","status":"publish","type":"post","link":"https:\/\/sreschool.com\/blog\/duration-red\/","title":{"rendered":"What is Duration RED? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide)"},"content":{"rendered":"\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Quick Definition<\/h2>\n\n\n\n<p>Duration RED is a reliability metric that tracks request or operation duration as a primary service-level indicator, emphasizing tail latency and percentiles. Analogy: think of highway travel time rather than just speed limits. Formal: Duration RED = SLIs derived from duration percentiles across user-facing transactions.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">What is Duration RED?<\/h2>\n\n\n\n<p>Duration RED is a focused extension of the RED observability pattern that highlights duration (latency) as the core signal for customer experience. It is not simply average response time; it prioritizes distribution and tail behavior for user-facing work.
Duration RED complements error and saturation signals by revealing when operations are slow enough to cause timeouts, retries, or poor UX.<\/p>\n\n\n\n<p>What it is NOT:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Not merely mean or median duration.<\/li>\n<li>Not a replacement for error-rate monitoring.<\/li>\n<li>Not an infrastructure-only metric; it requires application-level instrumentation to be meaningful.<\/li>\n<\/ul>\n\n\n\n<p>Key properties and constraints:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Emphasizes percentile-based SLIs (p50, p95, p99, p999).<\/li>\n<li>Requires consistent, high-cardinality tagging to attribute latency.<\/li>\n<li>Sensitive to sampling, clock skew, and aggregation windows.<\/li>\n<li>Needs correlation with errors, retries, and throughput to diagnose impact.<\/li>\n<\/ul>\n\n\n\n<p>Where it fits in modern cloud\/SRE workflows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Primary SLI for user-facing APIs, RPCs, and UI transactions.<\/li>\n<li>Used in SLOs and error budgets tied to customer experience.<\/li>\n<li>Drives incident prioritization and auto-scaling decisions.<\/li>\n<li>Integrated with CI\/CD, chaos experiments, and performance budgets.<\/li>\n<\/ul>\n\n\n\n<p>Diagram description (text-only):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Client issues request -&gt; Ingress\/load balancer (measures start) -&gt; Edge proxy (adds latency) -&gt; Service A (handles business logic) -&gt; Downstream calls to Service B and DB -&gt; Service A response -&gt; Observability pipeline aggregates duration spans -&gt; Alerting evaluates percentiles against SLO -&gt; On-call receives page or ticket.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Duration RED in one sentence<\/h3>\n\n\n\n<p>Duration RED focuses on latency percentiles of user-facing requests as primary SLIs to protect customer experience and guide SRE operations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Duration RED vs related terms<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>ID<\/th>\n<th>Term<\/th>\n<th>How it differs from Duration RED<\/th>\n<th>Common confusion<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>T1<\/td>\n<td>RED (classic)<\/td>\n<td>Duration RED focuses on duration specifically<\/td>\n<td>People think RED only uses counters<\/td>\n<\/tr>\n<tr>\n<td>T2<\/td>\n<td>Apdex<\/td>\n<td>Apdex is threshold-based satisfaction score<\/td>\n<td>Apdex hides tail behavior<\/td>\n<\/tr>\n<tr>\n<td>T3<\/td>\n<td>P95 latency<\/td>\n<td>Single percentile view of duration<\/td>\n<td>P95 is easier but may miss tails<\/td>\n<\/tr>\n<tr>\n<td>T4<\/td>\n<td>Mean latency<\/td>\n<td>Arithmetic mean may hide skew<\/td>\n<td>Mean often underestimates tail pain<\/td>\n<\/tr>\n<tr>\n<td>T5<\/td>\n<td>SLA<\/td>\n<td>SLA is contractual and legal<\/td>\n<td>SLA may not map to technical SLO<\/td>\n<\/tr>\n<tr>\n<td>T6<\/td>\n<td>SLO<\/td>\n<td>SLO is target; Duration RED is SLI input<\/td>\n<td>SLO is policy not measurement<\/td>\n<\/tr>\n<tr>\n<td>T7<\/td>\n<td>Error budget<\/td>\n<td>Error budget is allowance; uses Duration RED<\/td>\n<td>Budgets usually tied to errors not latency<\/td>\n<\/tr>\n<tr>\n<td>T8<\/td>\n<td>Quantile estimation<\/td>\n<td>Statistical method, not an SLI itself<\/td>\n<td>Confused with exact percentiles<\/td>\n<\/tr>\n<tr>\n<td>T9<\/td>\n<td>End-to-end tracing<\/td>\n<td>Traces provide context for duration<\/td>\n<td>Tracing alone is not aggregated SLI<\/td>\n<\/tr>\n<tr>\n<td>T10<\/td>\n<td>Throughput<\/td>\n<td>Throughput is request rate, not duration<\/td>\n<td>High throughput can affect duration<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Why does Duration RED matter?<\/h2>\n\n\n\n<p>Business impact:<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Revenue: Slow experiences reduce conversions and retention.<\/li>\n<li>Trust: Users expect consistent response times; variability erodes confidence.<\/li>\n<li>Risk: Latency spikes can trigger cascading retries and increased costs.<\/li>\n<\/ul>\n\n\n\n<p>Engineering impact:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Incident reduction: Early detection of duration inflation reduces severity.<\/li>\n<li>Velocity: Clear SLOs for duration reduce firefighting and improve deployments.<\/li>\n<li>Architecture decisions: Informs caching, decompositions, and database tuning.<\/li>\n<\/ul>\n\n\n\n<p>SRE framing:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>SLIs\/SLOs: Duration percentiles become primary SLIs for user actions.<\/li>\n<li>Error budget: Budget burn can be caused by tail latency rather than errors.<\/li>\n<li>Toil\/on-call: Better instrumentation reduces manual investigation time.<\/li>\n<\/ul>\n\n\n\n<p>What breaks in production (realistic examples):<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Payment API p99 spikes due to sync DB index contention causing checkout failures.<\/li>\n<li>UI load becomes sluggish when a third-party CDN has degraded performance.<\/li>\n<li>Kubernetes node autoscaler scales slowly because probe durations exceed thresholds, causing rolling restarts to fail.<\/li>\n<li>Serverless function cold starts increase p95 beyond SLO after a deployment with larger container image.<\/li>\n<li>Distributed transaction increases tail latency after a library upgrade that changed timeouts.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Where is Duration RED used? 
<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>ID<\/th>\n<th>Layer\/Area<\/th>\n<th>How Duration RED appears<\/th>\n<th>Typical telemetry<\/th>\n<th>Common tools<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>L1<\/td>\n<td>Edge \/ CDN<\/td>\n<td>Request-to-first-byte and full response time<\/td>\n<td>TTFB p95 p99 and status codes<\/td>\n<td>Edge logs and synthetic checks<\/td>\n<\/tr>\n<tr>\n<td>L2<\/td>\n<td>Ingress \/ API gateway<\/td>\n<td>Request duration and upstream time<\/td>\n<td>Route p95 p99 and upstream latency<\/td>\n<td>API gateway metrics and tracing<\/td>\n<\/tr>\n<tr>\n<td>L3<\/td>\n<td>Service (app)<\/td>\n<td>Handler durations and downstream waits<\/td>\n<td>Span durations and histograms<\/td>\n<td>APM and tracing SDKs<\/td>\n<\/tr>\n<tr>\n<td>L4<\/td>\n<td>Datastore<\/td>\n<td>Query execution and replication lag<\/td>\n<td>Query duration percentiles and locks<\/td>\n<td>DB metrics and slow logs<\/td>\n<\/tr>\n<tr>\n<td>L5<\/td>\n<td>Messaging \/ Queue<\/td>\n<td>Time in queue and processing time<\/td>\n<td>Queue wait and handler duration<\/td>\n<td>Broker metrics and consumer traces<\/td>\n<\/tr>\n<tr>\n<td>L6<\/td>\n<td>Serverless \/ FaaS<\/td>\n<td>Cold start and execution time<\/td>\n<td>Invocation duration histogram<\/td>\n<td>Cloud provider function metrics<\/td>\n<\/tr>\n<tr>\n<td>L7<\/td>\n<td>Kubernetes infra<\/td>\n<td>Pod startup and liveness probe durations<\/td>\n<td>Container start and readiness times<\/td>\n<td>K8s metrics and events<\/td>\n<\/tr>\n<tr>\n<td>L8<\/td>\n<td>CI\/CD<\/td>\n<td>Build and deploy durations<\/td>\n<td>Job runtime histograms<\/td>\n<td>CI metrics and pipelines<\/td>\n<\/tr>\n<tr>\n<td>L9<\/td>\n<td>Observability pipeline<\/td>\n<td>Ingest and query latency<\/td>\n<td>Ingest lag and query time<\/td>\n<td>Monitoring backend metrics<\/td>\n<\/tr>\n<tr>\n<td>L10<\/td>\n<td>Security tooling<\/td>\n<td>Scan durations and blocking times<\/td>\n<td>Scan job duration 
percentiles<\/td>\n<td>Security scanners and plugin metrics<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">When should you use Duration RED?<\/h2>\n\n\n\n<p>When necessary:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Customer-facing APIs or UI where response time affects experience.<\/li>\n<li>Systems with SLAs or performance-sensitive flows like payments or search.<\/li>\n<li>Services with high variability or complex downstream dependencies.<\/li>\n<\/ul>\n\n\n\n<p>When optional:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Internal batch jobs where throughput matters more than latency.<\/li>\n<li>Background tasks where latency doesn&#8217;t affect user experience.<\/li>\n<\/ul>\n\n\n\n<p>When NOT to use \/ overuse it:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For purely asynchronous pipelines where latency is not user-visible.<\/li>\n<li>As a sole SLI for services dominated by availability or correctness issues.<\/li>\n<li>Over-instrumenting low-value internal endpoints creates noise.<\/li>\n<\/ul>\n\n\n\n<p>Decision checklist:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If request results are user-visible AND latency affects UX -&gt; use Duration RED.<\/li>\n<li>If operation is async AND not customer-visible -&gt; prefer throughput or success-rate SLI.<\/li>\n<li>If SLOs already exist but incidents are due to errors not latency -&gt; prioritize error SLI.<\/li>\n<\/ul>\n\n\n\n<p>Maturity ladder:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Beginner: Instrument p95 and p99 histograms for critical endpoints.<\/li>\n<li>Intermediate: Add labels for key dimensions and implement SLOs with alerting.<\/li>\n<li>Advanced: Use adaptive SLOs, per-user-cohort objectives, and automated remediation.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 
class=\"wp-block-heading\">How does Duration RED work?<\/h2>\n\n\n\n<p>Components and workflow:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Instrumentation: Application records start and end times for transactions and spans.<\/li>\n<li>Aggregation: Metrics backend ingests histograms or quantile summaries.<\/li>\n<li>Evaluation: Compute SLIs (p95\/p99) and compare with SLO targets.<\/li>\n<li>Alerting: Generate alerts based on burn rate or absolute threshold breaches.<\/li>\n<li>Response: On-call follows runbook for latency incidents and triggers mitigations.<\/li>\n<li>Remediation: Autoscaling, circuit breakers, caching, or rollbacks.<\/li>\n<li>Postmortem: Analyze traces and metrics, update SLOs and automation.<\/li>\n<\/ol>\n\n\n\n<p>Data flow and lifecycle:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Request enters -&gt; instrumentation creates spans -&gt; spans emit durations -&gt; metrics collector converts spans to histograms -&gt; durable store holds time series -&gt; query computes percentiles -&gt; alerting evaluates conditions -&gt; feedback to incident workflow.<\/li>\n<\/ul>\n\n\n\n<p>Edge cases and failure modes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Sampling discards tail spans and masks real latency.<\/li>\n<li>Clock skew across hosts distorts durations.<\/li>\n<li>Aggregation windows hide transient spikes.<\/li>\n<li>Low-volume endpoints produce noisy percentile estimates.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Typical architecture patterns for Duration RED<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Client-observed SLI pattern: Measure round-trip time at client SDKs. Use when client-side network impact matters.<\/li>\n<li>Server-side histogram + tracing: Service emits high-resolution histograms and traces. Use for backend services with many dependencies.<\/li>\n<li>Distributed tracing-first: Use traces to attribute duration across call graph; compute service-level SLIs from tracing spans. 
Use for microservices with complex topology.<\/li>\n<li>Synthetic + real user monitoring (RUM): Combine synthetic checks with RUM for frontend and third-party visibility.<\/li>\n<li>Per-endpoint SLOs with traffic shaping: Apply SLOs per critical endpoint and throttle or route noncritical traffic during degradation.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Failure modes &amp; mitigation<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>ID<\/th>\n<th>Failure mode<\/th>\n<th>Symptom<\/th>\n<th>Likely cause<\/th>\n<th>Mitigation<\/th>\n<th>Observability signal<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>F1<\/td>\n<td>Sampling bias<\/td>\n<td>Undetected tail latency<\/td>\n<td>Head-based sampling drops tail traces<\/td>\n<td>Increase sampling for errors and tails<\/td>\n<td>Trace sample rate drop<\/td>\n<\/tr>\n<tr>\n<td>F2<\/td>\n<td>Clock skew<\/td>\n<td>Negative or absurd durations<\/td>\n<td>Unsynced host clocks<\/td>\n<td>Use monotonic timers or sync time<\/td>\n<td>Host clock drift metric<\/td>\n<\/tr>\n<tr>\n<td>F3<\/td>\n<td>Aggregation lag<\/td>\n<td>Delayed alerts<\/td>\n<td>Monitoring pipeline backpressure<\/td>\n<td>Scale ingest or lower resolution<\/td>\n<td>Ingest lag metric<\/td>\n<\/tr>\n<tr>\n<td>F4<\/td>\n<td>Metric cardinality<\/td>\n<td>High cost and slow queries<\/td>\n<td>Too many labels<\/td>\n<td>Reduce labels and use rollups<\/td>\n<td>Cardinality metric<\/td>\n<\/tr>\n<tr>\n<td>F5<\/td>\n<td>Misattributed latency<\/td>\n<td>Blame wrong service<\/td>\n<td>Missing context or traces<\/td>\n<td>Add context propagation<\/td>\n<td>High downstream p99<\/td>\n<\/tr>\n<tr>\n<td>F6<\/td>\n<td>Percentile noise<\/td>\n<td>Flapping percentiles<\/td>\n<td>Low traffic for endpoint<\/td>\n<td>Use smoothing or a lower percentile<\/td>\n<td>Low sample count metric<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Key Concepts, Keywords &amp; Terminology for Duration RED<\/h2>\n\n\n\n<p>This glossary gives concise definitions and common pitfalls. Each entry is Term \u2014 definition \u2014 why it matters \u2014 common pitfall.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Duration \u2014 time between request start and completion \u2014 Primary SLI basis \u2014 Confused with CPU time.<\/li>\n<li>Latency distribution \u2014 spread of durations across requests \u2014 Shows tail behavior \u2014 Ignoring tails.<\/li>\n<li>Percentile (p95, p99) \u2014 value below which X% of samples fall \u2014 Captures UX impact \u2014 Using only p95 hides p999.<\/li>\n<li>Tail latency \u2014 extreme high percentiles \u2014 Often causes user-visible failure \u2014 Hard to estimate at low volume.<\/li>\n<li>Histogram \u2014 bucketed distribution \u2014 Efficient for aggregation \u2014 Coarse buckets mask detail.<\/li>\n<li>Summaries \/ sketches \u2014 approximate quantiles \u2014 Low memory cost \u2014 Complexity in interpretation.<\/li>\n<li>Quantile estimation \u2014 algorithmic percentile calculation \u2014 Balances accuracy and cost \u2014 Implementation differences.<\/li>\n<li>SLI \u2014 service-level indicator \u2014 Measure of system behavior \u2014 Wrongly chosen SLI misguides ops.<\/li>\n<li>SLO \u2014 service-level objective \u2014 Target for SLIs \u2014 Too strict SLO causes alert fatigue.<\/li>\n<li>SLA \u2014 service-level agreement \u2014 Contractual obligation \u2014 Legal implication often omitted.<\/li>\n<li>Error budget \u2014 allowable SLO violations \u2014 Drives release decisions \u2014 Undervaluing latency burn.<\/li>\n<li>RED method \u2014 Rate, Errors, Duration \u2014 Observability pattern \u2014 Often misused as only counters.<\/li>\n<li>RUM \u2014 Real user monitoring \u2014 Client-side duration capture \u2014 Privacy and sampling concerns.<\/li>\n<li>Synthetic monitoring 
\u2014 scripted checks \u2014 Detect regressions proactively \u2014 May miss real user paths.<\/li>\n<li>Tracing \u2014 distributed context for requests \u2014 Helps attribution \u2014 Sampling limits visibility.<\/li>\n<li>Span \u2014 tracing unit of work \u2014 Identifies component durations \u2014 Incomplete spans mislead.<\/li>\n<li>Client-observed SLI \u2014 measured by client SDK \u2014 Includes network and render time \u2014 Harder to control.<\/li>\n<li>Server-observed SLI \u2014 measured by server \u2014 Excludes client view \u2014 Misses client-side issues.<\/li>\n<li>Cold start \u2014 serverless startup latency \u2014 Affects p95\/p99 \u2014 Overprovisioning increases cost.<\/li>\n<li>Probe latency \u2014 readiness\/liveness probe durations \u2014 Affects orchestration \u2014 Probe misconfig breaks scaling.<\/li>\n<li>Autoscaling \u2014 adjust capacity based on metrics \u2014 Uses duration to scale for responsiveness \u2014 Reactive scaling can be late.<\/li>\n<li>Circuit breaker \u2014 stop calling slow dependencies \u2014 Prevents cascading latency \u2014 Misconfiguration leads to availability loss.<\/li>\n<li>Retry storm \u2014 repeated retries increasing load \u2014 Exacerbates latency \u2014 Retry budget missing.<\/li>\n<li>Backpressure \u2014 flow control when downstream is slow \u2014 Prevents queue growth \u2014 Hard to implement across systems.<\/li>\n<li>Token bucket \u2014 rate-limiting algorithm \u2014 Limits concurrent load \u2014 Overthrottling hurts UX.<\/li>\n<li>P95 flapping \u2014 percentile oscillation \u2014 Causes noisy alerts \u2014 Use smoothing and burn-rate checks.<\/li>\n<li>Observability pipeline \u2014 ingestion, storage, visualization \u2014 Central to duration analysis \u2014 Single point of failure if not scaled.<\/li>\n<li>Cardinality \u2014 number of unique label combinations \u2014 Affects cost \u2014 High cardinality increases backend stress.<\/li>\n<li>Aggregation window \u2014 time range for percentile calculation 
\u2014 Longer windows stabilize but delay response \u2014 Too short causes noise.<\/li>\n<li>Sample rate \u2014 fraction of traces collected \u2014 Balances cost\/visibility \u2014 Too low hides tails.<\/li>\n<li>Monotonic clock \u2014 non-decreasing timer \u2014 Accurate durations despite system time changes \u2014 Not always used by SDKs.<\/li>\n<li>Probe jitter \u2014 randomized probe timing to avoid synchronized probes \u2014 Prevents thundering herd \u2014 Forgotten in default configs.<\/li>\n<li>Service mesh \u2014 adds network hop latency \u2014 Affects p95 \u2014 Transparent instrumentation needed.<\/li>\n<li>Sidecar proxy \u2014 local network proxy for service mesh \u2014 Captures durations \u2014 Adds overhead.<\/li>\n<li>QoS \u2014 quality of service classes \u2014 Prioritize latency-sensitive flows \u2014 Complexity in enforcement.<\/li>\n<li>Smoothing window \u2014 moving average for percentile signals \u2014 Reduces noise \u2014 Masks short incidents.<\/li>\n<li>Load spike \u2014 sudden increase in traffic \u2014 Causes tail latency \u2014 Autoscaling lag can worsen impact.<\/li>\n<li>Capacity planning \u2014 reserve headroom for latency spikes \u2014 Prevents budget burn \u2014 Overprovisioning cost tradeoff.<\/li>\n<li>Chaos engineering \u2014 inject faults to surface latency issues \u2014 Improves resilience \u2014 Requires careful scoping.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">How to Measure Duration RED (Metrics, SLIs, SLOs)<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>ID<\/th>\n<th>Metric\/SLI<\/th>\n<th>What it tells you<\/th>\n<th>How to measure<\/th>\n<th>Starting target<\/th>\n<th>Gotchas<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>M1<\/td>\n<td>p95 request duration<\/td>\n<td>Typical slow-but-common user impact<\/td>\n<td>Histogram quantile per route<\/td>\n<td>200ms for critical APIs<\/td>\n<td>p95 misses rare tails<\/td>\n<\/tr>\n<tr>\n<td>M2<\/td>\n<td>p99 
request duration<\/td>\n<td>Tail user impact<\/td>\n<td>Histogram quantile per route<\/td>\n<td>1s for interactive flows<\/td>\n<td>Requires high sample counts<\/td>\n<\/tr>\n<tr>\n<td>M3<\/td>\n<td>p999 duration<\/td>\n<td>Extreme tail risk<\/td>\n<td>Sketches or streaming quantiles<\/td>\n<td>3s for critical flows<\/td>\n<td>Very noisy at low volume<\/td>\n<\/tr>\n<tr>\n<td>M4<\/td>\n<td>Error rate within high-duration requests<\/td>\n<td>Correlates latency with failures<\/td>\n<td>Count errors where duration &gt; threshold<\/td>\n<td>&lt;1% of slow requests<\/td>\n<td>Need correlation labels<\/td>\n<\/tr>\n<tr>\n<td>M5<\/td>\n<td>Queue wait time<\/td>\n<td>Backpressure and scheduling delays<\/td>\n<td>Histogram on dequeue time<\/td>\n<td>50ms for critical queues<\/td>\n<td>Ignored in single-service views<\/td>\n<\/tr>\n<tr>\n<td>M6<\/td>\n<td>Cold start rate<\/td>\n<td>Frequency of high latency due to cold starts<\/td>\n<td>Percentage of invocations with startup &gt;X<\/td>\n<td>&lt;1%<\/td>\n<td>Requires function-level instrumentation<\/td>\n<\/tr>\n<tr>\n<td>M7<\/td>\n<td>Client-observed RTT<\/td>\n<td>End-user experienced duration<\/td>\n<td>Frontend SDK or RUM<\/td>\n<td>300ms<\/td>\n<td>Network and client render add variance<\/td>\n<\/tr>\n<tr>\n<td>M8<\/td>\n<td>Backend processing time<\/td>\n<td>Internal compute latency<\/td>\n<td>Service spans excluding network<\/td>\n<td>100ms<\/td>\n<td>Missing downstream time<\/td>\n<\/tr>\n<tr>\n<td>M9<\/td>\n<td>Ingest lag<\/td>\n<td>Observability pipeline delay<\/td>\n<td>Time from event to availability<\/td>\n<td>&lt;30s<\/td>\n<td>High pipeline load increases lag<\/td>\n<\/tr>\n<tr>\n<td>M10<\/td>\n<td>Percentile sample count<\/td>\n<td>Confidence in percentile<\/td>\n<td>Count samples per window<\/td>\n<td>&gt;10k samples<\/td>\n<td>Low-volume endpoints need smoothing<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Best tools to measure Duration RED<\/h3>\n\n\n\n<p>Choose tooling based on environment and scale. Below are recommended tools and structured details.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Tool \u2014 OpenTelemetry<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What it measures for Duration RED: Traces and span durations; histogram metrics.<\/li>\n<li>Best-fit environment: Microservices, multi-cloud, hybrid.<\/li>\n<li>Setup outline:<\/li>\n<li>Instrument services with OpenTelemetry SDKs.<\/li>\n<li>Configure exporters to tracing\/metrics backend.<\/li>\n<li>Ensure high-resolution histograms are enabled.<\/li>\n<li>Set sampling policy for error and tail traces.<\/li>\n<li>Strengths:<\/li>\n<li>Vendor-neutral standard.<\/li>\n<li>Rich context propagation.<\/li>\n<li>Limitations:<\/li>\n<li>Requires backend for storage and visualization.<\/li>\n<li>Sampling strategy complexity.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Tool \u2014 Prometheus + Histogram\/Exemplar<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What it measures for Duration RED: Aggregated histograms and exemplars linked to traces.<\/li>\n<li>Best-fit environment: Kubernetes and self-managed stacks.<\/li>\n<li>Setup outline:<\/li>\n<li>Export histograms from app metrics.<\/li>\n<li>Use exemplars to connect histogram buckets to traces.<\/li>\n<li>Use recording rules for percentiles.<\/li>\n<li>Tune scrape intervals and retention.<\/li>\n<li>Strengths:<\/li>\n<li>Open-source and widely adopted.<\/li>\n<li>Strong alerting integration.<\/li>\n<li>Limitations:<\/li>\n<li>Percentile calculation over sliding windows requires care.<\/li>\n<li>High cardinality costs.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Tool \u2014 Managed APM (vendor)<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What it measures for Duration RED: End-to-end traces, service maps, histograms.<\/li>\n<li>Best-fit environment: Teams needing turnkey tracing and 
dashboards.<\/li>\n<li>Setup outline:<\/li>\n<li>Deploy vendor agents or SDKs.<\/li>\n<li>Tag key dimensions and enable distributed tracing.<\/li>\n<li>Configure dashboards and SLOs in vendor console.<\/li>\n<li>Strengths:<\/li>\n<li>Quick time-to-value.<\/li>\n<li>Integrated analytics.<\/li>\n<li>Limitations:<\/li>\n<li>Cost and vendor lock-in considerations.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Tool \u2014 Real User Monitoring (RUM) SDKs<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What it measures for Duration RED: Client-observed round trips and page load durations.<\/li>\n<li>Best-fit environment: Frontend web and mobile apps.<\/li>\n<li>Setup outline:<\/li>\n<li>Add RUM SDK to frontend.<\/li>\n<li>Capture page load, navigation timing, and XHR durations.<\/li>\n<li>Sample and redact sensitive data.<\/li>\n<li>Strengths:<\/li>\n<li>Measures real-user experience.<\/li>\n<li>Limitations:<\/li>\n<li>Privacy and sampling constraints.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Tool \u2014 Synthetic monitoring \/ Synthetics<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What it measures for Duration RED: End-to-end scripted transaction durations from multiple locations.<\/li>\n<li>Best-fit environment: Global services and external dependencies.<\/li>\n<li>Setup outline:<\/li>\n<li>Define critical journeys as scripts.<\/li>\n<li>Run at regular intervals from key locations.<\/li>\n<li>Alert on threshold or SLO violations.<\/li>\n<li>Strengths:<\/li>\n<li>Predictable and repeatable checks.<\/li>\n<li>Limitations:<\/li>\n<li>May not reflect real traffic patterns.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Recommended dashboards &amp; alerts for Duration RED<\/h3>\n\n\n\n<p>Executive dashboard:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>High-level SLO adherence: p95\/p99 vs target across business-critical services.<\/li>\n<li>Trend of error budget burn.<\/li>\n<li>Top 5 services by p99 increase and business impact 
rationale.<\/li>\n<\/ul>\n\n\n\n<p>On-call dashboard:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Live percentiles per route and recent heatmap.<\/li>\n<li>Top slow traces and recent deploys.<\/li>\n<li>Alerts with burn-rate and threshold state.<\/li>\n<\/ul>\n\n\n\n<p>Debug dashboard:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Per-service span waterfall for recent slow requests.<\/li>\n<li>Downstream call durations and queue times.<\/li>\n<li>Host\/instance metrics and probe timings.<\/li>\n<\/ul>\n\n\n\n<p>Alerting guidance:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Page vs ticket: Page for SLO burn-rate breaches (high burn or sustained p99 breach). Ticket for isolated non-business-critical p95 violations.<\/li>\n<li>Burn-rate guidance: Page when burn rate exceeds 4x for a sliding window and remaining error budget is low. Ticket when transient or single-window spike.<\/li>\n<li>Noise reduction: Use grouping by service and route; dedupe similar alerts; suppress during planned maintenance and releases.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Implementation Guide (Step-by-step)<\/h2>\n\n\n\n<p>1) Prerequisites:\n   &#8211; Service inventory and critical endpoint list.\n   &#8211; Observability pipeline capacity planning.\n   &#8211; Standardized instrumentation libraries.<\/p>\n\n\n\n<p>2) Instrumentation plan:\n   &#8211; Identify entry points and boundaries to measure start\/end times.\n   &#8211; Implement histograms and traces with consistent labels.\n   &#8211; Ensure monotonic timers are used where possible.<\/p>\n\n\n\n<p>3) Data collection:\n   &#8211; Configure exporters to metrics and tracing backends.\n   &#8211; Ensure exemplars link metrics to traces when possible.\n   &#8211; Set retention and resolution policies.<\/p>\n\n\n\n<p>4) SLO design:\n   &#8211; Define SLO per customer-impacting endpoint.\n   &#8211; Choose percentile and window suitable for traffic.\n   &#8211; Define error budget policy 
and burn actions.<\/p>\n\n\n\n<p>5) Dashboards:\n   &#8211; Build executive, on-call, and debug dashboards.\n   &#8211; Include distribution heatmaps and top slow traces.<\/p>\n\n\n\n<p>6) Alerts &amp; routing:\n   &#8211; Implement burn-rate and absolute threshold alerts.\n   &#8211; Route to correct teams by service ownership and escalation.<\/p>\n\n\n\n<p>7) Runbooks &amp; automation:\n   &#8211; Document mitigation steps (scale up, rollback, circuit break).\n   &#8211; Automate common remediations where safe.<\/p>\n\n\n\n<p>8) Validation (load\/chaos\/game days):\n   &#8211; Perform load tests to validate SLOs.\n   &#8211; Run chaos experiments to ensure fallbacks operate.<\/p>\n\n\n\n<p>9) Continuous improvement:\n   &#8211; Weekly review of SLO posture.\n   &#8211; Postmortems for every SLO breach.<\/p>\n\n\n\n<p>Pre-production checklist:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Instrumentation in place for all critical endpoints.<\/li>\n<li>Test traces exhibit full call graph.<\/li>\n<li>Synthetic checks validated.<\/li>\n<li>Dashboards populated with realistic data.<\/li>\n<\/ul>\n\n\n\n<p>Production readiness checklist:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Alerts and on-call routing tested.<\/li>\n<li>Automation for mitigation validated.<\/li>\n<li>Error budget policy documented.<\/li>\n<li>Runbooks linked to alert pages.<\/li>\n<\/ul>\n\n\n\n<p>Incident checklist specific to Duration RED:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Verify SLO breach and burn rate.<\/li>\n<li>Identify top slow endpoints and recent deploys.<\/li>\n<li>Check autoscaler and probe metrics.<\/li>\n<li>Apply mitigation (traffic shaping, cache warming).<\/li>\n<li>Capture traces and create postmortem if needed.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Use Cases of Duration RED<\/h2>\n\n\n\n<p>1) Checkout API\n&#8211; Context: E-commerce payment flow.\n&#8211; Problem: Occasional p99 spikes lead to abandoned 
carts.\n&#8211; Why Duration RED helps: Focuses on tail that causes checkout timeouts.\n&#8211; What to measure: p95\/p99 per payment method, DB query durations.\n&#8211; Typical tools: APM, RUM, DB slow logs.<\/p>\n\n\n\n<p>2) Search endpoint\n&#8211; Context: Fast, interactive results required.\n&#8211; Problem: Increased query time when cluster shards rebalanced.\n&#8211; Why Duration RED helps: SLO-driven scaling and query optimization.\n&#8211; What to measure: p95\/p99 query times, queue wait.\n&#8211; Typical tools: Tracing, DB metrics, synthetic checks.<\/p>\n\n\n\n<p>3) Third-party auth\n&#8211; Context: External identity provider used on login.\n&#8211; Problem: External provider latency increases login failure rates.\n&#8211; Why Duration RED helps: Detects dependency slowness, informs fallback.\n&#8211; What to measure: Upstream latency and retry counts.\n&#8211; Typical tools: Tracing, synthetic monitoring.<\/p>\n\n\n\n<p>4) Mobile app onboarding\n&#8211; Context: Initial app load and API handshake.\n&#8211; Problem: Cold starts and network variability cause timeouts.\n&#8211; Why Duration RED helps: Prioritize cold start reduction and caching.\n&#8211; What to measure: Client-observed RTT, cold start rate.\n&#8211; Typical tools: RUM, function metrics.<\/p>\n\n\n\n<p>5) Serverless webhook handler\n&#8211; Context: Event-driven webhooks processed on FaaS.\n&#8211; Problem: Cold starts inflate p95 for burst traffic.\n&#8211; Why Duration RED helps: Drives warm-pool sizing and concurrency.\n&#8211; What to measure: Invocation duration histogram, cold start percentage.\n&#8211; Typical tools: Cloud function metrics.<\/p>\n\n\n\n<p>6) Streaming ingestion\n&#8211; Context: High-throughput event pipeline.\n&#8211; Problem: Backpressure causes long queue wait times and timeouts.\n&#8211; Why Duration RED helps: Surface queue wait and consumer latency.\n&#8211; What to measure: Time-in-queue percentiles, consumer processing time.\n&#8211; Typical tools: Broker 
metrics, tracing.<\/p>\n\n\n\n<p>7) Kubernetes probe tuning\n&#8211; Context: Liveness\/readiness probes causing restarts.\n&#8211; Problem: Probe durations exceed thresholds under load.\n&#8211; Why Duration RED helps: Ensures probes reflect realistic expectations.\n&#8211; What to measure: Probe execution time and failure counts.\n&#8211; Typical tools: K8s metrics and logs.<\/p>\n\n\n\n<p>8) API gateway rollouts\n&#8211; Context: New gateway introduces additional latency.\n&#8211; Problem: Route-level p99 increases post-upgrade.\n&#8211; Why Duration RED helps: Observability for canary validation.\n&#8211; What to measure: Upstream and downstream duration differences.\n&#8211; Typical tools: Gateway metrics, traces.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Scenario Examples (Realistic, End-to-End)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Scenario #1 \u2014 Kubernetes microservice p99 spike<\/h3>\n\n\n\n<p><strong>Context:<\/strong> A shopping-cart microservice running on Kubernetes shows p99 latency spikes that surface as timeouts during a promotional event.\n<strong>Goal:<\/strong> Reduce p99 from 2.5s to 800ms.\n<strong>Why Duration RED matters here:<\/strong> Tail latency causes timeouts and dropped carts.\n<strong>Architecture \/ workflow:<\/strong> Ingress -&gt; API gateway -&gt; cart service -&gt; DB -&gt; cache.\n<strong>Step-by-step implementation:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Instrument cart service spans and histograms.<\/li>\n<li>Enable exemplars to correlate slow buckets to traces.<\/li>\n<li>Deploy synthetic load matching promo traffic.<\/li>\n<li>Tune DB queries and add a cache for hot items.<\/li>\n<li>Adjust HPA and probe thresholds.\n<strong>What to measure:<\/strong> p95\/p99, DB query p99, cache hit rate.\n<strong>Tools to use and why:<\/strong> Prometheus, OpenTelemetry, and APM for histograms and trace correlation.\n<strong>Common pitfalls:<\/strong> High-cardinality tags causing slow 
queries.\n<strong>Validation:<\/strong> Run load test and verify p99 under SLO for 30-minute window.\n<strong>Outcome:<\/strong> p99 reduced, error budget stable during promotions.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Scenario #2 \u2014 Serverless cold start reduction<\/h3>\n\n\n\n<p><strong>Context:<\/strong> Serverless image processing API with occasional cold starts.\n<strong>Goal:<\/strong> Reduce cold-start-driven p95 from 1.8s to 350ms.\n<strong>Why Duration RED matters here:<\/strong> Client perceived slowness leads to retries.\n<strong>Architecture \/ workflow:<\/strong> API Gateway -&gt; Lambda-like functions -&gt; storage\n<strong>Step-by-step implementation:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Measure cold starts per invocation.<\/li>\n<li>Configure provisioned concurrency or warm-up invocations.<\/li>\n<li>Optimize function package size and dependencies.<\/li>\n<li>Add retries with exponential backoff.\n<strong>What to measure:<\/strong> Cold start rate, p95, function init time.\n<strong>Tools to use and why:<\/strong> Cloud function metrics, RUM for client impact.\n<strong>Common pitfalls:<\/strong> Overprovisioning increases cost without policy.\n<strong>Validation:<\/strong> Synthetic bursts with and without warm pools.\n<strong>Outcome:<\/strong> Cold start rate falls, p95 meets SLO with acceptable cost.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Scenario #3 \u2014 Incident response and postmortem for latency outage<\/h3>\n\n\n\n<p><strong>Context:<\/strong> Production incident where p99 across many services rose concurrently.\n<strong>Goal:<\/strong> Triage and restore performance; identify root cause.\n<strong>Why Duration RED matters here:<\/strong> SLO breaches triggered paging and revenue risk.\n<strong>Architecture \/ workflow:<\/strong> Multi-service transactions failing due to a shared dependency.\n<strong>Step-by-step implementation:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Page 
on-call based on burn rate alert.<\/li>\n<li>Use on-call dashboard to find top slow endpoints and recent deploys.<\/li>\n<li>Correlate traces showing shared dependency as bottleneck.<\/li>\n<li>Apply mitigation: circuit breaker or rollback dependency change.<\/li>\n<li>Collect artifacts and run postmortem.\n<strong>What to measure:<\/strong> SLO burn rate, root dependency p99.\n<strong>Tools to use and why:<\/strong> Tracing, APM, incident management.\n<strong>Common pitfalls:<\/strong> Lack of exemplars to correlate metrics to traces.\n<strong>Validation:<\/strong> Postmortem with action items and SLO updates.\n<strong>Outcome:<\/strong> Root cause fixed; runbook updated with mitigation steps.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Scenario #4 \u2014 Cost vs performance trade-off<\/h3>\n\n\n\n<p><strong>Context:<\/strong> A streaming service needs to balance cache sizing vs reduced tail latency.\n<strong>Goal:<\/strong> Determine cost-effective cache size to meet p95 target.\n<strong>Why Duration RED matters here:<\/strong> Latency improvements cost money; need SLO-driven decision.\n<strong>Architecture \/ workflow:<\/strong> API -&gt; service -&gt; cache -&gt; DB\n<strong>Step-by-step implementation:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Measure miss-related p99 and overall p95.<\/li>\n<li>Model cost per cache tier and expected latency reduction.<\/li>\n<li>Run A\/B with different cache sizes and track SLO compliance.<\/li>\n<li>Choose configuration optimizing cost per SLO improvement.\n<strong>What to measure:<\/strong> Cache hit rate, p95, cost per hour.\n<strong>Tools to use and why:<\/strong> Metrics backend and cost analytics.\n<strong>Common pitfalls:<\/strong> Ignoring cold cache warm-up effects.\n<strong>Validation:<\/strong> Cost\/performance dashboard and review after 2 weeks.\n<strong>Outcome:<\/strong> Selected cache tier delivers SLO compliance at acceptable cost.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" 
\/>\n\n\n\n<h2 class=\"wp-block-heading\">Common Mistakes, Anti-patterns, and Troubleshooting<\/h2>\n\n\n\n<p>List of practical mistakes with symptom -&gt; root cause -&gt; fix.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Symptom: p95 okay but users complain. Root cause: p99 spikes. Fix: Monitor higher percentiles and adjust the SLO.<\/li>\n<li>Symptom: Percentiles flapping. Root cause: low sample counts. Fix: Use a smoothing window and combine low-traffic routes.<\/li>\n<li>Symptom: Alerts firing constantly after deploy. Root cause: overly tight SLO. Fix: Tune the SLO and use deploy suppression rules.<\/li>\n<li>Symptom: Traces missing for slow requests. Root cause: sampling discarded tails. Fix: Adaptive sampling to retain slow and error traces.<\/li>\n<li>Symptom: Duration decreases but error rate increases. Root cause: retries and early failures. Fix: Correlate error SLIs and duration.<\/li>\n<li>Symptom: High observability cost. Root cause: high-cardinality metrics. Fix: Reduce labels and use rollups.<\/li>\n<li>Symptom: Metrics show low latency but users report slowness. Root cause: client-side rendering. Fix: Add RUM.<\/li>\n<li>Symptom: Alerts delayed. Root cause: ingest lag. Fix: Scale the pipeline or lower retention resolution.<\/li>\n<li>Symptom: Probe churn and restarts. Root cause: strict probe timeouts. Fix: Tune probe durations based on p95.<\/li>\n<li>Symptom: Autoscaler not reacting. Root cause: using CPU rather than request latency. Fix: Use custom metrics like p95 or queue length.<\/li>\n<li>Symptom: Long investigation times. Root cause: missing trace context. Fix: Add consistent trace IDs and exemplars.<\/li>\n<li>Symptom: Latency misattributed to the database. Root cause: absent network timing. Fix: Instrument network and downstream spans.<\/li>\n<li>Symptom: Increased costs during mitigation. Root cause: aggressive autoscaling without bounds. Fix: Add cost-aware autoscaling policies.<\/li>\n<li>Symptom: False positives during canary. Root cause: lack of canary-aware alerting. 
Fix: Suppress or route canary alerts.<\/li>\n<li>Symptom: Data skew across regions. Root cause: asynchronous replication lag. Fix: Measure per-region SLIs.<\/li>\n<li>Symptom: Spikes during backup windows. Root cause: maintenance tasks consuming resources. Fix: Schedule and throttle background jobs.<\/li>\n<li>Symptom: Aggregated percentile hides problem. Root cause: mixing critical and noncritical routes. Fix: Per-endpoint SLIs.<\/li>\n<li>Symptom: Alerts burst during replay. Root cause: traffic replaying causes queues. Fix: Rate-limit replay and simulate offline.<\/li>\n<li>Symptom: Noisy dashboards. Root cause: too many similar panels. Fix: Simplify and focus on key SLIs.<\/li>\n<li>Symptom: Misleading histogram buckets. Root cause: coarse buckets. Fix: Increase resolution or use sketches.<\/li>\n<li>Observability pitfall: Over-sampling client data creating privacy issues -&gt; Fix: Redact and sample appropriately.<\/li>\n<li>Observability pitfall: Not linking exemplars to traces -&gt; Fix: Enable exemplars in metrics pipeline.<\/li>\n<li>Observability pitfall: Using mean latency in dashboards -&gt; Fix: Switch to percentiles and distributions.<\/li>\n<li>Observability pitfall: Forgetting monotonic timers -&gt; Fix: Use monotonic timers in code.<\/li>\n<li>Observability pitfall: Missing dependency context -&gt; Fix: Enforce context propagation in SDKs.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Best Practices &amp; Operating Model<\/h2>\n\n\n\n<p>Ownership and on-call:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Assign SLI\/SLO ownership to service teams.<\/li>\n<li>On-call rotates per service owner; SLO breaches escalate to SLO owner.<\/li>\n<\/ul>\n\n\n\n<p>Runbooks vs playbooks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Runbooks: step-by-step for common latency incidents.<\/li>\n<li>Playbooks: higher-level strategies for complex incidents and mitigation.<\/li>\n<\/ul>\n\n\n\n<p>Safe 
deployments:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Canary releases with Duration RED checks.<\/li>\n<li>Automatic rollback when burn rate or p99 exceed thresholds.<\/li>\n<\/ul>\n\n\n\n<p>Toil reduction and automation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Automate scaling and cache population.<\/li>\n<li>Auto-annotate deploys and correlate with SLI changes.<\/li>\n<\/ul>\n\n\n\n<p>Security basics:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid collecting PII in traces.<\/li>\n<li>Apply data redaction and sample before exporting.<\/li>\n<\/ul>\n\n\n\n<p>Weekly\/monthly routines:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Weekly: review SLO burn and recent alerts.<\/li>\n<li>Monthly: capacity planning and dependency latency review.<\/li>\n<\/ul>\n\n\n\n<p>Postmortem reviews:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Always analyze SLO breach impact.<\/li>\n<li>Update runbooks and add automated tests for regression.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Tooling &amp; Integration Map for Duration RED (TABLE REQUIRED)<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>ID<\/th>\n<th>Category<\/th>\n<th>What it does<\/th>\n<th>Key integrations<\/th>\n<th>Notes<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>I1<\/td>\n<td>Tracing<\/td>\n<td>Captures distributed spans and durations<\/td>\n<td>Metrics backends and APM<\/td>\n<td>Use exemplars to link to histograms<\/td>\n<\/tr>\n<tr>\n<td>I2<\/td>\n<td>Metrics backend<\/td>\n<td>Stores histograms and percentiles<\/td>\n<td>Tracing and alerting systems<\/td>\n<td>Tune retention vs resolution<\/td>\n<\/tr>\n<tr>\n<td>I3<\/td>\n<td>APM<\/td>\n<td>Correlates traces and service maps<\/td>\n<td>Logs, traces, metrics<\/td>\n<td>Good for rapid root cause analysis<\/td>\n<\/tr>\n<tr>\n<td>I4<\/td>\n<td>RUM<\/td>\n<td>Captures client-observed durations<\/td>\n<td>Frontend, analytics<\/td>\n<td>Privacy and sampling 
required<\/td>\n<\/tr>\n<tr>\n<td>I5<\/td>\n<td>Synthetic monitoring<\/td>\n<td>Scripted checks of journeys<\/td>\n<td>Status pages and SLOs<\/td>\n<td>Useful for external dependency checks<\/td>\n<\/tr>\n<tr>\n<td>I6<\/td>\n<td>CI\/CD<\/td>\n<td>Measures deploy time and rollout metrics<\/td>\n<td>Observability, alerts<\/td>\n<td>Tie SLOs to deploy gates<\/td>\n<\/tr>\n<tr>\n<td>I7<\/td>\n<td>Autoscaler<\/td>\n<td>Scales based on duration or custom metric<\/td>\n<td>Cloud provider APIs, k8s HPA<\/td>\n<td>Consider cooldowns and safety limits<\/td>\n<\/tr>\n<tr>\n<td>I8<\/td>\n<td>Service mesh<\/td>\n<td>Adds telemetry and routing<\/td>\n<td>Tracing and metrics<\/td>\n<td>Introduces network overhead<\/td>\n<\/tr>\n<tr>\n<td>I9<\/td>\n<td>DB performance tools<\/td>\n<td>Captures query durations and locks<\/td>\n<td>App tracing and APM<\/td>\n<td>Use for DB tuning and indices<\/td>\n<\/tr>\n<tr>\n<td>I10<\/td>\n<td>Incident mgmt<\/td>\n<td>Pages and documents incidents<\/td>\n<td>Monitoring and runbooks<\/td>\n<td>Automate alert enrichment<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Row Details (only if needed)<\/h4>\n\n\n\n<p>Not required.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Frequently Asked Questions (FAQs)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What percentile should I choose for Duration RED?<\/h3>\n\n\n\n<p>Start with p95 for common impact and add p99 and p999 for tail. Use business context to pick thresholds.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How long should SLO windows be?<\/h3>\n\n\n\n<p>Typical windows are 30 days; shorter windows like 7 days help detect regressions faster. 
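To make window choice concrete, here is a minimal, hypothetical sketch of computing SLO compliance and burn rate from duration samples gathered over a window (function names, the 800 ms threshold, and the 99% target are illustrative, not prescribed by this guide):

```python
# Minimal sketch: estimate duration-SLO compliance and burn rate over a window.
# Assumes in-memory samples; a real system would query a metrics backend.

def slo_compliance(durations_ms, threshold_ms):
    """Fraction of requests at or under the latency threshold."""
    if not durations_ms:
        return 1.0
    good = sum(1 for d in durations_ms if d <= threshold_ms)
    return good / len(durations_ms)

def burn_rate(durations_ms, threshold_ms, slo_target=0.99):
    """How fast the error budget is consumed.

    1.0 means the budget lasts exactly one SLO window; >1.0 means it is
    being burned faster than the window allows.
    """
    error_budget = 1.0 - slo_target  # e.g. 1% of requests may be slow
    bad_fraction = 1.0 - slo_compliance(durations_ms, threshold_ms)
    return bad_fraction / error_budget

# Example: 1000 requests, 30 of them slower than an 800 ms threshold.
samples = [200.0] * 970 + [1500.0] * 30
print(slo_compliance(samples, 800))                    # 0.97
print(round(burn_rate(samples, 800, slo_target=0.99), 2))  # 3.0
```

Multi-window burn-rate alerting typically evaluates this on a fast window (for paging) and a slow window (for tickets), so the SLO window length and the alerting windows are related but distinct choices.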
Choose based on traffic and business needs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How many labels should I add to duration metrics?<\/h3>\n\n\n\n<p>Keep labels minimal for cardinality control; include service, endpoint, and environment as core labels.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Can I use mean latency as an SLI?<\/h3>\n\n\n\n<p>No. The mean hides the tail and is a poor fit for UX-sensitive SLOs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How do I correlate slow requests to deploys?<\/h3>\n\n\n\n<p>Use deploy metadata annotations in metrics and traces and join by time window.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Should I measure client or server duration?<\/h3>\n\n\n\n<p>Both. Client duration gives the true UX measure; server duration gives root-cause context.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to avoid noisy alerts for small endpoints?<\/h3>\n\n\n\n<p>Aggregate or combine endpoints and use smoothing windows or minimum sample thresholds.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to handle low-traffic endpoints for percentiles?<\/h3>\n\n\n\n<p>Use longer windows, smoothing, or lower percentile targets to avoid noise.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Are sketches better than histograms?<\/h3>\n\n\n\n<p>Sketches save memory and estimate quantiles; choose based on backend support and accuracy needs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What is an exemplar and why use it?<\/h3>\n\n\n\n<p>An exemplar links a histogram bucket to a trace ID so you can find representative slow traces.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How do I ensure accurate duration measurement across languages?<\/h3>\n\n\n\n<p>Use standardized SDKs and monotonic timers; validate with integration tests.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to handle third-party dependency latency?<\/h3>\n\n\n\n<p>Monitor upstream latency and implement fallbacks, timeouts, and circuit breakers.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How often should we review SLOs?<\/h3>\n\n\n\n<p>At least 
monthly and after any production incident or major release.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What is burn-rate alerting?<\/h3>\n\n\n\n<p>Alerting based on the rate of SLO budget consumption. Page when burn-rate is high and remaining budget is low.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to keep observability costs under control?<\/h3>\n\n\n\n<p>Limit cardinality, sample traces, rollup metrics, and use retention tiers.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to measure duration in serverless functions?<\/h3>\n\n\n\n<p>Use platform-provided invocation duration metrics and instrument cold start time.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to tune autoscaling for latency?<\/h3>\n\n\n\n<p>Use p95 or queue-length as scaling signals instead of CPU alone and tune cooldowns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to prevent retries from worsening latency?<\/h3>\n\n\n\n<p>Implement retry budgets, exponential backoff, and rate limiting.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Duration RED centralizes latency percentiles as critical SLIs to preserve customer experience, guide SRE actions, and influence architecture. 
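To make the measurement mechanics concrete, here is a minimal, hypothetical sketch of the instrumentation this guide keeps recommending: time an operation with a monotonic clock and record the result into histogram buckets (bucket bounds and helper names are illustrative, not a specific library's API):

```python
import time
from bisect import bisect_left

# Illustrative cumulative-style bucket upper bounds in milliseconds.
BUCKET_BOUNDS_MS = [50, 100, 250, 500, 1000, 2500, float("inf")]

class DurationHistogram:
    """Toy duration histogram; real systems use a metrics SDK."""

    def __init__(self):
        self.counts = [0] * len(BUCKET_BOUNDS_MS)
        self.total = 0

    def observe(self, duration_ms):
        # Find the first bucket whose upper bound covers this duration.
        self.counts[bisect_left(BUCKET_BOUNDS_MS, duration_ms)] += 1
        self.total += 1

def timed(histogram, fn, *args, **kwargs):
    """Run fn, recording its duration with a monotonic timer."""
    start = time.monotonic()  # immune to wall-clock adjustments
    try:
        return fn(*args, **kwargs)
    finally:
        histogram.observe((time.monotonic() - start) * 1000.0)
```

Using a monotonic timer here matters: wall-clock sources can jump under NTP corrections and produce negative or wildly inflated durations.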
It requires careful instrumentation, testable SLOs, and collaboration across teams.<\/p>\n\n\n\n<p>Next 7 days plan:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Day 1: Inventory critical endpoints and define initial p95\/p99 targets.<\/li>\n<li>Day 2: Instrument two most critical services with histograms and traces.<\/li>\n<li>Day 3: Create executive and on-call dashboards with p95\/p99 panels.<\/li>\n<li>Day 4: Implement burn-rate alerting with basic runbook.<\/li>\n<li>Day 5: Run a small-scale load test and validate percentiles.<\/li>\n<li>Day 6: Tune alerts to reduce noise and add exemplar linking.<\/li>\n<li>Day 7: Schedule a post-implementation review and define next SLO maturity steps.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Appendix \u2014 Duration RED Keyword Cluster (SEO)<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Primary keywords<\/li>\n<li>Duration RED<\/li>\n<li>Duration RED SLI<\/li>\n<li>Duration RED SLO<\/li>\n<li>latency RED<\/li>\n<li>request duration monitoring<\/li>\n<li>\n<p>duration-based SLI<\/p>\n<\/li>\n<li>\n<p>Secondary keywords<\/p>\n<\/li>\n<li>tail latency monitoring<\/li>\n<li>p99 latency SLO<\/li>\n<li>duration percentiles<\/li>\n<li>real user monitoring duration<\/li>\n<li>synthetic duration checks<\/li>\n<li>\n<p>histogram latency<\/p>\n<\/li>\n<li>\n<p>Long-tail questions<\/p>\n<\/li>\n<li>what is duration red and how to measure it<\/li>\n<li>how to set p95 and p99 SLOs for APIs<\/li>\n<li>how to instrument duration metrics in microservices<\/li>\n<li>best practices for measuring tail latency in Kubernetes<\/li>\n<li>how to correlate traces with duration histograms<\/li>\n<li>how to reduce serverless cold start latency p95<\/li>\n<li>what is exemplar in observability for duration<\/li>\n<li>how to prevent retry storms increasing latency<\/li>\n<li>how to tune autoscaler for request latency<\/li>\n<li>how to design runbooks for latency incidents<\/li>\n<li>how to 
calculate burn rate for duration SLOs<\/li>\n<li>what percentile should I use for user-facing APIs<\/li>\n<li>how to implement client-observed SLIs<\/li>\n<li>what are common pitfalls measuring duration red<\/li>\n<li>\n<p>how to measure duration across cloud services<\/p>\n<\/li>\n<li>\n<p>Related terminology<\/p>\n<\/li>\n<li>latency distribution<\/li>\n<li>histogram buckets<\/li>\n<li>quantile estimation<\/li>\n<li>trace exemplars<\/li>\n<li>monotonic timers<\/li>\n<li>sampling policy<\/li>\n<li>observability pipeline<\/li>\n<li>cardinality management<\/li>\n<li>burn-rate alerting<\/li>\n<li>error budget<\/li>\n<li>service-level indicator<\/li>\n<li>service-level objective<\/li>\n<li>distributed tracing<\/li>\n<li>real user monitoring<\/li>\n<li>synthetic monitoring<\/li>\n<li>canary deployments<\/li>\n<li>circuit breaker<\/li>\n<li>backpressure<\/li>\n<li>queuing delay<\/li>\n<li>cold start<\/li>\n<li>p95 p99 p999<\/li>\n<li>response time percentiles<\/li>\n<li>probe latency<\/li>\n<li>request queue time<\/li>\n<li>autoscaling latency metric<\/li>\n<li>APM for latency<\/li>\n<li>k8s probe tuning<\/li>\n<li>latency runbook<\/li>\n<li>latency postmortem<\/li>\n<li>latency heatmap<\/li>\n<li>latency dashboard<\/li>\n<li>latency SLI computation<\/li>\n<li>latency aggregation window<\/li>\n<li>service mesh latency<\/li>\n<li>exemplars to trace mapping<\/li>\n<li>RUM duration<\/li>\n<li>backend processing time<\/li>\n<li>startup time histogram<\/li>\n<li>slow query log<\/li>\n<li>queue wait histogram<\/li>\n<li>deployment latency regression<\/li>\n<li>latency cost tradeoff<\/li>\n<li>latency mitigation strategies<\/li>\n<li>latency observability best 
practices<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>&#8212;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[149],"tags":[],"class_list":["post-1807","post","type-post","status-publish","format-standard","hentry","category-terminology"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>What is Duration RED? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/sreschool.com\/blog\/duration-red\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"What is Duration RED? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School\" \/>\n<meta property=\"og:description\" content=\"---\" \/>\n<meta property=\"og:url\" content=\"https:\/\/sreschool.com\/blog\/duration-red\/\" \/>\n<meta property=\"og:site_name\" content=\"SRE School\" \/>\n<meta property=\"article:published_time\" content=\"2026-02-15T08:11:04+00:00\" \/>\n<meta name=\"author\" content=\"Rajesh Kumar\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Rajesh Kumar\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"26 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/sreschool.com\/blog\/duration-red\/\",\"url\":\"https:\/\/sreschool.com\/blog\/duration-red\/\",\"name\":\"What is Duration RED? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School\",\"isPartOf\":{\"@id\":\"https:\/\/sreschool.com\/blog\/#website\"},\"datePublished\":\"2026-02-15T08:11:04+00:00\",\"author\":{\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201\"},\"breadcrumb\":{\"@id\":\"https:\/\/sreschool.com\/blog\/duration-red\/#breadcrumb\"},\"inLanguage\":\"en\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/sreschool.com\/blog\/duration-red\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/sreschool.com\/blog\/duration-red\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/sreschool.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"What is Duration RED? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/sreschool.com\/blog\/#website\",\"url\":\"https:\/\/sreschool.com\/blog\/\",\"name\":\"SRESchool\",\"description\":\"Master SRE. Build Resilient Systems. 
Lead the Future of Reliability\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/sreschool.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201\",\"name\":\"Rajesh Kumar\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en\",\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g\",\"caption\":\"Rajesh Kumar\"},\"sameAs\":[\"http:\/\/sreschool.com\/blog\"],\"url\":\"https:\/\/sreschool.com\/blog\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"What is Duration RED? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/sreschool.com\/blog\/duration-red\/","og_locale":"en_US","og_type":"article","og_title":"What is Duration RED? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School","og_description":"---","og_url":"https:\/\/sreschool.com\/blog\/duration-red\/","og_site_name":"SRE School","article_published_time":"2026-02-15T08:11:04+00:00","author":"Rajesh Kumar","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Rajesh Kumar","Est. 
reading time":"26 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/sreschool.com\/blog\/duration-red\/","url":"https:\/\/sreschool.com\/blog\/duration-red\/","name":"What is Duration RED? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide) - SRE School","isPartOf":{"@id":"https:\/\/sreschool.com\/blog\/#website"},"datePublished":"2026-02-15T08:11:04+00:00","author":{"@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201"},"breadcrumb":{"@id":"https:\/\/sreschool.com\/blog\/duration-red\/#breadcrumb"},"inLanguage":"en","potentialAction":[{"@type":"ReadAction","target":["https:\/\/sreschool.com\/blog\/duration-red\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/sreschool.com\/blog\/duration-red\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/sreschool.com\/blog\/"},{"@type":"ListItem","position":2,"name":"What is Duration RED? Meaning, Architecture, Examples, Use Cases, and How to Measure It (2026 Guide)"}]},{"@type":"WebSite","@id":"https:\/\/sreschool.com\/blog\/#website","url":"https:\/\/sreschool.com\/blog\/","name":"SRESchool","description":"Master SRE. Build Resilient Systems. 
Lead the Future of Reliability","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/sreschool.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en"},{"@type":"Person","@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201","name":"Rajesh Kumar","image":{"@type":"ImageObject","inLanguage":"en","@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g","caption":"Rajesh Kumar"},"sameAs":["http:\/\/sreschool.com\/blog"],"url":"https:\/\/sreschool.com\/blog\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts\/1807","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/comments?post=1807"}],"version-history":[{"count":0,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts\/1807\/revisions"}],"wp:attachment":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/media?parent=1807"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/categories?post=1807"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/tags?post=1807"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}