How do you design an SRE runbook so that on-call engineers can quickly mitigate a high-severity incident under pressure? A good runbook goes beyond vague descriptions: it gives explicit, step-by-step commands and pre-approved automation scripts for system recovery. Why does failing to keep these documents up to date so often lead to longer mitigation times during unexpected outages?
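
One way to make "explicit steps" and "kept up to date" concrete is to store the runbook as structured data rather than free text. The sketch below is only an illustration under assumptions: the `Step` and `Runbook` types, the 90-day review window, and the script paths in `EXAMPLE` are all hypothetical and do not come from any particular SRE tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Step:
    description: str    # what the step achieves, in plain words
    command: str        # exact command to paste (hypothetical scripts below)
    pre_approved: bool  # True if on-call may run it without further sign-off


@dataclass
class Runbook:
    title: str
    last_reviewed: date
    steps: list[Step] = field(default_factory=list)


def is_stale(runbook: Runbook, today: date, max_age_days: int = 90) -> bool:
    """Return True if the runbook was last reviewed more than max_age_days ago."""
    return today - runbook.last_reviewed > timedelta(days=max_age_days)


def render(runbook: Runbook) -> list[str]:
    """Format each step as a numbered line with its approval status."""
    lines = []
    for number, step in enumerate(runbook.steps, start=1):
        tag = "pre-approved" if step.pre_approved else "needs approval"
        lines.append(f"{number}. [{tag}] {step.description}: {step.command}")
    return lines


# Hypothetical runbook; the script names are placeholders, not real tools.
EXAMPLE = Runbook(
    title="Checkout API error spike",
    last_reviewed=date(2024, 1, 15),
    steps=[
        Step("Shift traffic to the standby pool",
             "./scripts/shift_traffic.sh standby", pre_approved=True),
        Step("Roll back the latest release",
             "./scripts/rollback.sh checkout-api", pre_approved=False),
    ],
)
```

Calling `render(EXAMPLE)` produces the numbered steps an engineer would follow during the incident, and running `is_stale` on every runbook in a scheduled job is one possible way to flag documents that are overdue for review before an outage exposes them.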