What is MTTR (Mean Time to Recovery) in SRE?

Fernando

What exactly makes Mean Time to Recovery the default metric for measuring incident response efficiency when it completely ignores the vast variance in outage severity? Furthermore, a single prolonged system failure can skew your monthly average and overshadow dozens of rapid, automated self-healing resolutions. How do you balance raw recovery speed against deep root-cause elimination?
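
To make the skew concern concrete, here is a minimal sketch with made-up numbers (the incident counts and durations are assumptions for illustration, not real data). It compares the arithmetic mean, which is what MTTR reports, against the median for a month with 30 short automated recoveries and one long outage.

```python
from statistics import mean, median

# Hypothetical recovery times in minutes for one month (illustrative only):
# 30 automated self-healing recoveries of 2 minutes each, plus one 8-hour outage.
recovery_minutes = [2.0] * 30 + [480.0]

# MTTR is the arithmetic mean of the recovery times.
mttr = mean(recovery_minutes)

# The median is far less sensitive to a single extreme value.
typical = median(recovery_minutes)

print(f"MTTR (mean): {mttr:.1f} min")     # MTTR (mean): 17.4 min
print(f"Median:      {typical:.1f} min")  # Median:      2.0 min
```

With these assumed numbers, one outage moves the monthly MTTR from 2 minutes to roughly 17, even though 30 of the 31 incidents recovered in 2 minutes.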