Reliability Is an Emergent Property, Not a Root Cause

Looking for Root Causes is a False Path

In a recent podcast, David Blank-Edelman, program lead for Microsoft’s SRE Academy, argued that searching for a single root cause of system failures is counterproductive. He emphasized that reliability is an emergent property of complex systems, shaped by interactions between technical and socio-technical factors, not by isolating one failure point.

Why This Matters

Traditional root cause analysis assumes a single failure point, but modern systems are too interconnected for that assumption to hold. As Blank-Edelman explains, failures often stem from multiple contributing factors, including human decisions, design trade-offs, and unanticipated interactions. For example, B-17 bomber accidents in WWII were traced not to a mechanical failure but to poorly designed cockpit switches, a socio-technical oversight. Focusing on a single root cause ignores these systemic issues, which leads to recurring failures and missed opportunities for improvement.

Key Insights

“Reliability is an emergent property of an architecture and can include any property important to the customer, such as availability or durability.” (David Blank-Edelman, 2025)
“Failures have multiple causes, some of which are socio-technological in nature.” (Podcast transcript, 2025)
Temporal, a workflow orchestration platform used by Stripe and Coinbase, is mentioned as an example of tooling for managing distributed workflows.

Practical Applications

Use Case: Microsoft’s SRE Academy trains Azure engineers to focus on systemic feedback loops rather than isolated incidents.
Pitfall: Prematurely blaming human error or a single component in post-incident reviews can mask deeper systemic issues, leading to recurring outages.
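One way to counter this pitfall is to make the review record itself hold several contributing factors rather than a single "root cause" field. The following is a minimal sketch in Python, not something described in the podcast: the class names, the category labels, and the sample incident are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ContributingFactor:
    description: str
    # Hypothetical labels, e.g. "technical", "human", "organizational".
    category: str


@dataclass
class IncidentReview:
    title: str
    factors: list[ContributingFactor] = field(default_factory=list)

    def add_factor(self, description: str, category: str) -> None:
        """Record one contributing factor; a review can hold many."""
        self.factors.append(ContributingFactor(description, category))

    def categories(self) -> Counter:
        """Count the recorded factors per category."""
        return Counter(f.category for f in self.factors)

    def is_single_cause(self) -> bool:
        """True when the review lists at most one factor, a hint that
        the analysis may have stopped too early."""
        return len(self.factors) <= 1


# Made-up incident, used only to show the shape of the record.
review = IncidentReview("Checkout latency spike")
review.add_factor("Retry storm after a dependency timeout", "technical")
review.add_factor("Alert threshold tuned for an older traffic pattern", "organizational")
review.add_factor("Runbook step was ambiguous under time pressure", "human")

if review.is_single_cause():
    print("Only one factor recorded; consider widening the review.")
else:
    print(dict(review.categories()))
```

A check like `is_single_cause` does not prove a review is shallow. It is only a prompt to ask whether design trade-offs, human decisions, or interactions between components were left out.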

References:

https://www.infoq.com/podcasts/looking-root-causes-false-path/
