Reliability Is an Emergent Property, Not a Root Cause
These articles are AI-generated summaries. Please check the original sources for full details.
Looking for Root Causes Is a False Path
In a recent podcast, David Blank-Edelman, program lead for Microsoft’s SRE Academy, argued that searching for a single root cause of system failures is counterproductive. He emphasized that reliability is an emergent property of complex systems, shaped by interactions between technical and socio-technical factors, not by isolating one failure point.
Why This Matters
Traditional root cause analysis assumes a single failure point, but modern systems are too interconnected for that assumption to hold. As Blank-Edelman explains, failures often stem from multiple contributing factors, including human decisions, design trade-offs, and unanticipated interactions. For example, the B-17 bomber crashes in WWII were caused not by a single mechanical failure but by poorly designed cockpit switches, a socio-technical oversight. Focusing on one root cause overlooks factors like these, so the same failures recur and opportunities for systemic improvement are missed.
Key Insights
- “Reliability is an emergent property of an architecture and can include any property important to the customer, such as availability or durability.” (David Blank-Edelman, 2025)
- “Failures have multiple causes, some of which are socio-technological in nature.” (Podcast transcript, 2025)
- Temporal, used by Stripe and Coinbase, is cited as an example of a tool for managing distributed workflows.
Practical Applications
- Use Case: Microsoft’s SRE Academy trains Azure engineers to focus on systemic feedback loops rather than isolated incidents.
- Pitfall: Prematurely blaming human error or a single component in post-incident reviews can mask deeper systemic issues, leading to recurring outages.
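One way to act on the pitfall above is to structure post-incident review records so that they hold a list of contributing factors instead of a single "root cause" field. The sketch below is illustrative only: the names `FactorKind`, `ContributingFactor`, `IncidentReview`, and the two warning rules are assumptions made for this example, not something described in the podcast or used by Microsoft's SRE Academy.

```python
from dataclasses import dataclass, field
from enum import Enum


class FactorKind(Enum):
    # Hypothetical categories, chosen for illustration only.
    TECHNICAL = "technical"
    HUMAN = "human"
    DESIGN = "design"
    ORGANIZATIONAL = "organizational"


@dataclass
class ContributingFactor:
    description: str
    kind: FactorKind


@dataclass
class IncidentReview:
    title: str
    # A list of factors replaces the usual single "root cause" field.
    factors: list[ContributingFactor] = field(default_factory=list)

    def warnings(self) -> list[str]:
        """Return review-quality warnings; an empty list means none fired."""
        found = []
        if len(self.factors) < 2:
            found.append("fewer than two contributing factors recorded")
        kinds = {factor.kind for factor in self.factors}
        if kinds == {FactorKind.HUMAN}:
            found.append("only human factors recorded; look for design or organizational context")
        return found


# A review that stops at "human error" triggers both warnings.
review = IncidentReview(
    "Checkout outage",
    [ContributingFactor("Operator ran the wrong command", FactorKind.HUMAN)],
)
for warning in review.warnings():
    print(warning)
```

The checks are deliberately crude. They do not decide what caused an incident; they only prompt reviewers to keep looking when a write-up names one component or one person and stops there.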