Graceful Degradation vs. Fail Fast
Issue #118: System Design Interview Roadmap • Section 5: Reliability & Resilience
When Netflix Keeps Streaming Despite Chaos
Imagine you're binge-watching your favorite series when Netflix's recommendation engine crashes. Do you want the entire app to stop working, or would you prefer it continues playing videos while showing a simple "Recommended for You" placeholder? This scenario perfectly illustrates the fundamental choice every system architect faces: should components fail fast and loudly, or degrade gracefully to maintain core functionality?
What We'll Master Today
Graceful Degradation: Maintaining partial functionality when dependencies fail
Fail-Fast Principles: When immediate failure prevents bigger problems
Decision Framework: Choosing the right strategy for each system component
Implementation Patterns: Circuit breakers, fallbacks, and monitoring strategies
The Tale of Two Philosophies
Graceful Degradation says: "Keep the lights on, even if some features don't work." Your e-commerce site continues processing orders even when the recommendation engine is down—customers can still buy, just without personalized suggestions.
Fail Fast says: "Stop immediately when something's wrong to prevent corruption." Your payment service refuses all transactions the moment it detects database inconsistency—better to block payments than process them incorrectly.