Chaos Engineering: Finding Failures Before They Become Outages
Learn the basics of Chaos Engineering: discover the tools, tests, and culture needed to create better software and prevent outages and downtime.
Download the whitepaper to get a primer on Chaos Engineering and learn:
- The fundamentals of Chaos Engineering and how to get started.
- How Amazon and Netflix approach Chaos Engineering.
- Benefits of Chaos Engineering, the culture, and popular tools.
- Practical applications of Chaos Engineering in production.
- Scheduling and automating regular Chaos Experiments.
- Incident classification: SEV descriptions and levels, and SEV and time-to-detection (TTD) timelines.
- Organization-wide critical service monitoring, including key dashboards and KPI metrics emails.
- Service ownership and metrics for organizations maintaining a microservices architecture.
- Effective on-call principles for site reliability engineers, including rotation structure, alert threshold maintenance, and escalation practices.
- Chaos Engineering practices to identify random and unpredictable behavior in your system.
- Monitoring and metrics to detect incidents caused by self-healing systems.
- Creating a high-reliability culture by listening to people in your organization.
By thoughtfully injecting failure into their systems, engineers can find vulnerabilities and address them before they result in downtime and lost revenue.
This whitepaper introduces the discipline of Chaos Engineering: why it is needed more than ever, how to get started, and best practices for learning as much as possible from each experiment while reducing risk.