Incident repro & playbook validation for SREs
Tammy Butow (Principal SRE @ Gremlin) and Robert Ross (CEO @ FireHydrant) discuss how SREs can be proactive with Chaos Engineering
SRE Best Practices for Incident Management
Learn about the rise of Site Reliability Engineering, and how this approach to incident management can not only coexist with, but also strengthen, a DevOps approach to development.
- The SRE reliability hierarchy
- SREs and Chaos Engineering
The SRE reliability hierarchy
An SRE's primary job is to make and keep a service or application reliable, and this involves a lot of moving pieces! The following diagram shows the Service Reliability Hierarchy, according to Google. Chaos Engineering can help at each layer of the hierarchy.