Navigating the Reliability Minefield
Finding and Fixing Your Hidden Reliability Risks
Reliability risks lurk everywhere in complex cloud architectures, especially when they’re scaled to an enterprise level. But how do you know where all of those risks are? How can you tell minor risks, with low impact and low probability, from risks that are likely to become catastrophic outages? And more importantly, how do you drive conversations with the broader engineering organization about which risks need mitigation now?
Designed by reliability and chaos engineering experts, Gremlin’s Reliability Tracker gives you a working reliability map of the services and most likely failure scenarios in your organization.
About this webinar
By combining the Reliability Tracker spreadsheet and its methodology with reliability testing, you’ll be able to test your systems, find reliability risks, and know what will happen if your systems fail. You can then prioritize your engineering efforts to stop disruptive outages before they happen.
In this webinar, Sam Rossoff, Principal Engineer and one of the creators of Gremlin’s Reliability Tracker, will use it to walk you through performing a Reliability Risk Assessment, and show you how to:
- Test your systems
- Find reliability risks
- Know what will happen if your systems fail
- Prioritize your engineering efforts to stop disruptive outages before they happen
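To make the idea of prioritizing by impact and probability concrete, here is a minimal sketch in Python. It is not the Reliability Tracker's scoring method, which this page does not describe. The `Risk` class, the 1 to 5 scales, the impact-times-likelihood score, and the example services and scenarios are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One failure scenario for one service (hypothetical structure)."""
    service: str
    scenario: str
    impact: int      # assumed scale: 1 (minor) to 5 (catastrophic)
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)

    @property
    def score(self) -> int:
        # Assumed scoring rule: simple product of impact and likelihood.
        return self.impact * self.likelihood


def prioritize(risks):
    """Return risks ordered from highest to lowest score.

    Ties on score are broken in favor of the higher-impact risk.
    """
    return sorted(risks, key=lambda r: (r.score, r.impact), reverse=True)


# Illustrative data only; not taken from any real assessment.
risks = [
    Risk("checkout", "database failover stalls", impact=5, likelihood=3),
    Risk("search", "cache node loss", impact=2, likelihood=4),
    Risk("auth", "dependency latency spike", impact=4, likelihood=4),
]

ranked = prioritize(risks)
for risk in ranked:
    print(f"{risk.score:>2}  {risk.service}: {risk.scenario}")
```

A ranked list like this gives the broader engineering organization a shared starting point for discussing which risks need mitigation now. The weights and scales should be adjusted to match how your teams actually judge impact and probability.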