Incident Repro & Playbook Validation with Chaos Engineering

Learn how you can use Chaos Engineering to reproduce high-severity incidents, ensure your post-incident fixes are working as expected, and validate that your incident management playbooks are up to date.


About this webinar

In this live session, we will explore how Gremlin can be used to determine whether your system is resilient to specific, high-severity outages. You will learn how you can use Gremlin and FireHydrant together for incident management and incident reproduction.

You’ll also have the opportunity to have your questions answered by our experts during our Q&A segment.

  • First, Tammy and Bobby will introduce an example of a real-world, high-severity incident
  • Then, you will see how you can reproduce the outage conditions using Gremlin
  • Next, we will explore how you can use FireHydrant to improve your incident management program
  • Finally, you will see how Gremlin and FireHydrant can be used together to ensure your systems are resilient to specific types of real-world outages

About the speakers

Tammy Butow

Principal SRE

Tammy Butow is a Principal SRE at Gremlin, where she works on Chaos Engineering, the facilitation of controlled experiments to identify systemic weaknesses. Gremlin helps engineers build resilient systems using its control plane and API. Tammy previously led SRE teams at Dropbox responsible for the database and storage systems used by over 500 million customers. Before that, she worked at DigitalOcean and at one of Australia's largest banks in Security Engineering, Product Engineering, and Infrastructure Engineering.

Robert "Bobby Tables" Ross


Bobby is the co-founder and CEO of FireHydrant.io, an incident response tool. He previously worked as a staff software engineer at Namely and built things at DigitalOcean. Bobby has been interested in incident response ever since he started maintaining production systems. He likes bleeding-edge tech and making software that helps teams build better systems.

Avoid downtime. Use Gremlin to turn failure into resilience.

Gremlin empowers you to proactively root out failure before it causes downtime. See how you can harness chaos to build resilient systems by requesting a demo of Gremlin.