This is Fine: The SRE's Guide to Chaos & Observability

Today’s distributed, cloud-based environments are incredibly complex. Not only does each component depend on many others, but modern systems are also highly dynamic—changing frequently as teams push new code or make updates to infrastructure.

Taming this complexity to ensure reliability requires end-to-end observability to understand how components depend on each other. Additionally, proactive Chaos Engineering combined with AI-driven observability lets you uncover “unknown unknowns” that impact how your system will respond to different failure scenarios.




About this webinar

Join Gremlin and Dynatrace as we discuss techniques for maintaining and improving reliability in complex cloud environments. We will cover how to establish end-to-end observability across your environments and how to map their complex relationships. We will then provide a framework for safely and thoughtfully conducting Chaos Engineering experiments with Gremlin.

Finally, we will share how teams can incorporate continuous chaos experimentation into build and deploy pipelines, using the concept of “quality gates” in Dynatrace to establish and adhere to reliability SLOs.

  • Learn the history, principles and practice of Chaos Engineering
  • Discover how to improve your team's on-call skills
  • See how observability and chaos work together to improve the reliability of distributed systems
  • Learn how to use Gremlin and Dynatrace to help your engineering team improve continuously

About the speakers

Ana M Medina

Sr. Chaos Engineer

Ana Margarita is a Senior Chaos Engineer at Gremlin and helps companies avoid outages by running proactive chaos engineering experiments. Before Gremlin, she worked at companies of various sizes, including Google, Uber, SFEFCU, and Miami-based startups. Ana is an internationally recognized speaker and has presented at AWS re:Invent, KubeCon, DockerCon, DevOpsDays, AllDayDevOps, Write/Speak/Code, and many others. Catch her tweeting at @Ana_M_Medina about traveling, diversity in tech, and mental health.

Andreas Grabner

DevOps Activist

Andreas Grabner (@grabnerandi) has 20+ years of experience as a software developer, tester and architect and is an advocate for high-performing, cloud-scale applications. He is a regular contributor to the DevOps community, a frequent speaker at technology conferences, and regularly publishes articles on blog.dynatrace.com. In his spare time, you can most likely find him on one of the salsa dance floors of the world!

Avoid downtime. Use Gremlin to turn failure into resilience.

Gremlin empowers you to proactively root out failure before it causes downtime. See how you can harness chaos to build resilient systems by requesting a demo of Gremlin.
