Welcome from Kolton

Kolton Andrus, Gremlin

Welcome to the second Chaos Conf. Kolton will explain why we created Chaos Conf and share logistics for the day. Our theme for Chaos Conf 2019 is Break Through.

Keynote: Chaos Engineering for People Systems

Dave Rensin, Google

The rise of highly distributed computing systems based on microservices has made predicting and debugging our products more complex than ever. In response, Chaos Engineering has developed as a way to discover, diagnose, and debug the inevitable emergent properties (and problems) that come with this new reality.

What about our human systems? Can we apply the techniques of chaos engineering to build better teams? Happier employees? More successful companies? Dave thinks so and wants to convince you, too. Come hear him try!

In this keynote, Dave will share his experiences building stronger systems, teams, and companies at Google over the last 5 years.

Forming Failure Hypothesis

Subbu Allamaraju, Expedia

At Expedia, Subbu is leading a large-scale migration of Expedia's travel platforms from enterprise data centers to a highly available architecture on the cloud. Before joining Expedia, as a Distinguished Engineer at eBay Inc., Subbu helped build private cloud infrastructure and platforms for eBay and PayPal.

Think Big: Chaos Testing a Monolith

Caroline Dickey, Mailchimp

For many companies, Chaos Engineering means testing dependencies between services or killing container instances. Those approaches don't work if your company's main product is a 23-million-line PHP monolith running on bare metal. This talk explores chaos testing when best practices aren't an option.

Embracing Chaos!

Paul Osman, Under Armour
Ana Medina, Gremlin

Practical steps for getting started with Chaos Engineering. Using concrete examples, we'll cover onboarding teams onto a Chaos Engineering platform, identifying teams that are ready to do GameDays, and creating feedback loops to measure resilience.

Lightning Talk: Hot Recipes for Building Chaos Engineering Experiments

Yury Niño Roa, Aval Digital Labs

Drawing a connection between cooking and chaos.

Lightning Talk: Transitive Logic of Systems Fallibility

Niran Fajemisin, Starbucks

The fallibility of humans, systems, and software.

Lightning Talk: Who is Responsible for Chaos?

Joyce Lin, Postman

Who is responsible for chaos and what we can all do as an industry to help prevent it.

Lightning Talk: Resilience Driven Development

Jason Yee, Datadog

Monitor what matters to your customers.

Incident Repro and Playbook Validating with Chaos Engineering

Robert Ross (Bobby Tables), FireHydrant
Tammy Butow, Gremlin

Bobby Tables & Tammy will share how you can use Chaos Engineering to repro high severity incidents and ensure your post-incident fixes are working as expected. Bobby will also share how Chaos Engineering can be used to validate playbooks.

A Roadmap Towards Chaos Engineering

Jose Esquivel, Backcountry

A common problem with Chaos Experimentation is knowing where to start. In this talk, Principal Software Engineer Jose Esquivel will present a roadmap for Chaos Experimentation that can be applied to any organization.

Finding the Joy in Chaos Engineering

Lenny Sharpe & Brian Lee, Target

Learn how Target has built a resiliency engineering capability that enables teams to build more resilient systems, and how developing a strong culture around Chaos Engineering has paid off. We'll share our journey from experimenting locally to running multi-team GameDays.

The Future of Chaos Engineering: In Pursuit of the Unknown Unknowns

Crystal Hirschorn, Conde Nast

"Systems fail all the time" goes the popular mantra in the Reliability and Resilience engineering fields. Given this premise, industry-leading organizations' practices have accelerated and matured considerably compared to where they were even a few years ago. Organizations are beginning to stretch beyond their homegrown approaches to building organizational resilience, leveraging expertise from across the industry and integrating approaches directly into the software deployment lifecycle through commoditized Chaos services.

However, our systems and organizations keep growing in complexity under the ever-increasing pressure for efficiency and scale. Our architectural approaches and paradigms keep shifting to cope with the complexity of our domains, as seen in the wide adoption of microservices and serverless development approaches.

A current limiting factor in running Chaos experiments is their contrived nature: we must think ahead about what could go wrong. Is this true to experience? What about the sense of surprise that usually pervades failure situations? How can we facilitate more random, generative experiments?

In this talk, Crystal will outline where our Chaos and Resilience practices must evolve to keep pace with the challenges of growing complexity.
