October 16, 2018

Looking back on a day of chaos

Wow. Last month was a whirlwind. We hosted our first conference, launched a brand new product, and raised some funding to increase product velocity and continue to educate and grow the chaos engineering community.

We heard from many of you about your experience with Chaos Conf and we appreciate the feedback! Having a single-day, one-track conference around a focused topic (in this case chaos engineering) featuring world-class speakers was the best way we could think of to provide maximum value for the community. If you are interested in watching recorded video of the talks, scroll down a bit in this post to find them; you can also follow us on Twitter (@gremlininc) where we will be releasing transcripts once a week.

Conferences like this don’t happen without a strong community (all of you!). It was awesome to meet engineers from Europe, Asia, and South America who came out to San Francisco to attend. And they also don’t happen without someone behind the scenes. In our case, that’s Dina. As a one-woman events team operation, she was able to organize a truly community-driven event at an incredible venue (shoutout to the Alamo Drafthouse in San Francisco!) where everyone had fun and was taken care of.

Left: Dina, Head of Events | Right: Kolton, Co-Founder and CEO

Making sure everything is ready at The Alamo Drafthouse

The Gremlin Team in Chaos Conf swag and lab coats for running experiments 👩‍⚕️

Chaos Conf Speakers

We also have to thank our incredible speakers. They come from some of the best technology companies in the world, and we are humbled that they took time out of their busy schedules to participate in our event.

Adrian Cockcroft, VP of Cloud Architecture @ AWS

Adrian kicked off the conference with an incredible opening keynote. With his breadth of experience in topics ranging from distributed systems to observability to resilience, he was the perfect speaker to get the day started.

Kriss Rochefolle, Director, Operational Excellence @ Oui.sncf

Kriss gave a charming, funny, straightforward presentation about how to convince your organization to adopt chaos engineering. We love any talk that starts off with “Let’s break things on purpose in production. It’s going to be fun!”

Vilas Veeraraghavan, Director of Engineering @ Walmart Labs

Vilas is an expert practitioner of chaos engineering. The results he’s been able to achieve at JET and Walmart speak for themselves. For big e-commerce websites, avoiding downtime is a top priority!

Ronnie Chen, Engineering Manager @ Twitter

Ronnie gave a fascinating talk about her life as a deep-sea diver. When you are swimming at great depths, it is incredibly important to do proactive failure testing. And the same is true for your applications: the more critical an application is to your business, the more important it is to identify failures before they impact customers.

Mark McBride, CEO & Founder @ Turbine Labs

Chaos is endemic in software engineering: many things are unpredictable. Mark’s talk is full of insight and practical advice for getting out of the loop of: a system becomes destabilized → everyone drops what they’re doing to firefight → incident is hopefully resolved and things return to square one.

Mikolaj Pawlikowski, Software Engineer @ Bloomberg

Mikolaj gives some great background on why distributed systems are complicated and how the rise of cloud and microservices has fueled the need for chaos engineering. As Leslie Lamport says, “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”

Charity Majors, CEO @ Honeycomb

This was a highly anticipated talk. As the CEO of a product that offers better observability into systems, Charity emphasizes a recurring concern: if you can’t observe it, just don’t do it. But with the proper visibility, proactively testing for failure is a recipe for success.

Jessie Frazelle, Software Engineer @ Microsoft

Jessie is a rockstar best known for her work in the Docker ecosystem. If you’re interested in some of the more technical details about the development of containers, and how breaking them on purpose can lead to wonderful and surprising insights, then this is the talk for you.

Tammy Butow & Ana Medina, Principal SRE & Chaos Engineer @ Gremlin

In this joint talk, Tammy and Ana talk about their collective experience as SREs concerned with avoiding downtime and reducing incidents at companies like Dropbox, Uber, DigitalOcean… and now Gremlin.

Kolton Andrus, CEO & Co-Founder @ Gremlin

Kolton’s talk consisted of an overview of chaos engineering and the history that led us to today: from Jesse Robbins pulling out plugs in datacenters, to Chaos Monkey at Netflix randomly shutting down servers, to more refined and disciplined approaches to chaos engineering. To give a glimpse of the future, Kolton announced the availability of Application Level Fault Injection (ALFI) to all Gremlin customers, allowing DevOps teams to be much more targeted with their attacks, including on serverless environments.

The Community

And, of course, we have to thank the incredible Chaos Engineering Community. Whether you were able to make it in person or watched the livestream, it was humbling and inspiring to hear so many of your stories, and they gave us plenty of ideas on how to improve our own service.

And if you haven’t already joined the Chaos Engineering Community on Slack, then please do! Whether you’re just getting started or have decades of experience breaking things, your voice would be appreciated. Since we launched in 2017 we’ve felt the love, and hopefully we can make you proud and give it back just as well 💚

Avoid downtime. Use Gremlin to turn failure into resilience.

Gremlin empowers you to proactively root out failure before it causes downtime. Try Gremlin for free and see how you can harness chaos to build resilient systems.