Looking back on a day of chaos
Wow. Last month was a whirlwind. We hosted our first conference, launched a brand new product, and raised some funding to increase product velocity and continue to educate and grow the chaos engineering community.
We heard from many of you about your experience with Chaos Conf and we appreciate the feedback! Having a single-day, one-track conference around a focused topic (in this case Chaos Engineering) featuring world-class speakers was the best way we could think of to provide maximum value for the community. If you are interested in watching recorded video of the talks, scroll down a bit in this post to find them; you can also follow us on Twitter (@gremlininc), where we will be releasing transcripts once a week.
Conferences like this don’t happen without a strong community (all of you!). It was awesome to meet engineers from Europe, Asia, and South America who came out to San Francisco to attend. They also don’t happen without someone behind the scenes. In our case, that’s Dina. As a one-woman events team, she organized a truly community-driven event at an incredible venue (shoutout to the Alamo Drafthouse in San Francisco!) where everyone had fun and was taken care of.
Chaos Conf Speakers
We also have to thank our incredible speakers. They come from some of the best technology companies in the world, and we are humbled that they took time out of their busy schedules to participate in our event.
Adrian Cockcroft, VP of Cloud Architecture @ AWS
Adrian kicked off the conference with an incredible opening keynote. Drawing on his breadth of experience in topics ranging from distributed systems to observability to resilience, his talk was the perfect way to get the day started.
Kriss Rochefolle, Director, Operational Excellence @ Oui.sncf
Kriss gave a charming, funny, straightforward presentation about how to convince your organization to adopt Chaos Engineering. We love any talk that starts off with “Let’s break things on purpose in production. It’s going to be fun!”
Vilas Veeraraghavan, Director of Engineering @ Walmart Labs
Vilas is an expert practitioner of Chaos Engineering. The results he’s been able to achieve at JET and Walmart speak for themselves. For big e-commerce websites, avoiding downtime is a top priority!
Ronnie Chen, Engineering Manager @ Twitter
Ronnie gave a fascinating talk about her life as a deep-sea diver. When you are swimming at great depths, it is incredibly important to do proactive failure testing. The same is true for your applications: the more critical an application is to your business, the more important it is to identify failures before they impact customers.
Mark McBride, CEO & Founder @ Turbine Labs
Chaos is endemic in software engineering, where many things are unpredictable. Mark’s talk is full of insight and practical advice for getting out of the familiar loop: a system becomes destabilized → everyone drops what they’re doing to firefight → the incident is hopefully resolved and things return to square one.
Mikolaj Pawlikowski, Software Engineer @ Bloomberg
Mikolaj gives some great background on why distributed systems are complicated and how the rise of cloud and microservices has fueled the need for chaos engineering. As Leslie Lamport says, “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
Charity Majors, CEO @ Honeycomb
This was a highly anticipated talk. As the CEO of a product that offers better observability into systems, Charity emphasizes a recurring concern: if you can’t observe it, just don’t do it. But with the proper visibility, proactively testing for failure is a recipe for success.
Jessie Frazelle, Software Engineer @ Microsoft
Jessie is a rockstar best known for her work in the Docker ecosystem. If you’re interested in some of the more technical details about the development of containers, and how breaking them on purpose can lead to wonderful and surprising insights, then this is the talk for you.
Tammy Butow & Ana Medina, Principal SRE & Chaos Engineer @ Gremlin
In this joint talk, Tammy and Ana talk about their collective experience as SREs concerned with avoiding downtime and reducing incidents at companies like Dropbox, Uber, DigitalOcean… and now Gremlin.
Kolton Andrus, CEO & Co-Founder @ Gremlin
Kolton’s talk consisted of an overview of Chaos Engineering and the history that led us to today: from Jesse Robbins pulling out plugs in datacenters, to Chaos Monkey at Netflix randomly shutting down servers, to more refined and disciplined approaches to Chaos Engineering. To give a glimpse of the future, Kolton announced the availability of Application Level Fault Injection (ALFI) to all Gremlin customers, allowing DevOps teams to be much more targeted with their attacks, including on serverless environments.
And, of course, we have to thank the incredible Chaos Engineering Community. Whether you were able to make it in person or watched the livestream, it was humbling and inspiring to hear so many of your stories, and they gave us plenty of ideas on how to improve our own service.
And if you haven’t already joined the Chaos Engineering Community on Slack, then please do! Whether you’re just getting started or have decades of experience breaking things, your voice would be appreciated. Since we launched in 2017 we’ve felt the love, and hopefully we can make you proud and give it back just as well 💚