WEBINAR

More Reliability, Less Firefighting

How to Build a Proactive Reliability Program

Does it feel like your team spends all its time putting out incident fires? Change the story with a program that proactively improves reliability.

Join reliability expert and engineering leader Jeff Nickoloff for a webinar that lays out the common traits of successful reliability programs so you can build more reliability and spend less time firefighting. You’ll also get a downloadable checklist worksheet to help you create and evaluate your reliability program.


On-demand



About this webinar

Agenda

Join reliability expert and engineering leader Jeff Nickoloff to learn:
  • Why past best practices aren’t enough to improve reliability.
  • How a reliability program reduces toil, burnout, and firefighting.
  • How to build a successful reliability program at your company.

About the speakers

Jeff Nickoloff

Principal Engineer
Gremlin, Inc.

Jeff Nickoloff is a Principal Engineer with the reliability management platform team at Gremlin. Formerly with PayPal and Amazon, Jeff is a seasoned engineering leader, consultant, engineer, entrepreneur, and Kellogg EMBA candidate. He is passionate about solving high-impact problems and about building and scaling cross-functional teams through effective communication.

Gavin Cahill

Sr. Content Manager
Gremlin, Inc.

Gavin Cahill is a writer and content creator specializing in cloud-native development. At companies like New Relic, Docker, Redis, and Apptio, he’s written articles, videos, and books covering observability, containerized development, real-time data, FinOps, and more. His passion is helping people make sense of the complex world of modern software development.

Avoid downtime. Use Gremlin to turn failure into resilience.

Gremlin empowers you to proactively root out failure before it causes downtime. See how you can harness chaos to build resilient systems by requesting a demo of Gremlin.
