How to find Kubernetes reliability risks with Gremlin

Office Hours

Most Kubernetes clusters have reliability risks lurking just below the surface. You could spend hours or even days manually finding these risks, but what if someone could find them for you?

With Detected Risks, Gremlin automates the work involved in finding and tracking reliability risks across your Kubernetes clusters. Surface failed Pods, mismatched image versions, missing resource definitions, and single points of failure, all without having to run a single test.
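To make these risk categories concrete, here is a minimal sketch of the kind of checks you might otherwise run by hand against your Deployment manifests. It is not Gremlin's implementation: the function names, the risk codes, and the choice of checks (fewer than two replicas, missing resource requests or limits, and an image pinned to the mutable `latest` tag) are assumptions made for illustration. It works on plain dictionaries shaped like Kubernetes Deployment manifests, so it needs no cluster access.

```python
def image_tag(image):
    """Return the tag of an image reference, or 'latest' if none is given."""
    last_segment = image.rsplit("/", 1)[-1]  # drop any registry host:port prefix
    if "@" in last_segment:
        return "digest"  # pinned by digest, so not a mutable tag
    if ":" in last_segment:
        return last_segment.split(":", 1)[1]
    return "latest"  # an image with no tag resolves to 'latest'


def find_risks(deployments):
    """Return (deployment, container_or_None, risk_code) tuples.

    The risk codes are hypothetical labels chosen for this sketch.
    """
    risks = []
    for dep in deployments:
        name = dep["metadata"]["name"]
        spec = dep["spec"]

        # Kubernetes defaults to one replica when the field is omitted.
        if spec.get("replicas", 1) < 2:
            risks.append((name, None, "single-replica"))

        for container in spec["template"]["spec"]["containers"]:
            resources = container.get("resources", {})
            if not resources.get("requests"):
                risks.append((name, container["name"], "missing-requests"))
            if not resources.get("limits"):
                risks.append((name, container["name"], "missing-limits"))
            if image_tag(container["image"]) == "latest":
                risks.append((name, container["name"], "mutable-image-tag"))
    return risks


if __name__ == "__main__":
    # Hypothetical manifest used only to exercise the checks.
    sample = [
        {
            "metadata": {"name": "web"},
            "spec": {
                "template": {
                    "spec": {"containers": [{"name": "app", "image": "nginx"}]}
                }
            },
        }
    ]
    for deployment, container, risk in find_risks(sample):
        print(deployment, container or "-", risk)
```

Checks like these cover only static manifests. Conditions such as failed Pods depend on live cluster state, which is where automated detection saves the most manual effort.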

Learn how Gremlin uses automatic risk detection to scan your Kubernetes clusters for reliability risks. You’ll also learn where to find your risks in the Gremlin web app, strategies for resolving risks, and how to generate a risk report for leadership.

  • Where to find detected risks in the Gremlin web app
  • Techniques for resolving detected risks before they can cause an incident or outage
  • How to confirm that your fixes address the underlying problem

About the speakers

Andre Newman

Sr. Reliability Specialist

At Gremlin, Andre promotes the benefits of Chaos Engineering and reliability testing to engineering teams around the world, including at some of the largest enterprise organizations. Prior to Gremlin, he created technical content explaining Kubernetes and containerization, the shift to cloud computing, DevOps, observability, and more. His work has been featured in The New Stack, DZone, Software Engineering Daily, TechBeacon, and StatusCode Weekly.

Dan Muret

Sr. Solutions Architect

At Gremlin, Dan works closely with organizations to understand, design, and implement Chaos Engineering and reliability testing practices. Prior to Gremlin, he worked as a system administrator and solutions architect for companies like IBM, Zerto, and Veeam/Kasten. Dan’s real-world experience in system architecture, cloud migrations, disaster recovery, and resilience testing helps him guide companies to make the most of their reliability and Chaos Engineering efforts.

Avoid downtime. Use Gremlin to turn failure into resilience.

Gremlin empowers you to proactively root out failure before it causes downtime. See how you can harness chaos to build resilient systems by requesting a demo of Gremlin.