Measure and track your reliability

Get a comprehensive, objective measurement of your service’s reliability in minutes.

Free for 30 days. No credit card required.

Hundreds of finance, retail, and technology organizations worldwide trust Gremlin


The world’s first truly objective measure of reliability

Scores are used everywhere in software engineering: QA test results, unit test coverage, uptime, and more. Why not reliability? Gremlin is the only reliability solution that gives you an objective, up-to-date score for how reliable your services are, with no configuration needed.

Figure: A donut chart showing a reliability score of 82/100, made up of three categories: Redundancy, Scalability, and Dependencies.

What is a reliability score?

A reliability score is a value that represents how well a service can withstand real-world failures. Gremlin runs a suite of reliability tests on your service, then calculates the score as the percentage of those tests that passed. The score ranges from 0 to 100, where 100 means the service passed every test.
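As a rough illustration of the percentage calculation, here is a minimal Python sketch. It assumes every test counts equally and that the result is rounded to a whole number; Gremlin's actual weighting across categories is not described on this page, and the `ReliabilityTest` type and test names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReliabilityTest:
    name: str      # hypothetical test name
    category: str  # e.g. "Redundancy", "Scalability", "Dependencies"
    passed: bool   # whether the service stayed healthy during the test


def reliability_score(results: list[ReliabilityTest]) -> int:
    """Return a 0-100 score equal to the share of tests that passed.

    Assumption: all tests are weighted equally and the score is rounded.
    """
    if not results:
        return 0  # assumption: no tests run means no evidence of reliability
    passed = sum(1 for r in results if r.passed)
    return round(100 * passed / len(results))


# Hypothetical run: 4 of 5 tests pass.
results = [
    ReliabilityTest("zone-redundancy", "Redundancy", True),
    ReliabilityTest("cpu-scaling", "Scalability", True),
    ReliabilityTest("memory-scaling", "Scalability", False),
    ReliabilityTest("dependency-latency", "Dependencies", True),
    ReliabilityTest("dependency-failure", "Dependencies", True),
]
print(reliability_score(results))  # 80
```

Under this equal-weight assumption, a single failed test moves the score by 100 divided by the number of tests, so larger test suites give a finer-grained score.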

“Gremlin makes Chaos Engineering easy and seamless. For us, it’s cut down the amount of time involved in designing and executing the chaos experiments, particularly for our Microservices and Kubernetes.”

Chaitanya Krant, Engineering Manager at National Australia Bank

Align your engineering organization around a single reliability metric

Engineering teams have varying ideas of what reliability means and how to measure it. Gremlin’s reliability score sets a common standard, letting you see how well each team adheres to your organization’s reliability requirements. Teams now have a positive, proactive, and self-guiding reliability metric they can use to plan improvements. Contrast this with the retrospective meetings that teams hold after an incident has already happened.

At its core, Gremlin’s reliability score is built on your observability tool. In other words, you tell us what “reliable” looks like. Gremlin uses your metric of choice, whether it’s a simple responsiveness check, a Datadog metric, a PagerDuty alert, or something more complex, to determine whether your service stayed healthy during each test.

Track changes to reliability over time

Your reliability score is more than a point-in-time measurement. Gremlin also tracks your score over time, so you can see how your service’s reliability changes as you continue to test and improve it. This is especially useful for reviewing past test results, checking when you last tested a service, and showing your manager the effort you’ve put into improving your service’s reliability.

Services also change over time. Engineers push new code, services scale up and down, and infrastructure changes. New risks may appear, and previously fixed regressions may return. Reliability scoring lets teams prove that they’ve kept their services resilient to new and recurring risks. And if a reliability risk does appear, the score acts as an early warning, so you can fix it before it ever reaches production.

Demonstrate improvements to your organization

Historically, teams have struggled to prove the impact of their reliability efforts. Indicators like mean time to detection (MTTD) and mean time to resolution (MTTR) are useful, but they’re reactive and don’t tell the whole story. By the time you collect data on these indicators, the incident or outage has already happened.

With reliability scores, teams now have a proactive metric they can use to show the positive impact of their reliability work.

Shift from observing to improving

Gremlin enables teams to proactively improve reliability at every stage of maturity.

Custom Chaos Tests & Experiments

Robust, customizable chaos tests to safely replicate any incident scenario.

Standardized Reliability Tests

Pre-built test suite to cover the most common reliability risks. Get started in minutes.

Automated & Scaled Reliability Programs

Standardized scoring tools to identify and prioritize risks, and build reliability programs.
