Reliability Management

Reliability Score

The Reliability Score helps set a standard view of reliability across all teams and services in your organization. Once you define your Service and run Gremlin's pre-defined Reliability tests across common reliability risks, Gremlin will calculate the Reliability Score for your Service.

How the Reliability Score is calculated

A Reliability Score is a calculated value between 0 and 100 representing a Service's reliability. The initial Reliability Score is 0; as you run Reliability tests on your Service, its score can increase. Even a failed test counts positively, because discovering a failure through testing is more valuable than not testing at all. An existing test score degrades by 25% every 30 days if the test is not run regularly. To achieve the highest possible score, run the Reliability tests every week without any Golden Signal failures. The key is to start testing and keep testing on a consistent schedule!
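
To make the 30-day degradation concrete, here is a minimal sketch of how a single test score might shrink over time. The text above does not say whether the 25% reduction compounds or is applied in discrete steps, so this sketch assumes it compounds once per full 30-day period since the last run. The function name decayed_score and its parameters are hypothetical and are not part of Gremlin's API.

```python
def decayed_score(base_points: float, days_since_last_run: int) -> float:
    """Return a test score after the assumed 30-day degradation.

    Assumption (not confirmed by the documentation): the score loses 25%
    of its current value for each full 30-day period since the last run.
    """
    if days_since_last_run < 0:
        raise ValueError("days_since_last_run must be non-negative")
    full_periods = days_since_last_run // 30  # completed 30-day periods
    return base_points * (0.75 ** full_periods)


if __name__ == "__main__":
    # A passed test (100 points) at several ages, under the assumption above.
    for days in (0, 29, 30, 60):
        print(days, decayed_score(100, days))
```

Under this assumption, a passed test that is rerun weekly never reaches a full 30-day period, so it keeps its full value.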

Test categories

Reliability tests are grouped into the following categories:

  • Redundancy
  • Scalability
  • Dependency

The individual test scores from each category are added up and averaged to provide the total Reliability Score.

Individual tests are scored as follows:

  • 0 points for a test that has not been run
  • 50 points for failing a test
  • 100 points for passing a test
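
The scoring rules above can be sketched in a few lines of Python. This is an illustration only: it assumes the total is a plain average over every test in every category, which is one reading of "added up and averaged", and it ignores the 30-day degradation. The names TestResult and reliability_score, and the example test results, are hypothetical.

```python
from enum import Enum


class TestResult(Enum):
    """Points per test outcome, taken from the list above."""
    NOT_RUN = 0
    FAILED = 50
    PASSED = 100


def reliability_score(results_by_category: dict[str, list[TestResult]]) -> float:
    """Average the points of all tests across all categories.

    Assumption: a flat mean over every test; a Service with no tests
    scores 0, matching the initial Reliability Score.
    """
    points = [
        result.value
        for results in results_by_category.values()
        for result in results
    ]
    if not points:
        return 0.0
    return sum(points) / len(points)


if __name__ == "__main__":
    example = {
        "Redundancy": [TestResult.PASSED, TestResult.FAILED],
        "Scalability": [TestResult.PASSED, TestResult.NOT_RUN],
        "Dependency": [TestResult.FAILED],
    }
    print(reliability_score(example))  # (100 + 50 + 100 + 0 + 50) / 5 = 60.0
```

In this example, running the one test that has not been run would raise the score even if that test failed, which matches the point that a failed test still counts positively.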

Gremlin tracks Reliability Scores over time so you can see how they improve as you continue to test. Keep in mind that weekly testing affects the score positively and reveals the areas your team may need to address.