Reliability Management

Golden Signals

How can you measure the reliability of your Service in a meaningful way? The first step is defining your Service and what it looks like in a healthy state. Just like vital signs help a physician understand the state of a person’s health, Golden Signals help Gremlin determine the health of your Service.

Chances are, you're already tracking several Golden Signals for your Service in one or more monitoring tools. Most commonly, you will have monitors that measure errors and latency, but you may also have monitors for other metrics that are important to your Service.

Gremlin integrates with any monitoring tool through either a URL or a custom connection to the monitor's API. During a Reliability Test, Gremlin polls the configured monitors every 10 seconds for alert conditions. If any Golden Signal crosses its alert threshold, Gremlin halts the test, reverts the impact, and reports the test as a failure.
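The polling behavior described above can be pictured with a minimal sketch. This is not Gremlin's implementation: `run_reliability_test`, the `monitors` callables, and `revert_impact` are hypothetical names chosen for illustration, and each monitor is assumed to be a function that returns `True` when its Golden Signal is alerting. Only the 10-second interval comes from the text.

```python
import time
from typing import Callable, Iterable

POLL_INTERVAL_SECONDS = 10  # polling interval stated in the text above


def run_reliability_test(
    monitors: Iterable[Callable[[], bool]],
    test_duration_seconds: float,
    revert_impact: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll every monitor until the test ends or one of them alerts.

    Each monitor is a hypothetical callable that returns True when its
    Golden Signal is in an alert state.
    """
    monitors = list(monitors)
    elapsed = 0.0
    while elapsed < test_duration_seconds:
        # A single alerting monitor is enough to fail the test.
        if any(is_alerting() for is_alerting in monitors):
            revert_impact()  # halt the test and undo the injected impact
            return "failed"
        sleep(POLL_INTERVAL_SECONDS)
        elapsed += POLL_INTERVAL_SECONDS
    # No monitor alerted for the whole duration of the test.
    return "passed"
```

Passing `sleep` as a parameter is only a convenience for trying the sketch without waiting: for example, `run_reliability_test([lambda: False], 30, lambda: None, sleep=lambda s: None)` runs three polls immediately.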

To create a Service, you must add the URL from at least one Golden Signal monitor to your Service definition. Gremlin recommends a combination of 3 to 5 monitors for a well-rounded view of your Service’s health.
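As a rough illustration of the rule above, the following sketch checks a Service definition before it is created. `ServiceDefinition`, `check_monitors`, and the example URLs are hypothetical and are not part of Gremlin's API; the sketch encodes only the two facts stated here: at least one monitor URL is required, and 3 to 5 monitors are recommended.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceDefinition:
    """Hypothetical stand-in for a Service and its Golden Signal monitors."""
    name: str
    monitor_urls: List[str] = field(default_factory=list)


def check_monitors(service: ServiceDefinition) -> str:
    """Return 'recommended' for 3 to 5 monitors, 'valid' for any other
    non-zero count, and raise if no monitor URL is configured."""
    count = len(service.monitor_urls)
    if count == 0:
        raise ValueError("a Service needs at least one Golden Signal monitor URL")
    if 3 <= count <= 5:
        return "recommended"
    return "valid"


# Placeholder URLs; real monitor URLs come from your monitoring tool.
checkout = ServiceDefinition(
    name="checkout",
    monitor_urls=[
        "https://monitoring.example.com/monitors/errors",
        "https://monitoring.example.com/monitors/latency",
        "https://monitoring.example.com/monitors/traffic",
    ],
)
status = check_monitors(checkout)
```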