Intelligent Health Checks - Failure Flags
Services created using Failure Flags by proxy support Intelligent Health Checks. When this feature is enabled, Gremlin creates a set of Health Checks that monitor key metrics (traffic, errors, and latency) and sets error thresholds based on the service’s baseline performance. This lets you monitor your Failure Flags services during testing without first having to deploy an observability tool or configure alerts.
How it works
Failure Flags Intelligent Health Checks rely on ingress proxy mode. When this mode is enabled for an application, the Failure Flags sidecar acts as a proxy between inbound network traffic and your application. With Failure Flags Intelligent Health Checks enabled, the sidecar captures the following metrics:
- Traffic: how many requests pass through the proxy.
- Latency: the average round-trip time for requests.
- Errors: how many requests fail or return an HTTP 5XX error code.
The sidecar collects this metric data once per minute, then aggregates it and sends it to the Gremlin backend every five minutes. While a reliability test is running, this reporting interval is temporarily lowered to one minute if the test includes a Failure Flags Intelligent Health Check or has a Failure Flags target.
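The collection and reporting cadence described above can be sketched roughly as follows. This is an illustration only, not Gremlin's implementation: the `MinuteSample` class and the `is_error` and `aggregate` functions are hypothetical names, and the sketch assumes that traffic and errors are summed across the window while latency is averaged by request count.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MinuteSample:
    """One per-minute sample (hypothetical shape, for illustration only)."""
    requests: int            # traffic: requests that passed through the proxy
    errors: int              # failed requests or HTTP 5XX responses
    total_latency_ms: float  # sum of round-trip times during the minute


def is_error(status_code: int) -> bool:
    """Treat any HTTP 5XX status as an error."""
    return 500 <= status_code <= 599


def aggregate(samples: Iterable[MinuteSample]) -> dict:
    """Roll per-minute samples into one report (assumed aggregation rules)."""
    samples = list(samples)
    requests = sum(s.requests for s in samples)
    errors = sum(s.errors for s in samples)
    total_latency = sum(s.total_latency_ms for s in samples)
    # Average latency is weighted by request count; no traffic yields 0.0.
    avg_latency = total_latency / requests if requests else 0.0
    return {"traffic": requests, "errors": errors, "avg_latency_ms": avg_latency}


# Five one-minute samples make up one regular five-minute report.
window = [MinuteSample(requests=100, errors=2, total_latency_ms=5000.0)] * 5
report = aggregate(window)
```

During a qualifying reliability test, the same roll-up would simply run over a single one-minute sample instead of five.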
Enabling Failure Flags Intelligent Health Checks
To enable Intelligent Health Checks for a Failure Flags application:
- Navigate to the Failure Flags service you want to enable Intelligent Health Checks for. This service must have the ingress proxy enabled. Applications using the Failure Flags SDK are not supported.
- Select Settings > Health Checks.
- Under Intelligent Health Checks, click the Type box, select Failure Flag Proxies, and click + Add.
- Select the application with the ingress proxy that you want to monitor by selecting the checkbox next to its name.
- You can select up to five applications to track per service. For each application, Gremlin creates a Health Check for each of the three metrics listed above: traffic, errors, and latency.
- Click Submit.
Excluding URL paths from metrics
By default, the sidecar will collect metrics for all requests. If you want to prevent metrics collection for certain requests, you can provide a list of URL paths to the sidecar. This is useful for ignoring health check, liveness check, and similar paths not meant for application traffic.
To exclude these paths, set the GREMLIN_METRICS_EXCLUDE_HEALTH_CHECK_PATHS environment variable on your sidecar deployment (or the metrics_exclude_health_check_paths property in your configuration file) to a comma-separated list of paths.
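As a rough illustration of the value format, the following sketch builds the comma-separated string from a list of paths and shows one way such a list could be applied to incoming requests. The paths `/healthz` and `/ready`, the helper names, and the exact-match comparison are all assumptions made for this example; the sidecar's actual matching rules are not described here.

```python
def build_exclude_value(paths):
    """Join paths into a comma-separated string, skipping blank entries."""
    return ",".join(p.strip() for p in paths if p.strip())


def parse_exclude_value(value):
    """Split a comma-separated value back into a set of paths."""
    return {p.strip() for p in value.split(",") if p.strip()}


def should_collect(request_path, excluded):
    """Assumed behavior: skip metrics only when the path matches exactly."""
    return request_path not in excluded


# Example paths are hypothetical; substitute your own health check endpoints.
value = build_exclude_value(["/healthz", "/ready"])
# `value` is what you would assign to GREMLIN_METRICS_EXCLUDE_HEALTH_CHECK_PATHS.
excluded = parse_exclude_value(value)
```

With these example paths, a request to `/healthz` would be left out of the metrics, while a request to an application path such as `/api/orders` would still be counted.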
Disabling Failure Flags Intelligent Health Checks
To disable Intelligent Health Checks for a Failure Flags application:
- Navigate to the Failure Flags service you want to disable Intelligent Health Checks for.
- Select Settings > Health Checks.
- Under Intelligent Health Checks, click the Edit button next to one of the Intelligent Health Checks you want to disable.
- Uncheck the checkbox next to the application that you no longer want to monitor.
- Click Submit.
Opting out of Failure Flags Health Checks and metrics collection
If you don’t want the Failure Flags sidecar to collect metrics, set the GREMLIN_METRICS_OPT_OUT environment variable (or metrics_opt_out in your configuration file) to true. If this variable is true, metrics will not be collected and Intelligent Health Checks will not be available. This variable is false by default.
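The opt-out logic amounts to a boolean check with a default of false. The sketch below shows that behavior in isolation; the `metrics_enabled` function is a hypothetical name, and the case-insensitive comparison against "true" is an assumption rather than documented sidecar behavior.

```python
import os


def metrics_enabled(env=None):
    """Return False when the opt-out variable is set to true.

    Assumption: the value is compared case-insensitively to "true".
    An unset variable counts as false, matching the documented default.
    """
    env = os.environ if env is None else env
    value = env.get("GREMLIN_METRICS_OPT_OUT", "false")
    return value.strip().lower() != "true"
```

Passing a plain dictionary instead of the real environment makes the check easy to try out, for example `metrics_enabled({"GREMLIN_METRICS_OPT_OUT": "true"})`.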

