Intelligent Health Checks - Failure Flags

Supported platforms:

N/A

Failure Flags services that use the ingress proxy support Intelligent Health Checks. When enabled, Gremlin creates a set of Health Checks that monitor key metrics (traffic, errors, and latency) and sets error thresholds based on the service’s baseline performance. This lets you monitor your Failure Flags services during testing without first deploying an observability tool or configuring alerts.


How it works

Failure Flags Intelligent Health Checks rely on the ingress proxy mode. When this mode is enabled for an application, the Failure Flags sidecar acts as a proxy between inbound network traffic and your application. With Failure Flags Intelligent Health Checks enabled, the sidecar captures the following metrics:

Traffic: how many requests pass through the proxy.
Latency: the average round trip time for requests.
Errors: how many requests fail or return an HTTP 5XX error code.

The sidecar collects this metric data every minute, then aggregates it and sends it to the Gremlin backend every five (5) minutes. While a reliability test is running, this reporting interval is temporarily lowered to one minute if the test includes a Failure Flags Intelligent Health Check or has a Failure Flags target.
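The collection and reporting cycle can be sketched in shell to make the roll-up concrete. This is a minimal, hypothetical illustration and not the sidecar's actual implementation: the per-minute sample format (request count, total latency in milliseconds, error count), the function names `aggregate_samples` and `report_interval_minutes`, and the choice to average latency across all requests in the window are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch only: not the Failure Flags sidecar's real code.

# Choose the reporting interval in minutes.
# Assumption: $1 is "yes" when a running reliability test includes a
# Failure Flags Intelligent Health Check or has a Failure Flags target.
report_interval_minutes() {
  if [ "$1" = "yes" ]; then
    echo 1
  else
    echo 5
  fi
}

# Aggregate one-minute samples read from stdin into a single report.
# Assumed input format, one line per minute:
#   <requests> <total_latency_ms> <errors>
# Output: <traffic> <avg_latency_ms> <errors>
aggregate_samples() {
  awk '
    { requests += $1; latency += $2; errors += $3 }
    END {
      # Average latency over every request in the window (0 if no traffic).
      avg = (requests > 0) ? latency / requests : 0
      printf "%d %.1f %d\n", requests, avg, errors
    }
  '
}
```

For example, five one-minute samples with 500 requests, 25,800 ms of total latency, and 4 errors roll up into one report line of `500 51.6 4`.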

Note

Gremlin only collects calculated metric data. We do not collect request details such as URLs, paths, request types, headers, or body contents. For details on our data collection policy, see our security page.


Enabling Failure Flags Intelligent Health Checks

To enable Intelligent Health Checks for a Failure Flags application:

Navigate to the Failure Flags service you want to enable Intelligent Health Checks for. This service must have the ingress proxy enabled. Applications using the Failure Flags SDK are not supported.
Select Settings > Health Checks.
Under Intelligent Health Checks, click the Type box, select Failure Flag Proxies, and click + Add.
Select the checkbox next to the name of each application with the ingress proxy that you want to monitor.
You can select up to five (5) applications to track per service. For each application, Gremlin creates a Health Check for each of the three metrics listed above: traffic, errors, and latency.
Click Submit.


Excluding URL paths from metrics

By default, the sidecar collects metrics for all requests. To prevent metrics collection for certain requests, provide the sidecar with a list of URL paths to exclude. This is useful for ignoring health check, liveness check, and similar paths that don't carry application traffic.

To exclude these paths, set the GREMLIN_METRICS_EXCLUDE_HEALTH_CHECK_PATHS environment variable in your sidecar deployment (or the metrics_exclude_health_check_paths property in your configuration file) to a comma-separated list of paths. For example:


GREMLIN_METRICS_EXCLUDE_HEALTH_CHECK_PATHS="/health,/status,/v1/healthchecks"
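To illustrate how a comma-separated exclusion list splits into individual paths, here is a minimal shell sketch. The function name `is_excluded_path` is hypothetical, and it assumes exact path matching; the sidecar's real matching rules (prefixes, trailing slashes, query strings) are not described on this page.

```shell
#!/bin/sh
# Hypothetical sketch: how a comma-separated exclusion list could be applied.
# Assumption: paths are compared by exact match; this is not the sidecar's
# documented matching behavior.

# Succeeds (returns 0) if $1 appears in GREMLIN_METRICS_EXCLUDE_HEALTH_CHECK_PATHS.
is_excluded_path() {
  path=$1
  # Split on commas only, then restore IFS so the change doesn't leak.
  old_ifs=$IFS
  IFS=,
  for excluded in $GREMLIN_METRICS_EXCLUDE_HEALTH_CHECK_PATHS; do
    if [ "$excluded" = "$path" ]; then
      IFS=$old_ifs
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Example list, using the same format as the environment variable above.
GREMLIN_METRICS_EXCLUDE_HEALTH_CHECK_PATHS="/health,/status,/v1/healthchecks"
```

With this list, a request to `/health` would be skipped, while a request to `/api/orders` (a hypothetical application path) would still be counted.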


Disabling Failure Flags Intelligent Health Checks

To disable Intelligent Health Checks for a Failure Flags application:

Navigate to the Failure Flags service you want to disable Intelligent Health Checks for.
Select Settings > Health Checks.
Under Intelligent Health Checks, click the Edit button next to one of the Intelligent Health Checks you want to disable.
Uncheck the checkbox next to the application that you no longer want to monitor.
Click Submit.

Note

If you disable Intelligent Health Checks, make sure the service has at least one other Health Check enabled. As a safety precaution, Gremlin will not run tests on services that have no Health Checks.


Opting out of Failure Flags Health Checks and metrics collection

If you don’t want the Failure Flags sidecar to collect metrics, set the GREMLIN_METRICS_OPT_OUT environment variable (or metrics_opt_out in your configuration file) to true. If this variable is true, metrics will not be collected and Intelligent Health Checks will not be available. This variable is false by default.
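The default-false behavior of the opt-out setting can be sketched as a small shell check. The function name `metrics_enabled` is hypothetical, and the sketch assumes only the exact string `true` opts out; how the sidecar parses other values (for example `TRUE` or `1`) is not documented here.

```shell
#!/bin/sh
# Hypothetical sketch of the opt-out semantics; not the sidecar's real code.
# To opt out, set this in the sidecar's environment:
#   GREMLIN_METRICS_OPT_OUT=true

# Succeeds when metrics would be collected.
# Unset or empty GREMLIN_METRICS_OPT_OUT is treated as the default, false.
metrics_enabled() {
  [ "${GREMLIN_METRICS_OPT_OUT:-false}" != "true" ]
}
```

Leaving the variable unset keeps metrics collection (and Intelligent Health Checks) available; setting it to `true` turns both off.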