Datadog Health Check

To create a Service, you must add at least one Health Check monitor to your Service definition. Gremlin recommends a combination of 3 to 5 monitors for a well-rounded view of your Service’s health. Once you have selected a Service type, defined its fingerprint, and selected a process, you'll be able to start adding Health Check monitors to your Service.

Adding a Datadog Health Check

The Monitor URL must contain the ID or API endpoint of a Datadog monitor relevant to the Service you are creating in Gremlin. You can get this from the Datadog web app by navigating to the Monitors page, selecting the monitor, and copying the link address. See Get a monitor's details in the Datadog documentation for more information.
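Both accepted URL forms identify the monitor by the same numeric ID. As an illustration only, the following minimal Python sketch derives the API endpoint from either form. The helper names `monitor_id_from_url` and `api_endpoint_for` are hypothetical and are not part of Gremlin or Datadog; `API_BASE` assumes the `api.datadoghq.com` site used in the examples on this page.

```python
import re

# Assumption: the Datadog site shown in this page's examples.
API_BASE = "https://api.datadoghq.com/api/v1/monitor/"


def monitor_id_from_url(url: str) -> str:
    """Extract the numeric monitor ID from a web-app or API monitor URL."""
    # Matches both ".../monitors/<id>" (web app) and ".../monitor/<id>" (API).
    match = re.search(r"/monitors?/(\d+)", url)
    if match is None:
        raise ValueError(f"no monitor ID found in {url!r}")
    return match.group(1)


def api_endpoint_for(url: str) -> str:
    """Build the monitor's API endpoint from either URL form."""
    return API_BASE + monitor_id_from_url(url)


print(api_endpoint_for("https://app.datadoghq.com/monitors/186872"))
# prints https://api.datadoghq.com/api/v1/monitor/186872
```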

To add a Datadog Health Check:

  • Select Datadog from the Integrations drop-down and click Add.

  • If the Datadog integration has already been authenticated in Team Settings, only the Monitor URL box appears; continue to the next step. If the API key and application key have not been defined for the team, the Authentication and Headers section also appears, and you need to set up the Datadog team integration first.

  • In the Datadog web app, navigate to the Monitors page and copy the ID or API endpoint of the monitor you want to use.

  • Back in the Gremlin web app, append the ID or API endpoint of the monitor to the base Monitor URL.

    • Example of URL with monitor ID: https://app.datadoghq.com/monitors/186872
    • Example of URL with dedicated API endpoint for the monitor: https://api.datadoghq.com/api/v1/monitor/186872
  • Click Test Health Check. Gremlin calls the GET details endpoint for that monitor (https://api.datadoghq.com/api/v1/monitor/) and validates the JSON response. The Health Check passes only if all three of the following hold: (1) the HTTP status code is in the 200-299 range; (2) the response is received in under 1000 ms; and (3) the overall_state field in the response equals "OK".

    • If the monitor is OK, the test result shows that the Health Check passed.

    • If the monitor is in any other state (Alert, Ignored, No Data, Skipped, Unknown, or Warn), you can still save the Health Check, but it will not evaluate as successful when a Reliability Test runs, which affects the Service's Reliability Score.

  • Click Save.
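
The three pass criteria described for Test Health Check can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Gremlin's actual implementation: `HealthCheckResult` and `evaluate_health_check` are hypothetical names, and the caller is assumed to have already made the request and measured its duration.

```python
from dataclasses import dataclass


@dataclass
class HealthCheckResult:
    status_ok: bool   # HTTP status code was in the 200-299 range
    latency_ok: bool  # response arrived in under 1000 ms
    state_ok: bool    # monitor's overall_state equals "OK"

    @property
    def passed(self) -> bool:
        # All three criteria must hold for the check to pass.
        return self.status_ok and self.latency_ok and self.state_ok


def evaluate_health_check(status_code: int, elapsed_ms: float, body: dict) -> HealthCheckResult:
    """Apply the three criteria to an already-received monitor response."""
    return HealthCheckResult(
        status_ok=200 <= status_code <= 299,
        latency_ok=elapsed_ms < 1000,
        state_ok=body.get("overall_state") == "OK",
    )


ok = evaluate_health_check(200, 240.0, {"overall_state": "OK"})
alert = evaluate_health_check(200, 240.0, {"overall_state": "Alert"})
print(ok.passed, alert.passed)  # prints: True False
```

Keeping the three results separate, rather than returning a single boolean, makes it easier to see which criterion caused a failure.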

Editing a Health Check Monitor

To edit a Health Check Monitor, go to the specific Service, click Settings, and then click the Health Checks tab.
