Datadog Health Check
To create a Service, you must add at least one Health Check monitor to your Service definition. Gremlin recommends a combination of 3 to 5 monitors for a well-rounded view of your Service’s health. Once you have selected a Service type, defined its fingerprint, and selected a process, you'll be able to start adding Health Check monitors to your Service.
Before adding a Datadog Health Check monitor to your Service, first authenticate the Health Check monitoring tool at the team level. See Datadog Team Integration for instructions.
Adding a Datadog Health Check
The Monitor URL must contain the ID or API endpoint of a Datadog monitor relevant to the Service you are creating in Gremlin. You can get this from the Datadog web app by navigating to the Monitors page, selecting the monitor, and copying the link address. See Get a monitor's details in the Datadog documentation for more information.
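As an illustration of the two URL forms the Monitor URL can take (both appear as examples in the steps below), the following minimal sketch extracts the monitor ID from either form and builds the corresponding API endpoint. The helper names `extract_monitor_id` and `monitor_api_url` are hypothetical and are not part of Gremlin or Datadog. How Gremlin itself normalizes the URL is not documented here, so treat this as an assumption.

```python
import re
from urllib.parse import urlparse

# Base of the Datadog "get a monitor's details" endpoint, as shown in this page.
API_BASE = "https://api.datadoghq.com/api/v1/monitor/"


def extract_monitor_id(monitor_url: str) -> str:
    """Return the numeric monitor ID from a web-app or API monitor URL.

    Hypothetical helper for illustration only.
    """
    path = urlparse(monitor_url).path
    # Matches ".../monitors/<id>" (web app) and ".../monitor/<id>" (API).
    match = re.search(r"/monitors?/(\d+)/?$", path)
    if match is None:
        raise ValueError(f"no monitor ID found in {monitor_url!r}")
    return match.group(1)


def monitor_api_url(monitor_url: str) -> str:
    """Build the API endpoint URL for the monitor referenced by monitor_url."""
    return API_BASE + extract_monitor_id(monitor_url)
```

Either URL form resolves to the same API endpoint, which is why both are acceptable inputs in this sketch.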
To add a Datadog Health Check:
Select Datadog from the Integrations drop-down and click Add.
If the Datadog integration has been authenticated in Team Settings, you will only see the Monitor URL box — in this case, continue to the next step. If the API key and application key have not been defined for the team, you'll also see the Authentication and Headers section — in this case, you'll need to set up the Datadog team integration first.
In the Datadog web app, navigate to the Monitors page and copy the ID or API endpoint of the monitor you want to use.
Back in the Gremlin web app, append the ID or API endpoint of the monitor to the base Monitor URL.
- Example of URL with monitor ID:
https://app.datadoghq.com/monitors/186872
- Example of URL with dedicated API endpoint for the monitor:
https://api.datadoghq.com/api/v1/monitor/186872
Click Test Health Check. Gremlin will invoke the GET details endpoint for that monitor (https://api.datadoghq.com/api/v1/monitor/) and validate the JSON response. The evaluation of the Health Check is composed of: (1) the HTTP status code of the response (the status must be 200-299 to pass); (2) the time in which the response was received (it must arrive in under 1000 ms); and (3) the value of overall_state, which must be "OK" to pass. If the monitor is OK, the response will report an overall_state of "OK".
If the monitor is in another state like Alert, Ignored, No Data, Skipped, Unknown, or Warn, you can still save the Health Check, but it will not evaluate to successful once a Reliability Test is run, impacting the Service's Reliability Score.
Click Save.
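The three pass criteria described in the steps above can be sketched as a small function. This is an illustration under stated assumptions, not Gremlin's implementation: the names `HealthCheckResult` and `evaluate_health_check` are hypothetical, the order in which the checks run is assumed, and the caller is expected to supply the status code, elapsed time, and parsed JSON body from its own HTTP request.

```python
from dataclasses import dataclass


@dataclass
class HealthCheckResult:
    """Outcome of one evaluation (hypothetical type for illustration)."""
    passed: bool
    reason: str


def evaluate_health_check(status_code: int, elapsed_ms: float, body: dict) -> HealthCheckResult:
    """Apply the three criteria from this page to one monitor response."""
    # (1) The HTTP status code must be in the 200-299 range.
    if not 200 <= status_code <= 299:
        return HealthCheckResult(False, f"HTTP status {status_code} is outside 200-299")
    # (2) The response must be received in under 1000 ms.
    if elapsed_ms >= 1000:
        return HealthCheckResult(False, f"response took {elapsed_ms:.0f} ms (limit is under 1000 ms)")
    # (3) The monitor's overall_state must be "OK".
    state = body.get("overall_state")
    if state != "OK":
        return HealthCheckResult(False, f"overall_state is {state!r}, expected 'OK'")
    return HealthCheckResult(True, "all three checks passed")


# Example with made-up inputs: a fast 200 response for a monitor in the Alert state.
result = evaluate_health_check(200, 350.0, {"overall_state": "Alert"})
print(result.passed)  # False
```

A monitor in any state other than "OK" fails the third check even when the request itself succeeds quickly, which matches the note above about non-OK states.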
Editing a Health Check Monitor
To edit a Health Check Monitor, go to the specific Service, click Settings, and then click the Health Checks tab.