Scenarios

Status Checks

A Status Check verifies the state of your systems before, during, and after a Scenario. It can also automatically halt a Scenario if systems become unhealthy or unresponsive.

A Status Check sends a request to an endpoint URL and evaluates the status code, the response time, and the JSON response body. It passes or fails based on the criteria you define.

The endpoint can belong to a 3rd party tool such as Datadog, New Relic, PagerDuty, or your preferred monitoring tool. It can also be a publicly accessible health endpoint for your service, with or without authentication.

If your 3rd party tools are privately hosted, or you have strict network security policies that prevent you from exposing your systems to the Internet, you can use the Gremlin Integration Agent. The Gremlin Integration Agent allows you to integrate Gremlin with your internal tools without exposing your internal endpoints to the public Internet.

Creating a Status Check

A Status Check can be created within the Scenario workflow or in the Status Check library.

Configuration

Saving a Status Check in your Team's library allows you to reuse Status Checks in a Scenario or start with a template to customize. A saved Status Check will store the Configuration and Success Evaluation including Header information.

In the Status Checks library, select the "New Status Check" button to bring up the creation form. Follow the steps below to configure and save your Status Check. Once saved, the Status Check is available for your entire team to import into the Scenario workflow.

You can duplicate or delete a saved Status Check from its overflow menu. Duplicating makes it easy to create multiple Status Checks with different Success Evaluation parameters. Deleting a Status Check permanently removes it from the library, but any Scenario that uses it keeps its own copy of the configuration and is not affected.

Continuous Status Check

Use the toggle button to allow the Status Check to run continuously during the Scenario (polling every 10 seconds). A Continuous Status Check evaluates the success criteria to help validate how your system handles the failure injected during the attack. If the evaluation fails, the Scenario halts and records the last response result and the Scenario step that was interrupted.
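The continuous behavior described above can be pictured as a simple polling loop. The following is a minimal sketch, not Gremlin's implementation: `run_continuous_check`, `check_once`, and `scenario_running` are hypothetical names, and the 10-second interval is the only value taken from the text.

```python
import time
from typing import Callable, Optional

POLL_INTERVAL_SECONDS = 10  # polling interval stated in this document


def run_continuous_check(
    check_once: Callable[[], bool],
    scenario_running: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll until the Scenario ends or an evaluation fails.

    Returns the 1-based number of the failing poll, or None if every
    poll passed. Both callables are hypothetical stand-ins.
    """
    poll = 0
    while scenario_running():
        poll += 1
        if not check_once():
            return poll  # a failed evaluation halts the Scenario
        sleep(POLL_INTERVAL_SECONDS)
    return None
```

The `sleep` parameter is injectable only so the loop can be exercised without waiting 10 seconds between polls.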

Name, description, endpoint URL

Create a Status Check by entering a name, description, and endpoint URL. For a description, it’s helpful to include what services you're testing or what you’re expecting to happen.

Endpoint URL

The endpoint URL is the address that the Status Check requests. Its response is evaluated to determine the success or failure of the Scenario.

Public Network Endpoints

Use the drop-down menu on the Status Check form to select Datadog, New Relic, or PagerDuty. This pre-populates the form so you can start a Status Check quickly. Alternatively, use the Custom option to build your own.

Private Network Endpoints

If the system you want to use for Status Checks is not publicly accessible over the Internet, use a Private Network Status Check by selecting the Private Network check box to switch to internal. When this check box is selected, you can only create Custom Status Checks. Note: You must have the Integration Agent installed in order to use Private Network Endpoints.

Header information

Add the headers needed to authenticate the request. These are specific to the 3rd party the Status Check is communicating with. For example, to add a Status Check for a Datadog endpoint, you will need two headers:

  • Your organization’s API key
  • Your application key

For more information on authentication, please see the documentation for Datadog, New Relic, PagerDuty, or the third-party monitoring solution used on your system.

Once you have added the above fields, use the “Test Request” button to confirm that your request authenticates successfully. A successful response includes an HTTP status code in the 200-204 range, the time it took to respond, and the response body. An unsuccessful or unauthorized request returns a 4XX or 5XX status code.
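As an illustration of the header setup and the outcomes of a test request, here is a minimal sketch. It assumes the header names `DD-API-KEY` and `DD-APPLICATION-KEY` used by Datadog's API (check Datadog's documentation for your account), and the function names and outcome labels are hypothetical, not part of Gremlin.

```python
from typing import Dict


def datadog_headers(api_key: str, application_key: str) -> Dict[str, str]:
    """Build the two authentication headers for a Datadog endpoint.

    Header names are assumed from Datadog's API conventions.
    """
    return {
        "DD-API-KEY": api_key,
        "DD-APPLICATION-KEY": application_key,
    }


def describe_test_request(status_code: int) -> str:
    """Classify a test request result using the ranges given in the text."""
    if 200 <= status_code <= 204:
        return "success"
    if 400 <= status_code <= 499:
        return "client or authorization error"
    if 500 <= status_code <= 599:
        return "server error"
    return "other"
```

For example, `describe_test_request(403)` returns "client or authorization error", which usually points at a missing or incorrect key.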

You can also add headers to specify a content type or an API version for the request.

Success evaluation

Provide success criteria that your Status Check will evaluate the response against to keep the Scenario running.

Healthy status code

Add the status code that the response should return when the service is healthy. You can enter a single HTTP status code or a range such as 200-204. If the returned status code falls outside this code or range, the Scenario will automatically halt. See the list of HTTP Status Codes for more guidance.
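The single-code and range forms can be checked the same way. This is a minimal sketch of that comparison; `parse_healthy_codes` and `is_healthy` are hypothetical helpers, and the input format is assumed to be either `200` or `200-204`.

```python
def parse_healthy_codes(spec: str) -> range:
    """Parse '200' or '200-204' into an inclusive range of status codes."""
    low, _, high = spec.strip().partition("-")
    start = int(low)
    end = int(high) if high else start
    if end < start:
        raise ValueError(f"invalid status code range: {spec!r}")
    return range(start, end + 1)


def is_healthy(status_code: int, spec: str) -> bool:
    """Return True when the status code falls inside the healthy range."""
    return status_code in parse_healthy_codes(spec)
```

With a healthy range of `200-204`, a 204 response keeps the Scenario running while a 205 would halt it.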

Request timeout

For the Request Timeout, add the maximum time in milliseconds to wait for a response before halting the Scenario. For example, you might add a Status Check before starting a latency attack to validate your service is responding within your Service Level Indicator (SLI) and Service Level Objectives (SLO) requirements. This would ensure that a Scenario halts prior to introducing even more latency on your service.
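To make the timeout comparison concrete, here is a minimal sketch that times a call and compares the elapsed time with a limit in milliseconds. `timed_call` and `request` are hypothetical names. Note that this version measures after the call returns rather than aborting it; a real HTTP client would typically enforce the limit itself.

```python
import time
from typing import Callable, Tuple


def timed_call(request: Callable[[], object], timeout_ms: int) -> Tuple[bool, float]:
    """Run `request` and report whether it finished within `timeout_ms`.

    Returns (within_timeout, elapsed_ms).
    """
    started = time.monotonic()
    request()
    elapsed_ms = (time.monotonic() - started) * 1000.0
    return elapsed_ms <= timeout_ms, elapsed_ms
```

If your SLO requires responses within 500 ms, a limit of 500 means a slower response halts the Scenario before a latency attack adds further delay.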

Healthy response body criteria

Add the key that you expect in the response body, and then add a comparator to check the value associated with that key. If the value doesn’t pass the comparator you add, the Scenario will halt. This field is especially important for evaluating the responses from 3rd party monitoring software. At this time, we support JSON response bodies. Evaluation is implemented using the Jayway JsonPath library. Please refer to their docs for options for evaluating response body criteria, as well as the basic Operators and Functions tables below.

Operators

Operator                    Description
$                           The root element to query. This starts all path expressions.
@                           The current node being processed by a filter predicate.
*                           Wildcard. Available anywhere a name or numeric are required.
..                          Deep scan. Available anywhere a name is required.
.<name>                     Dot-notated child
['<name>' (, '<name>')]     Bracket-notated child or children
[<number> (, <number>)]     Array index or indexes
[start:end]                 Array slice operator
[?(<expression>)]           Filter expression. Expression must evaluate to a boolean value.

Tables are from the Jayway JsonPath library
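To show how a key and a comparator combine, here is a minimal sketch in plain Python. It is not the Jayway library: it handles only the `$`, `.<name>`, and `[<number>]` operators, and the comparator symbols, function names, and the sample response body are all assumptions made for illustration.

```python
import json
import operator
import re
from typing import Any

# Hypothetical comparator set; the document does not list the supported ones.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Matches either a `.name` segment or a `[number]` segment.
_SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(path: str, document: Any) -> Any:
    """Resolve a path such as `$.monitors[0].state` against parsed JSON."""
    if not path.startswith("$"):
        raise ValueError("path must start with the root element '$'")
    rest = path[1:]
    node = document
    position = 0
    for match in _SEGMENT.finditer(rest):
        if match.start() != position:
            raise ValueError(f"unsupported path syntax: {path!r}")
        name, index = match.groups()
        node = node[name] if name is not None else node[int(index)]
        position = match.end()
    if position != len(rest):
        raise ValueError(f"unsupported path syntax: {path!r}")
    return node


def body_is_healthy(body: str, path: str, comparator: str, expected: Any) -> bool:
    """Apply the comparator to the value found at `path` in a JSON body."""
    return COMPARATORS[comparator](resolve(path, json.loads(body)), expected)
```

For a body like `{"monitors": [{"state": "OK"}]}`, the criteria `$.monitors[0].state`, `==`, `OK` would pass, so the Scenario would keep running.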

Functions

Functions can be invoked at the tail end of a path - the input to a function is the output of the path expression. The function output is dictated by the function itself.

Function    Description                                                     Output
min()       Provides the min value of an array of numbers                   Double
max()       Provides the max value of an array of numbers                   Double
avg()       Provides the average value of an array of numbers               Double
stddev()    Provides the standard deviation value of an array of numbers    Double
length()    Provides the length of an array                                 Integer
sum()       Provides the sum value of an array of numbers                   Double

Tables are from the Jayway JsonPath library
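The functions reduce an array selected by a path to a single number that a comparator can then check. The sketch below mirrors that idea with Python built-ins; it is an analogy rather than the library's code, `apply_function` and the sample latencies are hypothetical, and `stddev()` is left out because the text does not say which formula is used.

```python
from statistics import mean
from typing import List, Union

Number = Union[int, float]


def apply_function(name: str, values: List[Number]) -> Number:
    """Apply a tail-end function to the array produced by a path expression."""
    if name == "length()":
        return len(values)  # Integer output
    reducers = {"min()": min, "max()": max, "avg()": mean, "sum()": sum}
    return float(reducers[name](values))  # Double output


# Hypothetical array, e.g. the values selected by a wildcard path.
latencies = [120, 95, 310, 140]
worst = apply_function("max()", latencies)
```

A criterion built on `max()` with a `<` comparator would then fail as soon as any single value in the array exceeds the threshold.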

Once you’ve added the above fields, use the “Test Evaluation” button to confirm that the Status Check criteria are set up correctly. A successful response confirms your success criteria and enables the “Add to Scenario” button. If the response from your endpoint URL fails the criteria, you can still add the Status Check to the Scenario, since your service could simply be unhealthy at that point in time.

Status Checks IP ranges

If your firewall blocks Status Check requests, add the following IP address to your allow list.

144.236.227.116