A Golden Signal checks the state of your systems before, during, and after a Scenario. It can also automatically halt a Scenario if a system becomes unhealthy or unresponsive.
A Golden Signal sends a request to an endpoint URL, evaluates the status code, the response time, and the JSON response body, and passes or fails based on the criteria you define.
The endpoint can be from a 3rd party tool such as Datadog, New Relic, PagerDuty, or your preferred monitoring tool. It could also be a publicly accessible health endpoint for your service, with or without authentication.
If your 3rd party tools are privately hosted, or you have strict network security policies that prevent you from exposing your systems to the Internet, you can use the Gremlin Integration Agent. The Gremlin Integration Agent allows you to integrate Gremlin with your internal tools without exposing your internal endpoints to the public Internet.
A Golden Signal can be created within the Scenario workflow or in the Golden Signal library.
Saving a Golden Signal in your Team's library allows you to reuse Golden Signals in a Scenario or start with a template to customize. A saved Golden Signal will store the Configuration and Success Evaluation including Header information.
In the Golden Signals library select the "New Golden Signal" button to bring up the creation form. Follow the steps below to configure and save your Golden Signal. This Golden Signal will be available for your entire team to import into the Scenario workflow.
You can duplicate or delete a Golden Signal from the overflow menu of each saved Golden Signal. Duplicating makes it easy to create multiple Golden Signals with different Success Evaluation parameters. Deleting a Golden Signal permanently removes it from the library, but each Scenario that uses it keeps a copy of the configuration and is not impacted.
Use the toggle button to run the Golden Signal continuously during the Scenario (polling every 10 seconds). A Continuous Golden Signal evaluates the success criteria to help validate how your system handles the failure injected during the attack. If the evaluation fails, the Scenario will halt and record the last response result and the Scenario step that was interrupted.
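The continuous behavior described above can be pictured as a simple polling loop. The following is a minimal sketch, not Gremlin's implementation: the function name `poll_until_failure`, the `evaluate` callback, and the injectable `sleep` parameter are all hypothetical names chosen for illustration. Only the 10-second interval comes from the text.

```python
from typing import Callable, Optional

POLL_INTERVAL_SECONDS = 10  # polling interval stated in the documentation


def poll_until_failure(
    evaluate: Callable[[], bool],
    max_polls: int,
    sleep: Callable[[float], None],
    interval: float = POLL_INTERVAL_SECONDS,
) -> Optional[int]:
    """Run `evaluate` up to `max_polls` times.

    Returns the 1-based number of the poll whose evaluation failed
    (the point at which a Scenario would halt), or None if every
    evaluation passed.
    """
    for poll in range(1, max_polls + 1):
        if not evaluate():
            return poll  # evaluation failed: stop polling here
        if poll < max_polls:
            sleep(interval)  # wait before the next evaluation
    return None
```

In real use you would pass `time.sleep` as `sleep`; injecting it keeps the sketch easy to exercise without waiting.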
The endpoint URL is the URL that Gremlin will check to get the current status of the Golden Signal. In turn, this is used to determine the impact that the Scenario had on your service.
By default, Gremlin assumes that the endpoint URL for your Golden Signal is accessible over the public Internet. If your endpoint is not publicly reachable (for example, it sits behind a firewall), you can use a Private Network Endpoint instead. Private Network Endpoints let you monitor Golden Signals from within a private network. You can enable this option by toggling the check box. Note: You must have the Integration Agent installed in order to use Private Network Endpoints.
Add the headers needed to authenticate the request, specific to the 3rd party the Golden Signal is communicating with. For example, to add a Golden Signal for a Datadog endpoint, you will need two headers:
- Your organization’s API key
- Your application key
For more information on adding headers for the different integration types, see this section of the Gremlin Reliability Management docs.
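As a rough illustration of the Datadog example, the Datadog API expects the two keys in the `DD-API-KEY` and `DD-APPLICATION-KEY` request headers. The helper name `datadog_headers` and the placeholder values are hypothetical. In Gremlin you enter these as header name and value pairs in the form rather than in code.

```python
from typing import Dict


def datadog_headers(api_key: str, application_key: str) -> Dict[str, str]:
    """Build the two authentication headers for a Datadog API request."""
    return {
        "DD-API-KEY": api_key,                  # your organization's API key
        "DD-APPLICATION-KEY": application_key,  # your application key
    }


# Placeholder values for illustration only; never hard-code real keys.
headers = datadog_headers("<api-key>", "<application-key>")
```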
Once you have added the necessary fields, use the “Test Request” button to ensure you have successfully authenticated your request. A successful response includes an HTTP status code in the 200-204 range, the time it took to respond, and the response body. An unsuccessful or unauthorized request returns a 4XX or 5XX status code.
Provide the success criteria that the response must meet for the Scenario to keep running.
Add the status code that the response should return when the service is healthy. You can enter a single HTTP status code or a range such as 200-204. If the response status code falls outside this code or range, the Scenario will automatically halt. See the list of HTTP Status Codes for more guidance.
For the Request Timeout, add the maximum time in milliseconds to wait for a response before halting the Scenario. For example, you might add a Golden Signal before starting a latency attack to validate that your service is responding within its Service Level Indicator (SLI) and Service Level Objective (SLO) requirements. This ensures that the Scenario halts before introducing even more latency on your service.
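To make the single-code and range forms concrete, here is a minimal sketch of how such a criterion could be checked. The function names are hypothetical, and the parsing rules (a bare code, or two codes joined by a hyphen) are an assumption based on the `200-204` example, not a description of Gremlin's parser.

```python
def parse_status_criteria(spec: str) -> range:
    """Turn "200" or "200-204" into an inclusive range of status codes."""
    low, _, high = spec.strip().partition("-")
    first = int(low)
    last = int(high) if high else first  # a single code is a one-element range
    if first > last:
        raise ValueError(f"invalid status range: {spec!r}")
    return range(first, last + 1)


def status_passes(status_code: int, spec: str) -> bool:
    """Return True when the status code satisfies the criterion."""
    return status_code in parse_status_criteria(spec)
```

With the criterion `200-204`, a `204` response passes while a `503` response fails, which is the case in which the Scenario would halt.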
Add the key that you expect in the response body, and then add a comparator to check the value associated with that key. If the value doesn’t pass the comparator you add, the Scenario will halt. This field is especially important for evaluating the responses from 3rd party monitoring software. At this time, we support JSON response bodies. Evaluation is implemented using the Jayway JsonPath library. Please refer to their docs for options for evaluating response body criteria, as well as the basic Operators and Functions tables below.
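The key-plus-comparator idea can be sketched as follows. This is a deliberately simplified stand-in, not the Jayway JsonPath library: the hypothetical `resolve` helper only understands `$`, dot-notated names, and numeric indexes. The set of comparators and the sample response body are also assumptions made for illustration.

```python
import json
import operator
import re
from typing import Any

# Assumed comparator set for illustration; the product's list may differ.
COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

# Matches one path step: ".name" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(document: Any, path: str) -> Any:
    """Follow a simplified path such as "$.series[0].latency_ms"."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    rest, position, node = path[1:], 0, document
    while position < len(rest):
        match = _STEP.match(rest, position)
        if match is None:
            raise ValueError(f"unsupported path syntax: {rest[position:]!r}")
        name, index = match.groups()
        node = node[name] if name is not None else node[int(index)]
        position = match.end()
    return node


def body_passes(body: str, path: str, comparator: str, expected: Any) -> bool:
    """Return True when the value at `path` satisfies the comparator."""
    try:
        actual = resolve(json.loads(body), path)
        return bool(COMPARATORS[comparator](actual, expected))
    except (KeyError, IndexError, TypeError, ValueError):
        return False  # missing key, bad JSON, or incomparable types


# Made-up response body for illustration.
sample = '{"overall_state": "OK", "series": [{"latency_ms": 120}]}'
healthy = body_passes(sample, "$.overall_state", "==", "OK")
```

Treating a missing key or unparsable body as a failure is a design choice of this sketch: a monitor that cannot be read is assumed to be no evidence of health.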
|Operator|Description|
|---|---|
|`$`|The root element to query. This starts all path expressions.|
|`@`|The current node being processed by a filter predicate.|
|`*`|Wildcard. Available anywhere a name or numeric are required.|
|`..`|Deep scan. Available anywhere a name is required.|
|`['<name>' (, '<name>')]`|Bracket-notated child or children|
|`[<number> (, <number>)]`|Array index or indexes|
|`[start:end]`|Array slice operator|
|`[?(<expression>)]`|Filter expression. Expression must evaluate to a boolean value.|
Tables are from the Jayway JSONpath library
Functions can be invoked at the tail end of a path - the input to a function is the output of the path expression. The function output is dictated by the function itself.
|Function|Description|Output type|
|---|---|---|
|`min()`|Provides the min value of an array of numbers|Double|
|`max()`|Provides the max value of an array of numbers|Double|
|`avg()`|Provides the average value of an array of numbers|Double|
|`stddev()`|Provides the standard deviation value of an array of numbers|Double|
|`length()`|Provides the length of an array|Integer|
|`sum()`|Provides the sum value of an array of numbers|Double|
Tables are from the Jayway JSONpath library
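To give a feel for what these functions return, the sketch below computes rough Python analogues of the Jayway functions `min()`, `max()`, `avg()`, `stddev()`, `length()`, and `sum()` over an array of numbers. It is an approximation for illustration, not the library itself. In particular, it assumes `stddev()` means the population standard deviation, which the text above does not specify. The `apply_function` helper and the sample values are hypothetical.

```python
import statistics
from typing import Callable, Dict, Sequence, Union

Number = Union[int, float]

# Python analogues keyed by the JsonPath function name.
ANALOGUES: Dict[str, Callable[[Sequence[Number]], Number]] = {
    "min": lambda values: float(min(values)),
    "max": lambda values: float(max(values)),
    "avg": lambda values: float(statistics.mean(values)),
    # Assumption: population standard deviation.
    "stddev": lambda values: float(statistics.pstdev(values)),
    "length": lambda values: len(values),  # Integer output
    "sum": lambda values: float(sum(values)),
}


def apply_function(name: str, values: Sequence[Number]) -> Number:
    """Apply the analogue of a JsonPath function to a list of numbers."""
    return ANALOGUES[name](values)


# Made-up latency samples, e.g. the output of a path expression.
latencies_ms = [110, 120, 130, 140]
worst = apply_function("max", latencies_ms)
```

In a Golden Signal, a function like this sits at the tail of the path, so the comparator is applied to a single aggregated value instead of a whole array.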
Once you’ve added the above fields, use the “Test Evaluation” button to ensure that you’ve successfully set up the Golden Signal criteria. A successful response will confirm your success criteria and enable the “Add to Scenario” button. If the response from your endpoint URL fails the criteria, you will still be able to add the Golden Signal to the Scenario, since your service could simply be unhealthy at that point in time.
If your firewall is blocking Golden Signal requests, you will need to add the following IP address to your allow list.