Reliability Management

Test Suites

A Test Suite is a group of reliability tests that get applied to each service in a Gremlin team. Test Suites let you customize how each of your teams evaluates their reliability scores. For example, you can assign unique Test Suites to one or more teams, or use a single Test Suite across your entire organization.

Test Suites are built on Scenarios. Each Scenario appears in the suite as a single test. You can also group Scenarios into categories, such as "Redundancy", "Scalability", and "Dependencies". Gremlin also provides a default Test Suite called Gremlin Recommended Tests, which uses our built-in Scenarios to provide a balanced collection of tests.
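
As an illustration only, the relationship between Scenarios, tests, and categories can be pictured as a small data structure. The Python sketch below is not Gremlin's API: the `SuiteTest` and `Suite` classes, the `by_category` helper, and the Scenario names are all hypothetical, chosen to mirror the description above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SuiteTest:
    """One Scenario as it appears in a suite (hypothetical model)."""
    scenario: str  # name of the underlying Scenario (made up here)
    category: str  # e.g. "Redundancy", "Scalability", "Dependencies"


@dataclass
class Suite:
    """A named group of tests (hypothetical model of a Test Suite)."""
    name: str
    tests: list[SuiteTest] = field(default_factory=list)

    def by_category(self) -> dict[str, list[str]]:
        """Group the suite's Scenario names by category, in insertion order."""
        grouped: dict[str, list[str]] = defaultdict(list)
        for test in self.tests:
            grouped[test.category].append(test.scenario)
        return dict(grouped)


suite = Suite(
    name="Example Suite",
    tests=[
        SuiteTest("cpu-ramp", "Scalability"),
        SuiteTest("memory-ramp", "Scalability"),
        SuiteTest("host-shutdown", "Redundancy"),
    ],
)
print(suite.by_category())
```

Each Scenario shows up exactly once, as a single test, and the category is a property of how it appears in the suite rather than of the Scenario itself.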

You must be a Company Manager, Company Admin, or Company Owner to create or edit Test Suites. See Role-based access controls for details.

Gremlin Recommended Tests

The Gremlin Recommended Tests suite uses our built-in Scenarios to provide a balanced collection of tests. It includes the following tests, organized by category:


Scalability

  • CPU: Tests that your service scales as expected when CPU capacity is limited. Gremlin will consume CPU in 3 stages (50%, 75%, 90%). Estimated test length: 15 minutes.
  • Memory: Tests that your service scales as expected when memory is limited. Gremlin will increase the memory utilization of your system in 3 stages (50%, 75%, 90%). Estimated test length: 15 minutes.

Redundancy

  • Host: Tests resilience to host failures by immediately shutting down a randomly selected host or container. Estimated test length: 5 minutes.
  • Zone: Tests your service's availability when a randomly selected zone is unreachable from the other zones. The Gremlin zone tag is required for this test. Estimated test length: 5 minutes.

Dependencies

  • Failure Test: Drops all network traffic to a specific dependency. Estimated test length: 5 minutes.
  • Latency Test: Delays all network traffic to a specific dependency by 100ms. Estimated test length: 5 minutes.
  • Certificate Expiry Test: Opens a secure connection to your dependency, retrieves the certificate chain, and validates that no certificates expire in the next 30 days. If no secure connection is available, and therefore there are no certificates, this test will pass. Estimated test length: 1 minute.
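
To make the Certificate Expiry Test more concrete, here is a minimal Python sketch of a similar check using only the standard library. It is not Gremlin's implementation. It is also simplified: it inspects only the leaf certificate returned by `getpeercert()`, whereas the test described above validates the whole chain. The function names `expires_within` and `check_dependency` are hypothetical.

```python
import socket
import ssl
import time

WARNING_WINDOW_DAYS = 30  # the window described for this test
SECONDS_PER_DAY = 86400


def expires_within(not_after: str, days: int, now: float) -> bool:
    """Return True if a certificate's notAfter date is less than `days` after `now`.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expiry = ssl.cert_time_to_seconds(not_after)
    return expiry - now < days * SECONDS_PER_DAY


def check_dependency(host: str, port: int = 443) -> bool:
    """Return True (pass) if the leaf certificate is not about to expire."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return not expires_within(cert["notAfter"], WARNING_WINDOW_DAYS, time.time())
```

Calling `check_dependency("example.com")` would open a real TLS connection, so the date comparison is kept in a separate pure function that can be exercised without network access.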

Creating and editing a Test Suite

Before creating a new Test Suite, you'll first need to create the Scenarios that you want to add to the Test Suite. Follow the instructions in Creating a Scenario if you haven't yet done that.

Once you have your Scenarios created, follow these steps:

  1. Open the Test Suites page in the Gremlin web app.
  2. Select + Test Suite.
    • Alternatively, click Clone next to an existing suite to create a copy. If you don't want to build an entire suite from scratch, cloning the Gremlin Recommended Tests is a great way to get started.
  3. Enter a Name for the Test Suite.
  4. Under Test Suite Composition, use the Search Scenarios box to find the Scenario you want to add to the suite, then click + Add. The Scenario will appear in the list below the search box.
    • To remove a Scenario from the suite, click the Delete button. This won't delete the Scenario; it only removes it from the list.
    • Optionally, click the Edit button to change how the Scenario appears in the suite. You can change its name, add a description, and choose a category. These settings only affect how the Scenario appears in the Test Suite, so you can safely make changes here without affecting the base Scenario.
    • Click Save to save any edits you made to the Scenario.
  5. Under Teams, use the Search Teams box to find the Gremlin team that you want to assign this Test Suite to. You can add multiple teams at once.
    • Note: each team can have only one Test Suite. Changing a team's Test Suite also resets its test results and scores to zero. However, the data for the previous Test Suite is preserved and will reappear if you later change back to it.
    • To remove a team, click Remove next to the team name.
  6. Check the Test Preview section to confirm that this is the correct Scenario, then click Save to save the Test Suite.
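
The rule that a team has one Test Suite at a time, with results preserved per suite, can be sketched as follows. This is a hypothetical Python model of the behavior described in the steps above, not Gremlin's data model. The `TeamScores` class, the suite name "Custom Suite", and the service name "checkout" are made up for illustration.

```python
class TeamScores:
    """Hypothetical model of one team's results, stored per Test Suite."""

    def __init__(self, suite: str) -> None:
        self.active_suite = suite
        self._results: dict[str, dict[str, int]] = {suite: {}}

    def record(self, service: str, score: int) -> None:
        """Store a score for a service under the currently active suite."""
        self._results[self.active_suite][service] = score

    def assign(self, suite: str) -> None:
        """Switch suites; a team has exactly one active suite at a time."""
        self.active_suite = suite
        self._results.setdefault(suite, {})  # a never-used suite starts empty

    def score(self, service: str) -> int:
        """Score under the active suite, or 0 if nothing is recorded yet."""
        return self._results[self.active_suite].get(service, 0)


team = TeamScores("Gremlin Recommended Tests")
team.record("checkout", 82)

team.assign("Custom Suite")
print(team.score("checkout"))  # 0

team.assign("Gremlin Recommended Tests")
print(team.score("checkout"))  # 82
```

Keying results by suite is what lets the earlier scores reappear after switching back.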

Deleting a Test Suite

Before deleting a Test Suite, make sure to reassign any teams that are using the Test Suite to a different one!

To delete a Test Suite, open the Test Suites page, find the suite you want to delete, then click the Delete button. Confirm, and the Test Suite will be deleted. This action will not modify any of the Scenarios that were part of the Test Suite.

Including/excluding Detected Risks in Test Suites

You can include or exclude individual Detected Risks in your custom Test Suites. Including a Detected Risk in a Test Suite factors that risk into the reliability score calculations for that Test Suite. By default, all current Detected Risks and any newly added risks are included in Test Suites automatically.

When a Test Suite has risks associated with it, each service’s Reliability Score card will show a new Risks category. Each service will also show the number of unmitigated risks. You can click on the Detected Risks card to see the full list of Detected Risks included in this Test Suite, and whether each one is Mitigated or At Risk for this service.

[Figure: Viewing Detected Risks for a service.]

Note: changing an active Test Suite takes effect immediately. Any actively running tests will be halted, and any reliability scores will be reset. Changing back to the previous Test Suite will restore the previous scores.

To include or exclude Detected Risks in your Test Suite:

  1. In the left-hand navigation pane, open Test Suites.
  2. Select the Test Suite you wish to edit.
  3. Select the Risks tab. This displays two lists: the left list (“Include”) contains the risks included in the Test Suite, while the right list (“Ignore”) contains the risks that aren’t included. The risks on the left will be tested against each of your services and factored into their reliability scores.
    1. From the right-hand column, select the risk(s) you want to include by selecting the checkbox next to each name.
    2. Click the < button to move your selected risk(s) to the “Include” column. Alternatively, click the << button to move all risks.
    3. Conversely, use the > or >> buttons to move risks from the “Include” column to the “Ignore” column.
  4. Click Save to save your selection.

The next time you access a service’s overview page, you’ll see the new set of Detected Risks listed.
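
One way to picture the default described in this section, where newly added risks are included automatically, is to store only the risks a suite ignores. The Python sketch below is a hypothetical model, not Gremlin's implementation. The `SuiteRisks` class and the risk names are invented for illustration.

```python
class SuiteRisks:
    """Hypothetical model: a suite tracks only the risks it ignores."""

    def __init__(self) -> None:
        self.ignored: set[str] = set()

    def ignore(self, *risks: str) -> None:
        """Move risks to the "Ignore" column."""
        self.ignored.update(risks)

    def include(self, *risks: str) -> None:
        """Move risks back to the "Include" column."""
        self.ignored.difference_update(risks)

    def included(self, all_risks: set[str]) -> set[str]:
        """Every known risk that is not explicitly ignored is included."""
        return all_risks - self.ignored


known = {"risk-a", "risk-b"}
risks = SuiteRisks()
risks.ignore("risk-b")

known.add("risk-c")  # a newly added risk is included without any action
print(sorted(risks.included(known)))  # ['risk-a', 'risk-c']
```

Because the ignore list is the only stored state, a newly added risk needs no extra step to be included in the suite.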
