A Scenario is a set of Health Checks and Gremlin experiments that you can define, along with a name, description, hypothesis, and detailed results. After each Scenario run, you can record your observations directly in the Gremlin web app. Scenarios are useful for running sequences of experiments with expanding scope (i.e. the number of systems impacted) and intensity. Scenarios are also useful for recreating past outages, or simulating potential outages.
Create a Scenario by entering a name to identify your experiment. The description and hypothesis are optional for running the Scenario but help to describe and formulate how you expect your system to behave. For the description, it's helpful to include services you're testing, use cases for the Scenario, or when, and when not, to use that Scenario. For the hypothesis, think through how you expect your application or environment to behave when a sequence of experiments within your Scenario is run. How will your system react to the failure modes you'll be exposing it to? Is your system designed to handle this type of failure? If so, is the system working as designed?
A hypothesis records the behavior you expect and the assumptions you hold about your system, so that you can compare them against how the system actually behaves.
Add a Health Check to your Scenario to set up automatic halt conditions that safely stop the Scenario, to validate that your system is in a steady state, or to validate that your system returned to normal after an experiment before the next experiment expands the blast radius and magnitude of impact.
Use the drop-down menu to import a Health Check from the Health Check library within the Scenario creation workflow. Importing a Health Check into your Scenario creates a reference to the Health Check in the library. This allows you to update the Health Check in the library and have the changes propagate to every Scenario that uses it.
You can also use the imported Health Check as a template by clicking the "Customize" button. This creates a copy of the imported Health Check and allows you to change its configuration. Changes to the copy won't affect the Health Check in the library.
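The difference between importing and customizing can be pictured as holding a reference versus holding a copy. The following is a minimal sketch of that idea only, not Gremlin's implementation: the `HealthCheck` and `Scenario` classes, the `threshold_ms` setting, and the `library` dictionary are all hypothetical names chosen for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class HealthCheck:
    name: str
    threshold_ms: int  # hypothetical setting, used only for illustration


# Hypothetical shared library of Health Checks, keyed by name.
library = {"api-latency": HealthCheck("api-latency", threshold_ms=500)}


@dataclass
class Scenario:
    name: str
    imported: list = field(default_factory=list)    # library keys (references)
    customized: list = field(default_factory=list)  # independent copies

    def import_check(self, key):
        # Importing stores only a reference to the library entry.
        self.imported.append(key)

    def customize_check(self, key):
        # Customizing copies the library entry so later edits stay local.
        check = copy.deepcopy(library[key])
        self.customized.append(check)
        return check

    def resolved_checks(self):
        # References are looked up in the library at the time of use.
        return [library[k] for k in self.imported] + self.customized


scenario = Scenario("checkout-latency")
scenario.import_check("api-latency")
custom = scenario.customize_check("api-latency")
custom.threshold_ms = 250                   # changes only the Scenario's copy

library["api-latency"].threshold_ms = 800   # update made in the library

thresholds = [c.threshold_ms for c in scenario.resolved_checks()]
print(thresholds)
```

In this sketch the imported check picks up the library update, while the customized copy keeps its own value and leaves the library entry untouched.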
Once you’ve added a Health Check to a Scenario, you can add experiments and more Health Checks as needed. The best practice is to add a Health Check before each experiment to validate that your service is in a healthy state before introducing failure. In some cases you might want to add a Health Check at the end of the Scenario to validate that your service returned to its steady state. Use the Continuous Health Check option for Health Checks that you want evaluated throughout the duration of the Scenario. To run the Scenario, follow the instructions in the Scenario document for Running a Scenario.
Start crafting your Scenario by either adding a completed experiment or by creating a new experiment. Selecting the option to add a completed experiment will show you a list of all experiments that have run in the past. Find the experiment you'd like to add to the Scenario and choose targets to impact. Once you've selected the tags of hosts to impact, the experiment will be added to your Scenario.
Alternatively, to start with a new experiment, add a new experiment to the Scenario. Select the tags of hosts to impact, followed by the experiment type and configuration options.
Continue to add as many experiments to the Scenario as you like, changing the tags of hosts to target and experiment configuration as desired to grow the blast radius. You can also add multiple experiment types to recreate incidents with multiple failure modes, introduce cascading failure, and ultimately build Scenarios that make use cases easy to develop.
By default, a delay of 5 seconds is added between consecutive experiments in the Scenario. This delay can be set to any number of seconds, minutes, or hours.
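To see how delays shape a Scenario's timeline, the sketch below computes when each experiment would start. It is an illustration under stated assumptions: the `Step` class, the experiment names, and the durations are hypothetical, and it assumes the delay is counted from the end of one experiment to the start of the next.

```python
from dataclasses import dataclass

DEFAULT_DELAY_SECONDS = 5  # the default delay described above


@dataclass
class Step:
    name: str
    duration_seconds: int                       # hypothetical experiment length
    delay_seconds: int = DEFAULT_DELAY_SECONDS  # pause before the next step


def start_offsets(steps):
    """Return (name, start offset in seconds) for each step, in order."""
    offsets = []
    clock = 0
    for step in steps:
        offsets.append((step.name, clock))
        # Assumption: the delay begins once the experiment has finished.
        clock += step.duration_seconds + step.delay_seconds
    return offsets


steps = [
    Step("cpu-one-host", 60),
    Step("cpu-all-hosts", 120, delay_seconds=30),
    Step("latency-all-hosts", 300),
]
timeline = start_offsets(steps)
for name, offset in timeline:
    print(f"{name} starts at t+{offset}s")
```

A longer delay, such as the 30 seconds on the second step here, gives the system more time to recover before the next, wider experiment begins.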
Use tags to select targets for experiments within your Scenario. By default, no hosts or containers are selected. This is a safety measure that prevents users from accidentally running an experiment against all targets. To expand the blast radius of the experiment, select one or more tags per category.
The Exact targeting method is not an option for experiments within a Scenario; tags must be used.
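The effect of tag selection on blast radius can be sketched as a simple filter. This is a hypothetical model, not Gremlin's targeting logic: the `select_targets` function, the host names, and the tag values are invented, and it assumes that multiple values within one category widen the match while each additional category narrows it.

```python
def select_targets(hosts, selected_tags):
    """Return the sorted names of hosts that match the selected tags.

    hosts:         dict of host name -> dict of tag category -> value
    selected_tags: dict of tag category -> set of accepted values
    """
    # Safe default: with no tags selected, nothing is targeted.
    if not selected_tags:
        return []
    matched = []
    for name, tags in hosts.items():
        # Assumption: a host must match every selected category, and may
        # match any one of the values chosen within that category.
        if all(tags.get(cat) in values for cat, values in selected_tags.items()):
            matched.append(name)
    return sorted(matched)


hosts = {
    "web-1": {"zone": "us-east-1a", "service": "web"},
    "web-2": {"zone": "us-east-1b", "service": "web"},
    "db-1": {"zone": "us-east-1a", "service": "db"},
}

nothing = select_targets(hosts, {})
one_host = select_targets(hosts, {"service": {"web"}, "zone": {"us-east-1a"}})
all_web = select_targets(hosts, {"service": {"web"}})
print(nothing, one_host, all_web)
```

Starting from the narrowest selection and removing constraints step by step is one way to grow the blast radius across the experiments in a Scenario.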
Save the Scenario to have it enter a draft state. The Scenario will be visible in the list of Scenarios and the experiment configuration can continue to be edited.
When a Scenario is no longer relevant or needed, you can archive the Scenario to remove it from the active set. Navigate to the Archived tab to view these Scenarios. Unarchive a Scenario if you'd like to return it to the set of active Scenarios.
To start running through the sequence of Health Checks and experiments defined in a Scenario, run the Scenario from either the Scenario configuration view or the Scenario's card. While a Scenario is running, a message appears at the top along with a Halt Scenario button. Halting a Scenario halts the experiment underway along with the Scenario itself, so that no further experiments begin. The active experiment within the Scenario is visually indicated. As each experiment progresses, its state and logs are available.
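The run-and-halt behavior can be sketched as a loop over ordered steps that stops as soon as a Health Check fails or a halt is requested. This is a simplified, hypothetical model: the step classes and `run_scenario` are invented for illustration, experiments are only logged rather than executed, and the halt is checked between steps, whereas a real halt also stops the experiment underway.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HealthCheckStep:
    name: str
    is_healthy: Callable[[], bool]  # hypothetical probe returning True/False


@dataclass
class ExperimentStep:
    name: str


def run_scenario(steps, halt_requested=lambda: False):
    """Run steps in order and return a log of what happened."""
    log = []
    for step in steps:
        if halt_requested():
            log.append("halted by user")
            return log  # no further steps begin
        if isinstance(step, HealthCheckStep):
            if not step.is_healthy():
                log.append(f"{step.name}: failed, scenario halted")
                return log  # automatic halt condition
            log.append(f"{step.name}: passed")
        else:
            log.append(f"{step.name}: ran")  # placeholder for a real experiment
    log.append("completed")
    return log


steps = [
    HealthCheckStep("steady-state", lambda: True),
    ExperimentStep("cpu-one-host"),
    HealthCheckStep("recovered", lambda: False),  # simulated unhealthy system
    ExperimentStep("cpu-all-hosts"),
]
log = run_scenario(steps)
print(log)
```

Because the second Health Check fails in this example, the wider `cpu-all-hosts` experiment never starts.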
To edit a Scenario, open the Scenario you want to edit and click the "Edit Scenario" icon. You can also edit a Scenario from the Scenario list cards. Hover your cursor over the card to show the overflow menu. Click the overflow menu icon and select “Edit”. This pulls up the latest configuration of the Scenario and allows you to change any Health Check or experiment parameter and its targets. The Scenario can then be saved or run, and the edited version becomes the latest configuration. This is a great way to iterate and safely grow your blast radius and magnitude of impact while keeping a history of previous runs.
The experiment visualization feature is available for Scenarios as well, allowing you to monitor the impact of the chaos experiments on your environment. You can use it to quickly verify the effect of your experiments and to save the results for future reference.
Company admins can turn this feature on for the entire company by navigating to “Company Settings”, clicking on the “Settings” tab, and toggling “Attack Visualizations” on.
The Scenario details view shows the date and time of each Scenario run. For each run, the result of the Scenario is available. You can enter notes and observations for the run, and use the checkboxes to indicate whether the Scenario produced an expected result or whether an incident was detected and/or mitigated.
With Gremlin’s Jira integration, you can create and track Jira issues directly from Scenario Runs and GameDay Summaries. Jira integration must be enabled at the individual user level. See Enabling Jira integration for more information.
The Project, Issue Type, Priority, Assignee, and labels are all retrieved based on the content in your connected Jira Cloud instance. You can attach existing labels to the issue or create new ones, just as you would in Jira. To create a new label, type the value and press Enter.
The Summary field represents the Jira issue name and the Description section is automatically populated with information about the Scenario Run. It also contains the results and a link back to the Scenario Run.
All issues linked to a Scenario Run are listed under the Jira Issues section. The Summary, Assignee, Priority, and Status are updated from Jira when you open the page, and the information is cached for 5 minutes. To view an issue from the list in Jira, just click on it.
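The 5-minute caching described above behaves like a time-to-live (TTL) cache: a lookup within the window returns the stored value, and a lookup after it fetches fresh data. The sketch below is a generic TTL cache, not Gremlin's implementation; the `IssueCache` class, the `fake_fetch` function, and the issue key `OPS-1` are hypothetical.

```python
import time

CACHE_TTL_SECONDS = 5 * 60  # the 5-minute window described above


class IssueCache:
    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch    # callable that loads issue details by key
        self._clock = clock    # injectable clock, which makes testing easy
        self._entries = {}     # issue key -> (time stored, value)

    def get(self, issue_key):
        now = self._clock()
        entry = self._entries.get(issue_key)
        if entry is not None and now - entry[0] < CACHE_TTL_SECONDS:
            return entry[1]    # still fresh: serve the cached copy
        value = self._fetch(issue_key)
        self._entries[issue_key] = (now, value)
        return value


# Demonstration with a fake clock and a fake fetch function.
fake_now = [0.0]
calls = []


def fake_fetch(key):
    calls.append(key)
    return {"key": key, "status": "Open"}


cache = IssueCache(fake_fetch, clock=lambda: fake_now[0])
cache.get("OPS-1")      # first lookup fetches
fake_now[0] = 299.0
cache.get("OPS-1")      # within 5 minutes: served from the cache
fake_now[0] = 300.0
cache.get("OPS-1")      # window elapsed: fetched again
print(len(calls))
```

A practical consequence is that a status change made in Jira may take up to 5 minutes to appear in the Jira Issues list.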
To create a Jira issue:
- On a Scenario Details page, click the Runs tab.
- Under the Jira Issues section, click Create Issue.
- In the Create Jira Issue popup, select the Project, Issue Type, Priority, Assignee, and any labels you want to use. The Summary and Description contain information from the Scenario Run; you can edit this information as necessary.
- Click Create Issue. The new Jira issue will be created in the selected project and listed in the Jira Issues section in Gremlin.
During a GameDay, you can create multiple Jira issues for each Scenario Run. These will be listed under the Runs tab for each specific Scenario Run. On the GameDay Summary page, you will see a cumulative list of all Jira issues created during that GameDay, meaning all issues under all Scenario Runs.
View the history of a Scenario by opening a Scenario and clicking on the Runs tab to see a historical list of previous runs. Runs are listed in reverse chronological order. You can use the filter if you want to find specific results. You can also click on a previous run and click Revert to this version to run that specific configuration.
To duplicate a Scenario, open the Scenario you want to duplicate, click the overflow menu on the configuration page, and select “Duplicate”. You can also duplicate a Scenario from the Scenario list cards. Hover your cursor over the card to show the overflow menu. Click the overflow menu icon and select “Duplicate”. This creates a copy of the original Scenario. From there, the details, Health Checks, experiments, and targets can be changed. The Scenario can then be saved or run.