How to run multiple experiments in parallel using Gremlin

Andre Newman
Sr. Reliability Specialist
Last Updated: November 27, 2023
Categories: Chaos Engineering, Reliability Management

Introduction

Gremlin lets you run multiple Chaos Engineering experiments in a single workflow called a Scenario. Normally these experiments run sequentially, but Gremlin also lets you run them in parallel. In this tutorial, we'll create a custom Scenario that runs two different experiments simultaneously. Along the way, we'll show you how to set up branching and what to consider when creating branched Scenarios.

Overview

This tutorial will show you how to:

  • Create a new Scenario using an existing Scenario as a template.
  • Use the branching mechanism to create two parallel experiment paths in a Scenario.
  • Use Health Checks to monitor the health of your systems during the Scenario.

Prerequisites

Before starting this tutorial, you’ll need the following:

Step 1 - Clone a Recommended Scenario

In this step, you’ll create a new Scenario by cloning an existing Recommended Scenario. A Recommended Scenario is a pre-built Scenario created by the Gremlin team to test for common use cases like scalability, redundancy, and recoverability. You can run a Recommended Scenario as-is, but in this case, we'll use one as a springboard.

  1. Log into the Gremlin web app.
  2. Click on this link to open Recommended Scenarios. Alternatively, click on Scenarios in the left-hand navigation menu, then click on the Recommended tab.
  3. Look for the Scalability: CPU Scenario. You can browse the list, use the search box, or simply click on this link. This Scenario runs a series of three CPU tests with increasing usage. First it uses 50% CPU, then 75%, then 90%. We'll edit this Scenario by adding a memory test alongside each CPU test.
  4. Click Customize to start editing the Scenario.

Step 2 - Customize your new Scenario

In this step, we'll edit our new Scenario by adding a branch and another experiment. A branch is a series of nodes that run sequentially. Branches can run simultaneously alongside other branches and can even contain nested branches.

  1. Give your new Scenario a new name, such as Scalability: CPU and Memory. This helps differentiate it from the default version. We also recommend changing the description to include the memory test.
  2. Scroll down to the Scenario creation pane. You'll see a section for adding Health Checks, followed by a section for adding nodes. Each node represents an action that takes place during the Scenario. The layout should match this screenshot:
    A screenshot of the default "Scalability: CPU" Scenario showing five nodes.
  3. At the very bottom of the node list, click Add, then select Concurrent Node. This moves each of the existing nodes under a branch named "Branch 1." Beneath that is a second branch named "Branch 2." Any steps we add to Branch 2 will run at the same time as the steps in Branch 1.
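
To make the branching model concrete, here's a minimal Python sketch of how sequential nodes and concurrent branches combine into a total run time. It isn't Gremlin's API or data format. The Test, Delay, and Concurrent classes, the duration function, and the 60-second CPU test length are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Test:
    """A single experiment node, such as a CPU or memory test."""
    name: str
    length_s: int


@dataclass
class Delay:
    """A pause between two nodes."""
    length_s: int


@dataclass
class Concurrent:
    """A node whose branches all start at the same time."""
    branches: List[List["Node"]] = field(default_factory=list)


Node = Union[Test, Delay, Concurrent]


def duration(nodes: List[Node]) -> int:
    """Total run time of a sequence of nodes, in seconds."""
    total = 0
    for node in nodes:
        if isinstance(node, Concurrent):
            # Branches run side by side, so the slowest branch sets the pace.
            total += max((duration(b) for b in node.branches), default=0)
        else:
            # Nodes within a branch run one after another.
            total += node.length_s
    return total


# Branch 1: the five existing nodes (60-second test length is an assumption).
cpu_branch = [
    Test("CPU 50%", 60), Delay(5),
    Test("CPU 75%", 60), Delay(5),
    Test("CPU 90%", 60),
]

# Branch 2 is empty until we add a test to it.
print(duration([Concurrent([cpu_branch, []])]))  # 190

# With a 300-second test in Branch 2, the longer branch decides the total.
print(duration([Concurrent([cpu_branch, [Test("Memory", 300)]])]))  # 300
```

The point of the sketch is the rule inside duration: nodes in a branch add up, while parallel branches take as long as the slowest one.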

Step 3 - Add a memory experiment

Now that we have our new branch, let's add a memory experiment.

  1. Under "Branch 2", click Add Node, then New Test.
  2. In the Edit Test pane:
    1. Select the system(s) you want to run the test on. You can read our documentation to learn more about target selection.
    2. Under Choose a Gremlin, select the Resource category, then the Memory experiment. By default, this increases the amount of memory used to 1/2 GB. If your system is already using 1/2 GB of memory, the test will do nothing. For a larger impact, change the Memory Amount option from GB to %, then set the value to 50. This will increase memory usage on the system to 50% of its total memory.
    3. Increase the Length option to 300. This runs the experiment for 300 seconds, or 5 minutes.
    4. Click Update Scenario to save this experiment and add it to the Scenario.
  3. Double-check your Scenario flow to make sure you now have a 5-minute memory experiment under Branch 2. You should now have seven nodes in total.
    A screenshot of a Scenario showing two branches and seven total steps
  4. Click Save Scenario to save your new Scenario.
    An overview of the newly created Scenario showing the second branch
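
The memory settings above describe a target level of usage rather than an amount to add, which is why a 1/2 GB target does nothing on a system already using that much. The arithmetic can be sketched as follows. This is a simplified reading of the description in this tutorial, not Gremlin's implementation. The function names and the 16 GiB / 3 GiB figures are hypothetical.

```python
GIB = 1024 ** 3


def percent_of_total(total_bytes: int, percent: int) -> int:
    """Convert a percentage target into an absolute byte target."""
    return total_bytes * percent // 100


def extra_bytes_needed(used_bytes: int, target_bytes: int) -> int:
    """Memory that must still be consumed to reach the target.

    If usage is already at or above the target, nothing more is needed.
    """
    return max(0, target_bytes - used_bytes)


total = 16 * GIB  # hypothetical host with 16 GiB of memory
used = 3 * GIB    # hypothetical usage before the experiment starts

# A fixed 1/2 GiB target is already met, so the test has no effect.
print(extra_bytes_needed(used, GIB // 2) // GIB)  # 0

# A 50% target is 8 GiB, so 5 GiB more must be consumed.
print(extra_bytes_needed(used, percent_of_total(total, 50)) // GIB)  # 5
```

Running the same numbers for your own systems before the experiment is a quick way to check that the target you picked will have a visible effect.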

Step 4 - Run the Scenario

Now, we get to run our Scenario. After you save your Scenario, you'll see a button labeled Run Scenario. Click on it, then click Run Scenario again to confirm.

While the Scenario is starting, pull up any metrics you have for the target system. Metrics tell you how your system is responding to the test and whether something unexpected happens, such as a system failure. If you don't have a monitoring or observability solution set up, even something like Windows Task Manager or htop will work. Gremlin also automatically graphs CPU usage during the experiment.

You'll know both experiments are running simultaneously by the animated icon next to their names:

A screenshot of the Scenario while it's running

While the Scenario is running, see how your system behaves. Does it run sluggishly? Are any applications or processes slowed down or terminated? Does the system start moving memory to swap space, and if so, how does that impact responsiveness?
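
If you don't have a monitoring tool handy and your target runs Linux, a small script can answer the memory and swap questions above. The sketch below assumes the standard /proc/meminfo format, with values in kB and a MemAvailable field present. The function names and the sample figures are made up for illustration.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token is the number; an optional unit follows it.
            values[key.strip()] = int(parts[0])
    return values


def summarize(values: dict) -> dict:
    """Report memory in use (as a percentage) and swap in use."""
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return {
        "used_percent": round(100 * (total - available) / total, 1),
        "swap_used_kib": values["SwapTotal"] - values["SwapFree"],
    }


def read_live() -> dict:
    """Summarize the current host. Linux only."""
    with open("/proc/meminfo") as f:
        return summarize(parse_meminfo(f.read()))


# Made-up sample so the sketch runs on any platform.
sample = """MemTotal:       16000000 kB
MemAvailable:    4000000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""
print(summarize(parse_meminfo(sample)))
# {'used_percent': 75.0, 'swap_used_kib': 500000}
```

On the target itself, calling read_live() every few seconds while the Scenario runs shows whether memory usage climbs toward the level you configured and whether swap usage starts to grow.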

If something unexpected or undesirable happens (like the system crashing), remember you can stop the Scenario by clicking the big red Halt Scenario or Halt All Tests buttons in the top-right corner of the web app.

Conclusion

Congratulations, you've successfully created a Scenario that runs two experiments side-by-side! Here are some additional steps you can take to get the most out of your Scenarios:

  • Add a Health Check to automatically monitor the state of your target system(s) while the Scenario is running. A Health Check will also halt and revert the Scenario if it detects an unhealthy system.
  • Add two more memory experiments to your Scenario to match the two remaining CPU experiments (add one that consumes 75% of total memory, and another that consumes 90%). Remember to add 5-second delays in between!
  • Add more branches to test for different situations. What happens if you run a latency experiment alongside your CPU and memory experiments? What if you consumed two different amounts of CPU on two different cores? What if you ran a latency and packet loss experiment alongside your CPU and memory experiments?
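
Before adding the remaining memory experiments in the web app, it can help to write out what the finished branch should look like. Here's a rough sketch that lists the nodes for a memory branch mirroring the three CPU stages, with 5-second delays in between. The dictionary layout and the build_branch helper are hypothetical and are not Gremlin's Scenario format.

```python
def build_branch(kind, percents, test_length_s, delay_s=5):
    """List the nodes for one branch: a test per stage, delays in between."""
    nodes = []
    for i, percent in enumerate(percents):
        if i > 0:
            # Separate consecutive tests with a short delay.
            nodes.append({"type": "delay", "length_s": delay_s})
        nodes.append({
            "type": "test",
            "kind": kind,
            "percent": percent,
            "length_s": test_length_s,
        })
    return nodes


# Three memory stages matching the CPU stages: 50%, 75%, then 90%.
memory_branch = build_branch("memory", [50, 75, 90], test_length_s=300)

for node in memory_branch:
    print(node)
print(len(memory_branch))  # 5
```

If you want each memory stage to overlap its CPU counterpart, compare the test lengths in both branches in the web app, since stages of different lengths will drift out of step.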

Branches add a near-infinite number of possible Scenario configurations limited only by your creativity and use cases. If you need more inspiration, remember that we have over 30 additional pre-made Recommended Scenarios that you can use as templates. We also recommend thinking of any recent incidents or outages you or your team have experienced, and building a Scenario that replicates them.
