How to simulate missing and failed dependencies using Gremlin

Andre Newman
Sr. Reliability Specialist
Last Updated: June 4, 2024
Categories: Chaos Engineering
Tired of unreliable dependencies taking your applications offline? Learn how to build and test dependency-resilient services using Gremlin.

In this tutorial, you’ll learn how to simulate a failed dependency using Gremlin. A dependency is any computing component (typically a service) that another service requires in order to perform a function. For example, a database is a dependency of any service that stores data in it.

Dependency failures can happen at any time and for any reason, yet engineers often treat dependencies as if they have 100% availability. This is especially true for SaaS and cloud services, whose providers stand to lose millions of dollars for even a minute-long outage. While the teams managing our dependencies are strongly motivated to keep them online, downtime is always a possibility. Engineers need to ensure that their applications can stay online even when their dependencies can’t.

For this tutorial, we’ll use a simple Node.js application running on an Amazon EC2 instance, with Amazon RDS as a dependency. These same concepts apply to all other cloud platforms, including Azure and GCP.

Overview

This tutorial will show you how to:

  • Deploy the Gremlin agent to an EC2 instance.
  • Create an Amazon RDS database and connect a Node.js application to it.
  • Run a blackhole experiment to simulate an Amazon RDS outage.

Prerequisites

Before starting this tutorial, you’ll need:

  • A Gremlin account (sign up for a free trial here).
  • An AWS account with access to EC2 (you can use the lowest-tier x86 or Arm instance for this tutorial to save on costs).
  • A demo application (we’ll use this Node.js application provided by the MariaDB team).
  • A Mac, Linux, or WSL (Windows Subsystem for Linux) workstation.

Step 1 - Deploy an EC2 instance with the Gremlin agent

We’ll start by creating an EC2 instance with both the Gremlin agent and a Node.js application running on it. We need to be able to authenticate the Gremlin agent to our Gremlin account, and we can do that using a client configuration file.

  1. Log into the Gremlin web app and access your Team Settings by clicking on the account icon in the top-right corner of the page.
  2. Click on the Configuration tab.
  3. Next to “Client Configuration File,” click Download. This will download a YAML file named config.yaml containing everything you need to authenticate the Gremlin agent to your Gremlin account and team. 
Warning
Keep this file secret! Anyone with access to it can add their own agents to your Gremlin account.

Now, let’s create the EC2 instance:

  1. Log into your AWS account and open EC2.
  2. Click Launch instance and configure your new instance.
  3. Give it a Name, such as “gremlin-dependency-demo”.
  4. Select Amazon Linux 2023 or later as the Amazon Machine Image (AMI). You could use another AMI as long as Gremlin supports it, but the instructions in this tutorial will assume Amazon Linux 2023.
  5. Select an Instance type. You can use a free-tier-eligible type, like t2.micro.
  6. Select or create a key pair to connect to your instance over SSH. Alternatively, you can just use EC2 Instance Connect to connect to your instance via your browser.
  7. Create or select a security group. Make sure to allow SSH traffic if you plan to use SSH to connect to your instance. You’ll also need to allow inbound traffic over port 3000 to access the application.
  8. Leave the remaining options set to their default, or configure them how you wish, then click Launch instance.
  9. Once your instance is up and running, connect to it using EC2 Instance Connect, SSH, or your preferred method.

In your EC2 instance’s shell, install the Gremlin agent by running the following commands (assuming Amazon Linux 2023 or later):

SHELL

# Install dependencies
sudo dnf install -y iproute-tc

# Add the Gremlin repo
sudo curl https://rpm.gremlin.com/gremlin.repo -o /etc/yum.repos.d/gremlin.repo

# Install Gremlin
sudo dnf install -y gremlin gremlind

Finally, we need to apply the client configuration file so the agent can authenticate with Gremlin:

  1. In a text editor like nano or vim, open /etc/gremlin/config.yaml and replace its contents with the contents of your client configuration file. Save your changes.
  2. Restart the Gremlin agent by running sudo systemctl restart gremlind.

You can check whether the agent was installed correctly by opening the Gremlin web app and checking the Agents page.

Tip
Having trouble getting the agent installed? Check out our troubleshooting FAQ.
Screenshot of the agent list in the Gremlin web app.
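
If you would like to confirm from the instance itself before checking the web app, one option is to ask systemd whether the agent’s service is active. This is a minimal sketch that assumes the service is named gremlind, as in the restart command above. The helper name agent_state is made up for illustration. It only shows that the daemon is running, not that it authenticated successfully, so the Agents page remains the real source of truth.

```shell
# Report whether the Gremlin daemon is running according to systemd.
# Assumes the service name "gremlind" used earlier in this step.
agent_state() {
  if systemctl is-active --quiet gremlind; then
    echo "gremlind is running"
  else
    echo "gremlind is not running; check the logs with: sudo journalctl -u gremlind"
    return 1
  fi
}

# Usage (on the EC2 instance):
#   agent_state
```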

Step 2 - Deploy an Amazon RDS database

Next, let’s create our RDS database. This database will act as our dependency.

Note
For this tutorial, we’re going to create a basic admin user and password. You’ll want to use a more secure authentication method when running databases in production.
  1. Log into your AWS account and open RDS.
  2. Select Databases from the nav menu, then click Create database.
  3. Select Easy create as the database creation method.
  4. Select MySQL as the engine type.
  5. Select Free tier as the DB instance size.
  6. Enter an Identifier for your database, such as “gremlin-dependency-demo”.
  7. Enter a Master username for the database. Make sure to remember this!
  8. Choose Self managed for credentials management and enter (or autogenerate) a password. You’ll need to remember this, too!
  9. Expand the Set up EC2 connection section, then click Connect to an EC2 compute resource. In the EC2 instance dropdown, select the instance you created in step 1 of this tutorial.
  10. Click Create database.

AWS will provision and create your database with the username and password specified, and the EC2 connection will allow our instance to connect to the database. The last thing we need to do is create a database and table for the tasks application.

To do this, connect to your EC2 instance using EC2 Instance Connect or SSH. Once you have a terminal window open, enter the following command. Replace HOSTNAME with the hostname of your RDS instance, and DB_USERNAME with the username of your database user. You’ll be prompted to enter the user’s password after entering the command.

SHELL

mysql -h HOSTNAME -P 3306 -u DB_USERNAME -p

From here, we’ll run two statements: the first creates a database named “todo”, and the second creates a table named “tasks”:

SQL

CREATE DATABASE todo;

CREATE TABLE todo.tasks (
  id INT(11) unsigned NOT NULL AUTO_INCREMENT,
  description VARCHAR(500) NOT NULL,
  completed BOOLEAN NOT NULL DEFAULT 0,
  PRIMARY KEY (id)
);

Now we’re ready to deploy our app and connect it to our database! Type quit to close the MySQL client.
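
Before moving on, it can be useful to have a quick way to confirm that the EC2 instance can reach the database port at all, independent of credentials. The sketch below is one way to do that, assuming bash and the timeout utility are available on the instance. The function name db_port_open and the 3-second limit are illustrative choices, not part of the tutorial’s application. Running the same check again during the experiment in step 4 gives you a direct view of whether traffic to the database is getting through.

```shell
# Return 0 if a TCP connection to HOST:PORT opens within 3 seconds.
# Uses bash's built-in /dev/tcp pseudo-device, so no extra packages are needed.
# A non-zero exit means the connection failed or did not complete in time.
db_port_open() {
  host=$1
  port=${2:-3306}
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage: replace HOSTNAME with the hostname of your RDS instance.
#   db_port_open HOSTNAME 3306 && echo "reachable" || echo "unreachable or timed out"
```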

Step 3 - Deploy a Node.js application to your EC2 instance

Now that we have an EC2 instance and database, let’s deploy the application. We’ll use the same application that we used in an earlier tutorial on using Gremlin with Amazon RDS. This application is a simple to-do list presented as a website.

In your EC2 instance’s console, install git and Node.js:

SHELL

sudo dnf install -y git nodejs

Next, clone the example repository:

SHELL

git clone https://github.com/mariadb-developers/todo-app-nodejs

This application has two components: an API service, and a client service. Let’s start with the API service. Run these two commands to change to the API component’s folder and install the necessary libraries:

SHELL

cd todo-app-nodejs/src/api
npm install

When running the API, we’ll provide our database connection details as environment variables. In the following command, replace the following strings:

  • YOUR_HOST: Your database server’s hostname (the hostname of the RDS instance from step 2).
  • YOUR_PORT: Your database server’s port number (3306 unless you chose a non-standard port).
  • YOUR_USER: The username you want to use to log in to the database.
  • YOUR_PASS: The password used to log in to the database.

If the database you created in step 2 has a name other than "todo", replace DB_NAME=todo with DB_NAME=[your database name].

SHELL

DB_HOST=YOUR_HOST DB_PORT=YOUR_PORT DB_USER=YOUR_USER DB_PASS=YOUR_PASS DB_NAME=todo npm start &

Now, we can start the client. From the src/api folder (in the same terminal, or in a new terminal window after changing to that folder), run the following commands:

SHELL

cd ../client
git submodule update --init
npm install && npm start &

You should now be able to open your EC2 instance’s public URL over port 3000 in your web browser and see the application’s main page. If the page doesn’t appear, or if you see an error message, you might need to check your database connection parameters. Alternatively, try using plain HTTP instead of HTTPS.

TODO app as shown in a web browser, with some example tasks.
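
You can also check the application from a terminal, which gives you a baseline to compare against once the experiment is running. This sketch assumes curl is installed. EC2_PUBLIC_DNS is a placeholder for your instance’s public hostname, and http_status is just an illustrative helper name.

```shell
# Print the HTTP status code returned by URL, or 000 if no response
# arrives within 5 seconds.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Usage: replace EC2_PUBLIC_DNS with your instance's public hostname.
#   http_status "http://EC2_PUBLIC_DNS:3000/"
```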

Step 4 - Run an experiment on your Node.js application

Now that our database and application are up and running, we can run our experiment. We’ll run a blackhole experiment, which drops network packets. Gremlin’s blackhole experiment can be customized to target specific packets based on the destination hostname, port number, protocol, and IP address.

  1. Log into the Gremlin web app.
  2. Select Experiments from the nav menu, then click New Experiment.
  3. Select Hosts, then select your EC2 instance. The easiest way to do this is to search for your EC2 instance by name, ID, or IP address.
  4. Click on Choose a Gremlin, select the Network category, then select Blackhole.
  5. Click on the Service Providers drop-down and search for “RDS”.
  6. In the Hostnames box, enter the hostname of your RDS instance.
  7. Set the value of the Remote Ports box to ^53,3306 (if you’re using a non-standard port number, enter it instead of 3306).
  8. Optionally, in the Providers box, type in the name of the AWS region where your RDS database is running. For instance, if it’s running in us-east-2, select “aws:amazon:us-east-2”.
  9. Click Run Experiment.
Configuring a blackhole experiment in the Gremlin web app.

Once the experiment enters the “Running” state, try opening your web application in a browser. Do you notice anything unusual? Is the application displaying an error message, or taking longer than normal to respond? Did the application crash? What happens when you try performing an action that queries the database? This is your opportunity to observe the application, take notes, and think of ways to reduce the impact caused by the dependency failure.

TODO app with missing tasks and an HTTP 500 internal server error response in the browser's developer console.
Tip
If you want to stop the experiment early, click the Halt button in the top-right corner of the Gremlin web app.
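
To complement what you see in the browser, you could poll the application from a terminal while the experiment runs and note how the status codes and response times change. The following is a rough sketch rather than a monitoring tool. It assumes curl is installed, the helper names are made up for illustration, and APP_URL is a placeholder that you would point at whichever endpoint triggers a database query in your deployment.

```shell
# Print one line per request: local time, HTTP status, total time in seconds.
# A status of 000 means no HTTP response arrived within 10 seconds.
probe_once() {
  result=$(curl -s -o /dev/null -w '%{http_code} %{time_total}' --max-time 10 "$1") || true
  echo "$(date +%H:%M:%S) $result"
}

# Probe URL COUNT times, waiting INTERVAL seconds between requests.
watch_app() {
  url=$1
  count=${2:-30}
  interval=${3:-2}
  i=0
  while [ "$i" -lt "$count" ]; do
    probe_once "$url"
    sleep "$interval"
    i=$((i + 1))
  done
}

# Usage: start this shortly before the experiment and let it run throughout.
#   watch_app "$APP_URL" 30 2
```

Comparing the lines logged before, during, and after the experiment shows whether the application fails fast, hangs until a timeout, or recovers on its own once the experiment ends.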

Try this next

As an additional challenge: what could you do to prevent a dependency failure in the first place? In this example, we tested what would happen if we completely lost access to our database. In the real world, it’s far more likely that services will have partial outages in specific zones or regions.

As an optional follow-up, try converting your RDS database into a multi-AZ deployment, then repeat the test. Do you still get an error? Was there any impact on website performance or traffic throughput? If not, then you’ve successfully found a way to make your dependencies more resilient, which, in turn, makes your application more resilient!
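
If you have the AWS CLI installed and configured with permission to modify RDS (an assumption, since this tutorial otherwise uses the console), the conversion could be scripted roughly as follows. The function names are illustrative, and the default identifier is the example name from step 2. Keep in mind that a multi-AZ deployment generally costs more than a single instance, and the modification can take several minutes to complete.

```shell
# Convert an existing RDS instance to a Multi-AZ deployment.
# The first argument is the DB identifier chosen in step 2.
enable_multi_az() {
  db_id=${1:-gremlin-dependency-demo}
  aws rds modify-db-instance \
    --db-instance-identifier "$db_id" \
    --multi-az \
    --apply-immediately
}

# Print whether the instance currently reports Multi-AZ as enabled.
multi_az_status() {
  db_id=${1:-gremlin-dependency-demo}
  aws rds describe-db-instances \
    --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].MultiAZ' \
    --output text
}

# Usage:
#   enable_multi_az gremlin-dependency-demo
#   multi_az_status gremlin-dependency-demo
```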
