How to Install and Use Gremlin on Ubuntu 18.04

Overview

Gremlin is a simple, safe and secure service for performing Chaos Engineering experiments through a SaaS-based platform.

This tutorial will show you how to install the Gremlin agent on Ubuntu 18.04 hosts, and how to perform your first Chaos Engineering experiment, a CPU attack.

  • Step 1 - Installing the Gremlin Agent
  • Step 2 - Running your first CPU experiment
  • Step 3 - Halting an attack

Prerequisites

  • An Ubuntu 18.04 host. You need to have sudo or root access on the host.
  • A Gremlin account. If you don’t have one, sign up on the Gremlin website.

Step 1 - Installing the Gremlin Agent

Connect to your host with SSH, then add the Gremlin package repository:

ssh username@your_server_ip

echo "deb https://deb.gremlin.com/ release non-free" | sudo tee /etc/apt/sources.list.d/gremlin.list

Import the GPG key:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C81FC2F43A48B25808F9583BDFF170F324D41134 9CDB294B29A5B1E2E00C24C022E8EF3461A50EF6

Then install the Gremlin client and daemon:

sudo apt-get update && sudo apt-get install -y gremlin gremlind
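Before moving on, it can help to confirm that the client actually landed on your PATH. The following is a minimal sketch, not part of the official install steps; the helper name have_cmd is made up for illustration, and it only assumes that the package installs a binary called gremlin, as used later in this tutorial.

```shell
# Return success if the named command can be found on PATH.
# "have_cmd" is a hypothetical helper name, not a Gremlin tool.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether the Gremlin client is available.
for cmd in gremlin; do
    if have_cmd "$cmd"; then
        echo "$cmd: found at $(command -v "$cmd")"
    else
        echo "$cmd: not found on PATH"
    fi
done
```

If the client is reported as missing, running dpkg -s gremlin gremlind shows whether apt considers the two packages installed.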

The next step is to configure the Gremlin agent with your Gremlin Team ID and Gremlin Secret. Log into the Gremlin web UI with your email address and password, and then go to Company Settings.

Company Settings page

Click on your team in the list. You’ll be taken to the team details page.

Team Details page

To configure the Gremlin agent you’ll need the Team ID and Secret Key. The Team ID is created automatically. To create the Secret Key, hit the Create button. You’ll see a window where you can copy the Secret Key:

Secret Key

Make a note of your Secret Key, as this is the only time you will be able to view it. If you lose it, you’ll need to click the Reset button and generate a new one.

Now that we have the Gremlin Team ID and Secret Key, we can finish configuring the client. Go back to your SSH session on the Ubuntu host and run this command:

gremlin init

Input your Team ID and Secret Key when you’re prompted for them.

The setup is now complete and you’re ready to begin running Chaos Engineering experiments.

Step 2 - Running your first CPU experiment

On your Ubuntu host, run the “top” command. This is how we’ll view the CPU usage for this experiment.

top command
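If you would rather record a number than watch an interactive screen, CPU usage can also be sampled from /proc/stat. The sketch below is one possible approach under the assumption of a Linux host with the usual /proc/stat layout; the function names read_cpu and cpu_busy_percent are invented for this example.

```shell
# Read the aggregate "cpu" line (the first line of /proc/stat) and set
# cpu_total and cpu_idle, both counted in clock ticks since boot.
read_cpu() {
    read -r _label user nice system idle iowait irq softirq steal _rest < /proc/stat
    cpu_idle=$((idle + iowait))
    cpu_total=$((user + nice + system + idle + iowait + irq + softirq + ${steal:-0}))
}

# Print the busy percentage (integer, 0-100) over an interval in seconds.
cpu_busy_percent() {
    interval=${1:-1}
    read_cpu; t1=$cpu_total; i1=$cpu_idle
    sleep "$interval"
    read_cpu; t2=$cpu_total; i2=$cpu_idle
    dt=$((t2 - t1))
    di=$((i2 - i1))
    # Guard against an empty interval and counter jitter.
    if [ "$dt" -le 0 ]; then echo 0; return; fi
    if [ "$di" -lt 0 ]; then di=0; fi
    if [ "$di" -gt "$dt" ]; then di=$dt; fi
    echo $((100 * (dt - di) / dt))
}

# Take one sample over a one-second window.
cpu_busy_percent 1
```

Running this once before the attack and again while it is in progress gives a simple before-and-after comparison. For a one-off snapshot of the familiar display, top -b -n 1 prints a single iteration in batch mode.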

In the Gremlin web UI, click the Attack link in the left navigation bar, and then click the New Attack button. Select your Ubuntu host as the target:

Select Target

Scroll down and click Choose a Gremlin. The CPU attack should be selected by default. If not, click on Resource and then CPU.

Select CPU attack

Scroll down again to enter the settings for the attack. For this first attack we’ll set the length to 180 seconds, select All Cores, and leave the CPU percentage at the default setting. Then click Unleash Gremlin, which will start the attack.

Unleash Gremlin

You’ll then see the attack listed as Active.

Go back to your SSH session on the Ubuntu host and examine your top output. Once the attack changes to a Running state, you should see much more CPU activity than previously.

top command

The attack will end after the 180 seconds have passed. You’ll then see it listed in Gremlin as Completed.

Experiment completed

Step 3 - Halting an attack

It’s a recommended practice to define abort conditions before running Chaos Engineering experiments. Abort conditions are things that would make us want to halt an experiment immediately, because we are concerned about the safety of our systems. Abort conditions could be defined as an increase in error rate, an increase in latency, or specific alerts we receive. For abort conditions to be useful, our Chaos Engineering tool needs to allow us to halt experiments immediately. Gremlin allows us to halt individual attacks, or all running attacks.
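To make the idea of an abort condition concrete, here is a minimal sketch of a threshold check. Everything in it is an assumption for illustration: the thresholds, the function name should_abort, and the sample readings are invented, and how you obtain real error-rate and latency figures depends on your own monitoring. It does not call Gremlin; it only tells you when to halt.

```shell
# Hypothetical thresholds; pick values that fit your own service.
MAX_ERROR_RATE_PCT=5
MAX_LATENCY_MS=500

# Return 0 (abort) if either reading exceeds its threshold, otherwise 1.
# Arguments: <error rate in whole percent> <latency in whole milliseconds>
should_abort() {
    error_rate_pct=$1
    latency_ms=$2
    if [ "$error_rate_pct" -gt "$MAX_ERROR_RATE_PCT" ]; then
        echo "abort: error rate ${error_rate_pct}% exceeds ${MAX_ERROR_RATE_PCT}%"
        return 0
    fi
    if [ "$latency_ms" -gt "$MAX_LATENCY_MS" ]; then
        echo "abort: latency ${latency_ms} ms exceeds ${MAX_LATENCY_MS} ms"
        return 0
    fi
    return 1
}

# Example with made-up readings: 2% errors, 750 ms latency.
if should_abort 2 750; then
    echo "Halt the attack in the Gremlin UI."
fi
```

Writing the conditions down as explicit numbers before the experiment starts removes any debate about whether to halt once the attack is running.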

In the Gremlin UI go to Attack and New Attack, and launch another CPU attack with the same settings as last time. Once it’s running you’ll see it listed again under the Active attacks.

Halt attacks

Once the attack is in the Running state, there are two options for halting it. We can either click the Halt button to the right of the attack, or the Halt All Attacks button. In this case either would work, as we only have one attack running, but in some situations we might want to halt one attack without impacting others.

The ability to quickly halt all running experiments is an important part of Chaos Engineering, and allows us to experiment in a safe way.

Conclusion

At this point you have an Ubuntu 18.04 host running with Gremlin, you’ve run your first Chaos Engineering attacks, and you’ve learned how to halt running attacks. Congrats!

To learn more about Gremlin you can read the documentation, which explains the other types of Chaos Engineering attacks you can perform. To learn more about Chaos Engineering join our Chaos Engineering Slack, and read more tutorials on our Community page.
