How to Install and Use Gremlin on Amazon Web Services

Follow this guide to install and use Gremlin on an EC2 instance on Amazon Web Services. Assuming you have a ready-to-use AWS account and a Gremlin account, this should take no more than 15 minutes.

Step 1 - Installing Gremlin on your EC2 instance

If you don't already have an Amazon Linux EC2 instance available, follow the AWS documentation, Getting Started with Amazon EC2 Linux Instances, to launch one.

In this step, you'll install Gremlin on your EC2 instance.

First, ssh into your EC2 instance:

ssh -i yourkey.pem ec2-user@ip-address-of-server

Then add the Gremlin repo, and install the Gremlin client and daemon:

sudo curl https://rpm.gremlin.com/gremlin.repo -o /etc/yum.repos.d/gremlin.repo
sudo yum install -y gremlin gremlind

Step 2 - Validating Installation

Run the following command to confirm you have all the necessary components installed on your host for Gremlin to function correctly:

gremlin syscheck

You should see Gremlin performing all the checks and returning System check OK.
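If you want to script this check, for example in a provisioning script, the following is a minimal sketch. It relies only on the success line quoted above; the function name syscheck_passed is hypothetical, and capturing both stdout and stderr is an assumption, since this guide does not say which stream the check writes to.

```shell
# Returns 0 when the text on stdin contains the success line from the guide.
syscheck_passed() {
  grep -q "System check OK"
}

# Example use on the instance (left commented so the sketch has no side effects):
#   gremlin syscheck 2>&1 | syscheck_passed && echo "Gremlin is ready"
```

Matching on the printed text avoids assuming anything about the command's exit code.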

Step 3 - Registering with Gremlin

You'll need to register with the Gremlin control plane to create a new Gremlin client session. Gremlin offers tags, a feature allowing you to apply custom labels to a Gremlin client. You can tag more than one Gremlin client with the same label, then use it to view a filtered list of Gremlin clients that share that particular tag. In addition to filtering, the Gremlin Control Panel and API allow you to initiate an action across multiple Gremlin clients with the same tag. Identifying groups of Gremlin clients and administering all of them at once reduces the time required to manage hosts.

You'll need your organization ID and organization secret (shown as Team ID and Team Secret in the Control Panel) to initialize the instance as a target for Gremlin to attack. To retrieve them, log in to the Gremlin Control Panel using your Team name and sign-on credentials. These details were emailed to you when you signed up to start using Gremlin.

Next click on your name and select Settings in the Gremlin Control Panel.

Your Team ID appears on the left under your company name. Click to generate your Team Secret. We recommend you store your Team Secret somewhere safe, since it is available only once. If you lose your Team Secret, you can reset it.

To simplify setup, let's set the following environment variables:

export GREMLIN_TEAM_ID=<your team id>
export GREMLIN_TEAM_SECRET=<your team secret>
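A quick preflight check can catch a missing or misspelled variable before you initialize Gremlin. The sketch below is one way to do that in POSIX shell; the helper name require_env is hypothetical and not part of Gremlin.

```shell
# Returns non-zero and names each variable that is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use (commented out so the sketch has no side effects):
#   require_env GREMLIN_TEAM_ID GREMLIN_TEAM_SECRET || exit 1
```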

You can also set an identifier for this target so it's recognizable. If it's not set, the identifier defaults to the IP address.

export INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
export GREMLIN_IDENTIFIER=$INSTANCE_ID
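On instances configured to require IMDSv2, a plain metadata request is rejected, so the identifier would not be set as intended. The sketch below uses the IMDSv2 session-token flow instead. The helper names imds_get and first_non_empty are hypothetical, and falling back to the hostname is an illustrative choice rather than something Gremlin prescribes.

```shell
# Fetch one metadata value via IMDSv2; $1 is a path such as "instance-id".
imds_get() {
  token=$(curl -sf --max-time 2 -X PUT \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60" \
    "http://169.254.169.254/latest/api/token") || return 1
  curl -sf --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/$1"
}

# Print the first argument that is not empty; fail if all are empty.
first_non_empty() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example use on the instance (commented out so the sketch has no side effects):
#   export INSTANCE_ID=$(first_non_empty "$(imds_get instance-id)" "$(hostname)")
#   export GREMLIN_IDENTIFIER=$INSTANCE_ID
```

The same helper can read other metadata paths, such as instance-type, if you prefer not to hardcode a value in a tag.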

You can attach additional metadata as key=value tags so the instance can easily be targeted for attacks. For example, you can use tags to indicate the instance type and instance ID.

Initialize Gremlin with the following command:

gremlin init --tag instance_type=t2.micro --tag instance_id=$INSTANCE_ID

For an EC2 instance, Gremlin will also auto-populate tags for region and zone.

The output confirms your organization ID as well as your identifier.

You are now ready to create attacks using the Gremlin Control Panel.

Step 4 - Creating attacks using the Gremlin Control Panel

Log in to the Gremlin Control Panel using your Company name and sign-on credentials. These details were emailed to you when you created your Gremlin account.

Select Create Attack in the Gremlin Control Panel.

Example: Network Latency attack pinging Google

You can use the Gremlin Control Panel or the Gremlin API to trigger Gremlin attacks. You can view the available range of Gremlin Attacks in Gremlin Help.

In a cloud environment, the network is prone to jitter and occasional blips. The Network Gremlin lets you simulate this behavior so you can observe how your application handles inconsistent network activity.

The Latency Attack will inject latency into all matching egress network traffic from your EC2 instance. Best practice is to start with a small delay and increase it in successive attacks, observing the behavior each time. As a default, you can start with a 100ms delay for 60s.
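To plan the successive attacks ahead of time, you can write down the delay for each round. The sketch below prints one such schedule starting from the 100ms default. Doubling the delay each round is an illustrative assumption, not a Gremlin recommendation, and latency_schedule is a hypothetical helper.

```shell
# Print a list of delays in milliseconds: $1 is the first delay, $2 the number of rounds.
latency_schedule() {
  delay=$1
  rounds=$2
  i=0
  while [ "$i" -lt "$rounds" ]; do
    echo "$delay"
    delay=$((delay * 2))   # double the delay for the next round
    i=$((i + 1))
  done
}

# Four rounds starting at the 100ms default: prints 100, 200, 400, 800 (one per line).
latency_schedule 100 4
```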

To target your EC2 instance, click Exact and select your instance ID in the list.

Before you create the attack, switch to your EC2 instance and start a ping to google.com so you can see the attack's effect:

[ec2-user@ip-10-0-8-81 gremlin]$ ping google.com
PING google.com (216.58.217.142) 56(84) bytes of data.
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=1 ttl=48 time=1.27 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=2 ttl=48 time=1.33 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=3 ttl=48 time=1.33 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=4 ttl=48 time=1.29 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=5 ttl=48 time=1.30 ms

Now go back to the Gremlin Control Panel and click Create to kick off the attack. Your attack should begin to run. You can check on the progress via Gremlin Attacks.

Switch back to your EC2 instance and confirm that the round-trip time to google.com has increased by about 100ms, as expected.

[ec2-user@ip-10-0-8-81 gremlin]$ ping google.com
PING google.com (216.58.217.142) 56(84) bytes of data.
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=1 ttl=48 time=1.27 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=2 ttl=48 time=1.33 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=3 ttl=48 time=1.33 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=4 ttl=48 time=1.29 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=5 ttl=48 time=1.30 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=6 ttl=48 time=1.30 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=7 ttl=48 time=1.36 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=8 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=9 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=10 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=11 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=12 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=13 ttl=48 time=101 ms
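To compare a baseline run with a run during the attack numerically rather than by eye, you can average the time= values that ping prints. This is a minimal sketch that assumes the Linux ping output format shown above; avg_rtt is a hypothetical helper.

```shell
# Read ping output on stdin and print the mean round-trip time in ms.
avg_rtt() {
  awk -F'time=' '
    /time=/ { split($2, parts, " "); sum += parts[1]; n++ }
    END     { if (n > 0) printf "%.2f\n", sum / n }
  '
}

# Example use on the instance (commented out so the sketch has no side effects):
#   ping -c 5 google.com | avg_rtt    # run once before and once during the attack
```

With the sample output above, the baseline average is a little over 1 ms and the replies during the attack sit at 101 ms.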

Step 5 - Halting the Attack using the Gremlin Control Panel

Safety is paramount. You can stop a Gremlin Attack at any time using the Gremlin Control Panel. Navigate to Gremlin Attacks and click on the red Halt button.

Conclusion

You've installed Gremlin on an EC2 instance on Amazon Web Services running Amazon Linux and validated that Gremlin works by running a Latency attack. You now have the tools to explore additional Gremlin Attacks, including attacks that impact State and Network.

Gremlin's Developer Guide is a great resource and reference for using Gremlin to do Chaos Engineering. You can also explore the Gremlin Blog for more information on how to use Chaos Engineering with your application infrastructure.
