How to Install and Use Gremlin on Amazon Web Services

Follow this guide to install and use Gremlin on an EC2 instance on Amazon Web Services. If you already have an AWS account and a Gremlin account, this should take no more than 15 minutes.

Step 1 - Installing Gremlin on your EC2 instance

If you don't already have an Amazon Linux EC2 instance available, follow the AWS documentation to launch one: Getting Started with Amazon EC2 Linux Instances.

In this step, you'll install Gremlin on your EC2 instance.

First, ssh into your EC2 instance:

ssh -i yourkey.pem ec2-user@ip-address-of-server

Then add the Gremlin repo, and install the Gremlin client and daemon:

sudo curl https://rpm.gremlin.com/gremlin.repo -o /etc/yum.repos.d/gremlin.repo
sudo yum install -y gremlin gremlind

Step 2 - Validating Installation

Run the following command to confirm you have all the necessary components installed on your host for Gremlin to function correctly:

gremlin syscheck

You should see Gremlin performing all the checks and returning System check OK.

Step 3 - Registering with Gremlin

After you have created your Gremlin account, you will need to find your Gremlin Daemon credentials. Log in to the Gremlin App using your Company name and sign-on credentials. These were emailed to you when you signed up to start using Gremlin.

Navigate to Team Settings and click on your Team. Click the blue Download button to save your certificates to your local computer. The downloaded certificate.zip contains both a public-key certificate and a matching private key.

Unzip the downloaded certificate.zip on your laptop and copy the files to the server you will be using with a Linux file transfer tool such as rsync, sftp or scp. Alternatively, you can store these certificates in a storage service such as AWS S3. For example:

rsync -avz /Users/tammybutow/Desktop/tammy-client.pub_cert.pem tammy@142.93.31.189:/var/lib/gremlin
rsync -avz /Users/tammybutow/Desktop/tammy-client.priv_key.pem tammy@142.93.31.189:/var/lib/gremlin

You can also set an identifier for this target so that it is easy to recognize. If it is not set, the identifier defaults to the IP address.

export INSTANCE_ID=$(curl http://169.254.169.254/latest/meta-data/instance-id)
export GREMLIN_IDENTIFIER=$INSTANCE_ID
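On instances that require IMDSv2, the metadata service rejects a plain GET and expects a session token first. The sketch below shows that token flow as a small helper. The function name ec2_instance_id, the 2-second timeout, and the 60-second token lifetime are illustrative choices, not part of Gremlin or of the original commands.

```shell
#!/bin/sh
# Hypothetical helper: prints the EC2 instance ID using the IMDSv2 token flow.
# Returns non-zero (and prints nothing) if the metadata service is unreachable.
ec2_instance_id() {
    base="http://169.254.169.254/latest"
    # Step 1: request a short-lived session token.
    token=$(curl -s --max-time 2 -X PUT "$base/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
    [ -n "$token" ] || return 1
    # Step 2: present the token when reading the instance ID.
    id=$(curl -s --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
        "$base/meta-data/instance-id") || return 1
    [ -n "$id" ] || return 1
    printf '%s\n' "$id"
}

# Usage on the instance:
#   if id=$(ec2_instance_id); then export GREMLIN_IDENTIFIER="$id"; fi
```

If the lookup fails, GREMLIN_IDENTIFIER stays unset and the identifier falls back to the IP address, as described above.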

Creating a gremlind file for your environment variables

Next create the /etc/default/gremlind file:

sudo vim /etc/default/gremlind

Add your GREMLIN environment variables below, for example:

GREMLIN_TEAM_ID="3f242793-018a-5ad5-9211-fb958f8dc084"
GREMLIN_TEAM_CERTIFICATE_OR_FILE="file:///var/lib/gremlin/tammy-client.pub_cert.pem"
GREMLIN_TEAM_PRIVATE_KEY_OR_FILE="file:///var/lib/gremlin/tammy-client.priv_key.pem"
GREMLIN_CLIENT_TAGS="service=prometheus"

Save the file. Restart the service:

sudo service gremlind restart
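As an alternative to editing the file by hand, the same four variables can be generated from a script, which helps when you configure more than one instance. This is a minimal sketch: the function name render_gremlind_env is hypothetical, and the values are the example values used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: prints the contents of /etc/default/gremlind,
# one variable per line.
# Arguments: team ID, certificate path, private key path, client tags.
render_gremlind_env() {
    printf 'GREMLIN_TEAM_ID="%s"\n' "$1"
    printf 'GREMLIN_TEAM_CERTIFICATE_OR_FILE="file://%s"\n' "$2"
    printf 'GREMLIN_TEAM_PRIVATE_KEY_OR_FILE="file://%s"\n' "$3"
    printf 'GREMLIN_CLIENT_TAGS="%s"\n' "$4"
}

# Print to stdout for review, using the example values from this guide.
render_gremlind_env \
    "3f242793-018a-5ad5-9211-fb958f8dc084" \
    "/var/lib/gremlin/tammy-client.pub_cert.pem" \
    "/var/lib/gremlin/tammy-client.priv_key.pem" \
    "service=prometheus"
```

Once the output looks right, pipe it through sudo tee /etc/default/gremlind to write the file, then restart gremlind.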

Confirming your gremlind configuration

Take a look at /var/log/gremlin/daemon.log to confirm:

tail /var/log/gremlin/daemon.log

If the configuration was successful, you should see output similar to the following:

2018-10-31 02:34:20 - Logging successfully initialized
2018-10-31 02:34:23 - Using Team ID : 3f242793-018a-5ad5-9211-fb958f8dc084
2018-10-31 02:34:23 - Using Identifier : 142.93.31.189
2018-10-31 02:34:23 - Found GREMLIN_TEAM_CERTIFICATE_OR_FILE in file:///var/lib/gremlin/tammy-client.pub_cert.pem
2018-10-31 02:34:23 - Found GREMLIN_TEAM_PRIVATE_KEY_OR_FILE in file:///var/lib/gremlin/tammy-client.priv_key.pem
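The log check can also be scripted so that you do not have to scan the output by eye. The sketch below assumes the log messages match the sample output in this guide; other Gremlin versions may word them differently. The function name check_gremlin_log is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given log file contains the team ID
# line and both credential lines, and reports the first missing one otherwise.
check_gremlin_log() {
    log_file="$1"
    for pattern in \
        "Using Team ID" \
        "Found GREMLIN_TEAM_CERTIFICATE_OR_FILE" \
        "Found GREMLIN_TEAM_PRIVATE_KEY_OR_FILE"
    do
        # -F treats the pattern as a fixed string, not a regular expression.
        if ! grep -qF "$pattern" "$log_file" 2>/dev/null; then
            echo "missing: $pattern"
            return 1
        fi
    done
    echo "daemon log looks OK"
}

# Usage on the instance:
#   check_gremlin_log /var/log/gremlin/daemon.log
```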

Step 4 - Creating attacks using the Gremlin Control Panel

Log in to the Gremlin Control Panel using your Company name and sign-on credentials. These details were emailed to you when you created your Gremlin account.

Select Create Attack in the Gremlin Control Panel.

Example: Network Latency attack pinging Google

You can use the Gremlin Control Panel or the Gremlin API to trigger Gremlin attacks. You can view the available range of Gremlin Attacks in Gremlin Help.

In a cloud environment, the network is prone to jitter and occasional blips. The Network Gremlin lets you simulate these conditions so that you can observe how your application handles inconsistent network behavior.

The Latency Attack injects latency into all matching egress network traffic from your EC2 instance. Best practice is to start with a small delay and increase it in successive attacks while you observe the results. A reasonable starting point is a 100 ms delay for 60 seconds.

To target your EC2 instance, click Exact and select your instance ID in the list.

To see the effects of the attack, switch to your EC2 instance before you create it and start a ping to google.com:

[ec2-user@ip-10-0-8-81 gremlin]$ ping google.com
PING google.com (216.58.217.142) 56(84) bytes of data.
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=1 ttl=48 time=1.27 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=2 ttl=48 time=1.33 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=3 ttl=48 time=1.33 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=4 ttl=48 time=1.29 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=5 ttl=48 time=1.30 ms

Now go back to the Gremlin Control Panel and click Create to start the attack. Your attack should begin to run. You can check on its progress in Gremlin Attacks.

Switch back to your EC2 instance and confirm that the round-trip time to google.com has increased by about 100 ms, as expected.

[ec2-user@ip-10-0-8-81 gremlin]$ ping google.com
PING google.com (216.58.217.142) 56(84) bytes of data.
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=1 ttl=48 time=1.27 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=2 ttl=48 time=1.33 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=3 ttl=48 time=1.33 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=4 ttl=48 time=1.29 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=5 ttl=48 time=1.30 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=6 ttl=48 time=1.30 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=7 ttl=48 time=1.36 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=8 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=9 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=10 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=11 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=12 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=13 ttl=48 time=101 ms
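To compare round-trip times before and during the attack without reading every line, you can average the time= values from the ping output. This is a minimal sketch that assumes the Linux ping output format shown above; the function name avg_rtt is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads ping output on stdin and prints the mean of the
# "time=<value>" fields in milliseconds, or "no samples" if none are found.
avg_rtt() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^time=/) {
                    sub(/^time=/, "", $i)   # keep only the numeric part
                    sum += $i
                    n++
                }
            }
        }
        END {
            if (n > 0) printf "%.2f\n", sum / n
            else print "no samples"
        }
    '
}

# Usage on the instance (run once before and once during the attack):
#   ping -c 5 google.com | avg_rtt
```

With the 100 ms attack from this guide running, the second average should be roughly 100 ms higher than the first.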

Step 5 - Halting the Attack using the Gremlin Control Panel

Safety is paramount. You can stop a Gremlin Attack at any time using the Gremlin Control Panel. Navigate to Gremlin Attacks and click the red Halt button.

Conclusion

You've installed Gremlin on an EC2 instance running Amazon Linux on Amazon Web Services and validated that Gremlin works by running a Latency attack. You can now explore additional Gremlin Attacks, including attacks that impact State and Network.

Gremlin's Developer Guide is a great resource and reference for using Gremlin to do Chaos Engineering. You can also explore the Gremlin Blog for more information on how to use Chaos Engineering with your application infrastructure.
