How to Install and use Gremlin with EKS

Introduction

Gremlin is a simple, safe and secure service for performing Chaos Engineering experiments through a SaaS-based platform. This tutorial will walk through how to install Gremlin on Amazon’s Managed Kubernetes Service (EKS) with a demo environment and perform a Chaos Engineering experiment using a Gremlin Shutdown attack.

Prerequisites

Before you begin this tutorial, you’ll need the following:

Overview

This tutorial will walk you through the required steps to run an EKS cluster, deploy two applications and then run a Chaos Engineering experiment using Gremlin.

  • Step 0 - Verify your AWS CLI Installation
  • Step 1 - Create an EKS cluster using eksctl
  • Step 2 - Load up the kubeconfig for the cluster
  • Step 3 - Install Gremlin using Helm
  • Step 4 - Deploy a Microservice Demo Application
  • Step 5 - Run a Shutdown Container Attack using Gremlin

Step 0 - Verify your AWS CLI Installation

In this step, you’ll verify that the AWS CLI is installed and configured, so that eksctl can create the EKS cluster:

aws --version

This should give you an output similar to:

aws-cli/1.16.150 Python/3.7.3 Darwin/18.5.0 botocore/1.12.140

If you’re having issues, refer back to the AWS CLI Installation documentation.
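The version check only confirms that the CLI is installed; it does not confirm that credentials are configured. The sketch below shows one way to check both. The helper name aws_cli_version is hypothetical and not part of the AWS CLI. The aws sts get-caller-identity call is a standard AWS CLI command that fails when no valid credentials are found.

```shell
# Hypothetical helper: extract the version number from `aws --version` output.
aws_cli_version() {
  sed -n 's|^aws-cli/\([0-9][0-9.]*\).*|\1|p'
}

# Usage (run manually; requires the AWS CLI and configured credentials):
#   aws --version 2>&1 | aws_cli_version
#   aws sts get-caller-identity
```

If the second command prints your account and user ARN, eksctl should be able to use the same credentials.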

Step 1 - Create an EKS cluster using eksctl

For this tutorial, we are going to use Weaveworks’ open source tool, eksctl, to create our EKS cluster. On your local machine, install eksctl:

curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

After installing eksctl, create a basic cluster:

eksctl create cluster

This will create a cluster and the needed resources in us-west-2. It will auto-generate a cluster name, create two m5.large EC2 instances using the official AWS EKS AMI, and set up a dedicated VPC.
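If you would rather not rely on the defaults, the same settings can be spelled out with eksctl's --name, --region, --nodes and --node-type flags. The following is a minimal sketch: the variable names and the create_cluster_cmd helper are hypothetical, and the cluster name is only an illustration. The helper prints the command rather than running it, so you can review it first.

```shell
# Illustrative values; the cluster name is an assumption, pick your own.
CLUSTER_NAME="gremlin-eks"
CLUSTER_REGION="us-west-2"
NODE_COUNT=2
NODE_TYPE="m5.large"

# Hypothetical helper: print the eksctl command instead of running it.
create_cluster_cmd() {
  printf 'eksctl create cluster --name %s --region %s --nodes %s --node-type %s\n' \
    "$CLUSTER_NAME" "$CLUSTER_REGION" "$NODE_COUNT" "$NODE_TYPE"
}

create_cluster_cmd
```

If you name the cluster yourself, use that name wherever the later steps show an auto-generated one.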

Step 2 - Load up the kubeconfig for the cluster

Verify that the EKS cluster has been set up properly:

eksctl get clusters

The output should display the name of your cluster and its region, similar to:

NAME				REGION
fabulous-mushroom-1527688624	us-west-2

You can now fetch the kubeconfig from AWS using the AWS CLI, passing the cluster name and region:

aws eks --region us-west-2 update-kubeconfig --name fabulous-mushroom-1527688624

To verify the nodes that eksctl has set up for us, run:

kubectl get nodes
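With the default cluster from Step 1, you would expect two nodes in the Ready state. As a rough sketch, the hypothetical helper below counts Ready nodes from the kubectl output, which can be handy in a script that waits for the cluster to come up.

```shell
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
ready_node_count() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage (requires kubectl and a working kubeconfig):
#   kubectl get nodes --no-headers | ready_node_count
```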

Step 3 - Install Gremlin using Helm

Step 3.1 - Retrieve Your Team ID and Secret Key

To install the Gremlin Kubernetes client, you will need your Gremlin Team ID and Secret Key. If you already know what those are, you can skip to step 3.2. If you don’t know what your Team ID and Secret Key are, you can get them from the Gremlin web app.

Visit the Teams page in Gremlin, and then click on your team’s name in the list.

On the Teams screen, click Configuration.

Make a note of your Team ID.

If you don’t know your Secret Key, you will need to reset it. Click the Reset button. You’ll get a popup reminding you that any running clients using the current Secret Key will need to be configured with the new key. Hit Continue.

Next you’ll see a popup screen that will show you the new Secret Key. Make a note of it.

Step 3.2 - Install the Gremlin Client

The simplest way to install the Gremlin client on your Kubernetes cluster is to use Helm. If you do not already have Helm installed, follow the Helm installation documentation to get started. Once Helm is installed and configured, the next steps are to add the Gremlin repo and install the client.

Add the Gremlin Helm chart:

helm repo add gremlin https://helm.gremlin.com

Create a namespace for the Gremlin Kubernetes client:

kubectl create namespace gremlin

Next you will run the helm command to install the Gremlin client. In this command there are three placeholder variables that you will need to replace with real data. Replace $GREMLIN_TEAM_ID with your Team ID from step 3.1, and replace $GREMLIN_TEAM_SECRET with your Secret Key from step 3.1. Replace $GREMLIN_CLUSTER_ID with a name for the cluster.
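Instead of editing the command by hand, you can export the three values as environment variables, and the shell will substitute them when the helm command runs. The sketch below adds a guard so the install is not attempted with an empty value. The require_vars helper is hypothetical and not part of Helm or Gremlin.

```shell
# Hypothetical helper: fail if any named environment variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: export the three values, then check them before running helm install.
#   export GREMLIN_TEAM_ID="..."        # Team ID from step 3.1
#   export GREMLIN_TEAM_SECRET="..."    # Secret Key from step 3.1
#   export GREMLIN_CLUSTER_ID="..."     # any name you choose for the cluster
#   require_vars GREMLIN_TEAM_ID GREMLIN_TEAM_SECRET GREMLIN_CLUSTER_ID
```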

If you are using Helm v3, run this command:

helm install gremlin gremlin/gremlin \
    --namespace gremlin \
    --set gremlin.secret.managed=true \
    --set gremlin.secret.type=secret \
    --set gremlin.secret.teamID=$GREMLIN_TEAM_ID \
    --set gremlin.secret.clusterID=$GREMLIN_CLUSTER_ID \
    --set gremlin.secret.teamSecret=$GREMLIN_TEAM_SECRET

For Helm v2, use the --name option instead:

helm install gremlin/gremlin \
    --name gremlin \
    --namespace gremlin \
    --set gremlin.secret.managed=true \
    --set gremlin.secret.type=secret \
    --set gremlin.secret.teamID=$GREMLIN_TEAM_ID \
    --set gremlin.secret.clusterID=$GREMLIN_CLUSTER_ID \
    --set gremlin.secret.teamSecret=$GREMLIN_TEAM_SECRET

If you’re not sure which version of Helm you’re using, run this command:

helm version
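If you want a script to pick between the two install commands, the major version can be pulled out of the short version string. This is a sketch under the assumption that helm version --short prints something like v3.2.1+gfe51cd1 on Helm v3 and Client: v2.16.1+gbbdfe5e on Helm v2. The helper name helm_major_version is hypothetical.

```shell
# Hypothetical helper: print the Helm major version from `helm version --short` output.
# Only the first line that contains a version string is used.
helm_major_version() {
  sed -n 's/^[^v]*v\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Usage (requires helm):
#   helm version --short | helm_major_version
```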

For more information on the Gremlin Helm chart, including more configuration options, check out the chart on GitHub.

Step 4 - Deploy a Microservice Demo Application

The demo environment we are going to deploy onto our EKS cluster is the Hipster Shop: Cloud-Native Microservices Demo Application.

On your local machine, clone the repo:

git clone https://github.com/GoogleCloudPlatform/microservices-demo.git

Then, change directories to the directory we have just created:

cd microservices-demo

To deploy the application:

kubectl apply -f ./release/kubernetes-manifests.yaml

Wait until pods are in a ready state. To check the readiness run:

kubectl get pods
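Rather than eyeballing the list, you can count the pods that are not yet ready. The following is a minimal sketch; the helper name not_ready_pod_count is hypothetical, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: count pods that are not fully ready.
# Reads `kubectl get pods --no-headers` output on stdin.
# A pod counts as ready when READY is n/n and STATUS is "Running".
not_ready_pod_count() {
  awk '{ split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") n++ } END { print n + 0 }'
}

# Usage (requires kubectl):
#   kubectl get pods --no-headers | not_ready_pod_count
```

A result of 0 means every pod is up. Recent kubectl versions also offer kubectl wait --for=condition=ready pod --all --timeout=300s, which blocks until the pods are ready or the timeout expires.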

Grab the address the frontend lives on:

kubectl get svc frontend-external -o wide

The EXTERNAL-IP column of the output holds the address you’ll visit using your web browser. On EKS it is a load balancer hostname that looks like this:

A7718c2117c2d11e98240024d0758e34-2062095095.us-west-2.elb.amazonaws.com

Visit the URL in your browser.
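To capture the address in a script, you can pick the EXTERNAL-IP column out of the command's output. This is a sketch that assumes the default column order of kubectl get svc (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, and so on); the helper name external_host is hypothetical.

```shell
# Hypothetical helper: print the EXTERNAL-IP column (4th field) of the first service row.
# Reads `kubectl get svc frontend-external -o wide` output, header included, on stdin.
external_host() {
  awk 'NR == 2 { print $4 }'
}

# Usage (requires kubectl):
#   host=$(kubectl get svc frontend-external -o wide | external_host)
#   echo "http://$host"
```

While the load balancer is still being provisioned, the column shows <pending>, so you may need to rerun the command after a minute or two.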

Step 5 - Run a Shutdown Container Attack using Gremlin

We are going to create our first Chaos Engineering experiment. We want to validate EKS reliability. Our hypothesis is, “When shutting down my cart service container, I will not suffer downtime and EKS will give me a new one.”

Going back to the Gremlin UI, select Attacks from the menu on the left and press the green “New Attack” button. We’re going to target a Kubernetes resource, so click Kubernetes in the upper right.

We will be shutting down the “cartservice” containers. Gremlin has imported the objects from Kubernetes and we can see them in the UI. We can find the container we want to target by expanding the Deployments field and selecting cartservice.

Next, choose the Gremlin. We will be doing a State attack, so select “State” and choose “Shutdown” from the options. Leave the delay set to 1 minute and turn off the reboot option. Then click the green Unleash Gremlin button.

You can view the state of the pods with kubectl:

kubectl get pods

Also, make sure to check out the demo app to test user experience to see if your hypothesis is correct.
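Clicking through the demo app works, but a small polling loop makes any downtime easier to spot. The sketch below is one way to do it. The is_healthy helper and the FRONTEND_URL variable are hypothetical, and the /cart path is an assumption about the demo app's routes.

```shell
# Hypothetical helper: treat 2xx and 3xx HTTP status codes as healthy.
is_healthy() {
  case "$1" in
    2??|3??) return 0 ;;
    *)       return 1 ;;
  esac
}

# Usage (requires curl; FRONTEND_URL is a placeholder for the address from Step 4):
#   while true; do
#     code=$(curl -s -o /dev/null -w '%{http_code}' "$FRONTEND_URL/cart")
#     if is_healthy "$code"; then echo "$(date) ok $code"; else echo "$(date) FAIL $code"; fi
#     sleep 2
#   done
```

Start the loop before unleashing the attack and leave it running until the pod is back, so the output covers the whole experiment window.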

Experiment Results

Our hypothesis was, "When shutting down my cart service container, I will not suffer downtime and EKS will give me a new one."

The experiment did not confirm this hypothesis. The Hipster Shop demo application did not handle the shutdown gracefully; instead it returned a 500 internal server error. To mitigate this issue, we would first need to investigate why the error occurred and look into the logs. For example, we can see the error "could not retrieve cart". Running kubectl get pods also shows that only one cartservice pod is running, so the service has no redundancy.
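One way to start that investigation is to count how often the error shows up in the logs. This is a sketch that assumes the error is logged by the frontend deployment, which the tutorial does not state; the helper name count_cart_errors is hypothetical.

```shell
# Hypothetical helper: count log lines containing the error seen during the attack.
# `grep -c` exits non-zero when nothing matches, so `|| true` keeps the exit status clean.
count_cart_errors() {
  grep -c 'could not retrieve cart' || true
}

# Usage (assumes the error is logged by the frontend deployment):
#   kubectl logs deployment/frontend | count_cart_errors
```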

When we view cartservice.yaml, we see that the cart service uses Redis, but not clustered Redis: https://github.com/GoogleCloudPlatform/microservices-demo/blob/master/kubernetes-manifests/cartservice.yaml

Conclusion

Congrats! You’ve set up an AWS EKS cluster, installed the Gremlin agent as a daemon set, deployed a microservice demo application, and run your first Chaos Engineering attack to validate Kubernetes reliability! If you have any questions at all or are wondering what else you can do with this demo environment, feel free to DM me on the Chaos Slack: @anamedina (join here!).
