How to Install and use Gremlin with EKS

Introduction

Gremlin is a simple, safe and secure service for performing Chaos Engineering experiments through a SaaS-based platform. This tutorial walks through how to install Gremlin on Amazon Elastic Kubernetes Service (EKS) with a demo environment and perform a Chaos Engineering experiment using a Gremlin Shutdown attack.

Prerequisites

Before you begin this tutorial, you’ll need the following:

  • An AWS account
  • The AWS CLI, installed and configured
  • kubectl, installed on your local machine
  • A Gremlin account (you will need its Team ID and Secret Key)

Overview

This tutorial will walk you through the required steps to run an EKS cluster, deploy two applications and then run a Chaos Engineering experiment using Gremlin.

  • Step 0 - Verify your AWS CLI Installation
  • Step 1 - Create an EKS cluster using eksctl
  • Step 2 - Load up the kubeconfig for the cluster
  • Step 3 - Install Gremlin using the Kubernetes Dashboard
  • Step 4 - Deploy a Microservice Demo Application
  • Step 5 - Run a Shutdown Container Attack using Gremlin

Step 0 - Verify your AWS CLI Installation

In this step, you’ll verify that the AWS CLI is installed, so that eksctl can use its configuration to create the EKS cluster:

 aws --version

This should give you an output similar to:

aws-cli/1.16.150 Python/3.7.3 Darwin/18.5.0 botocore/1.12.140

If you’re having issues, refer back to the AWS CLI Installation documentation.
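
Checking the version only shows that the CLI is installed. The sketch below also confirms that credentials are configured, using aws sts get-caller-identity. It assumes credentials were set up beforehand (for example with aws configure); the helper name aws_cli_major is hypothetical and only there for illustration.

```shell
# Hypothetical helper: extract the major version from `aws --version` output.
aws_cli_major() {
  # Input looks like: aws-cli/1.16.150 Python/3.7.3 ...
  printf '%s\n' "$1" | sed -n 's|^aws-cli/\([0-9][0-9]*\)\..*|\1|p'
}

# Only talk to AWS when the CLI is actually on the PATH.
if command -v aws >/dev/null 2>&1; then
  echo "AWS CLI major version: $(aws_cli_major "$(aws --version 2>&1)")"
  # Prints the account and IAM identity behind the configured credentials;
  # fails if no valid credentials are available.
  aws sts get-caller-identity || echo "AWS credentials are not configured" >&2
fi
```

If the last command prints an error, fix the credentials before moving on, since cluster creation in the next step depends on them.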

Step 1 - Create an EKS cluster using eksctl

For this tutorial, we are going to use Weaveworks’ open source tool, eksctl, to create our EKS cluster. On your local machine, install eksctl:

curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

After installing eksctl, create a basic cluster:

eksctl create cluster

This will create a cluster and the needed resources in us-west-2. It will auto-generate a cluster name, create two m5.large EC2 instances using the official AWS EKS AMI, and set up a dedicated VPC.
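
If you would rather not rely on the defaults, the same settings can be spelled out with eksctl flags. This is a sketch, not the required way to create the cluster: the name gremlin-eks and the helper eksctl_create_cmd are assumptions for illustration. If you name the cluster yourself, use that name in place of the auto-generated one in the later commands.

```shell
# Assumed values for illustration; adjust them to your account.
CLUSTER_NAME="gremlin-eks"
REGION="us-west-2"

# Hypothetical helper: print the eksctl command with the defaults made explicit.
# $1 = cluster name, $2 = region.
eksctl_create_cmd() {
  printf 'eksctl create cluster --name %s --region %s --nodes 2 --node-type m5.large\n' "$1" "$2"
}

# Review the printed command first, then run it by hand.
eksctl_create_cmd "$CLUSTER_NAME" "$REGION"
```

Printing the command instead of running it keeps the sketch safe to execute, since creating a cluster provisions billable resources.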

Step 2 - Load up the kubeconfig for the cluster

Verify that the EKS cluster has been set up properly:

eksctl get clusters

The output should display the name of your cluster and its region, similar to:

NAME				REGION
fabulous-mushroom-1527688624	us-west-2

You can now grab the kubeconfig file from AWS using the AWS CLI and passing the cluster name and region:

aws eks --region us-west-2 update-kubeconfig --name fabulous-mushroom-1527688624

To verify the nodes that eksctl has set up for us, run:

kubectl get nodes
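
With the default cluster, both nodes should report a Ready status. A small sketch for checking this from a script; the helper name count_ready_nodes is hypothetical:

```shell
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage (requires a working kubeconfig):
#   kubectl get nodes --no-headers | count_ready_nodes
```

If the count is lower than the number of nodes you expect, wait a minute and check again before continuing.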

Step 2 (continued) - Deploy the Kubernetes Dashboard

We now want to deploy the Kubernetes dashboard, heapster and influxdb.

To deploy the dashboard to your EKS cluster:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml

To deploy heapster:

 kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml

To deploy influxdb:

 kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb.yaml

Now create the heapster cluster role binding for the dashboard:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml

We now want to create an eks-admin service account, which will let you connect to the Kubernetes dashboard with admin permissions. To create it:

kubectl apply -f https://raw.githubusercontent.com/tammybutow/eks-aws/master/eks-admin-service-account.yaml

To connect to the Kubernetes dashboard, first retrieve the authentication token for the eks-admin service account:

kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')
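
The describe output contains several fields, and only the value of the token field is needed for the dashboard login. A minimal sketch for pulling out just that value; the helper name extract_token is hypothetical:

```shell
# Hypothetical helper: print only the value of the "token:" field from
# `kubectl describe secret` output supplied on stdin.
extract_token() {
  awk '$1 == "token:" { print $2 }'
}

# Usage:
#   kubectl -n kube-system describe secret \
#     "$(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')" \
#     | extract_token
```

Treat the token like a password, since it grants admin access to the cluster.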

On your local machine, start a proxy to the cluster so that you can reach the Kubernetes dashboard:

kubectl proxy

In a web browser, access the dashboard by visiting this URL:

http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login

To sign in, select “Token” and paste the authentication token you retrieved earlier for the eks-admin service account.

Step 3 - Install Gremlin using the Kubernetes Dashboard

First, we need to create a new namespace and daemonset to deploy Gremlin.

On the left-hand side, select “Namespaces” and then, in the top right corner, press “Create”. This will display an empty text box. Paste the configuration for the gremlin namespace into it:

{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": { "name": "gremlin", "labels": { "name": "gremlin" } }
}

On the left-hand side, change the dropdown that says “Namespace” to “All Namespaces” and select “Daemon Sets” from the list.

Now on the top right corner, press “Create”.

Before being able to deploy the Gremlin daemon set, we need to grab the credentials.

They can be found by logging in to the Gremlin App using your company name and sign-on credentials. (These were emailed to you when you signed up to start using Gremlin.) Click the circular avatar in the right corner, select “Company Settings”, then select your Team. The ID you’re looking for is listed under Configuration as “Team ID”, and the key you need is listed under “Secret Key”. Press “Create Key” or “Reset” to generate your secret key.

Now, going back to the Kubernetes Dashboard, paste this into the text box, while substituting <team-id> and <secret-key> with the Team ID and Secret Key that correspond to your team.

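apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gremlin
  namespace: gremlin
  labels:
    k8s-app: gremlin
    version: v1
spec:
  # apps/v1 requires an explicit selector that matches the pod template labels
  selector:
    matchLabels:
      k8s-app: gremlin
  template: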
    metadata:
      labels:
        k8s-app: gremlin
        version: v1
    spec:
      containers:
        - name: gremlin
          image: gremlin/gremlin
          args: ['daemon']
          imagePullPolicy: Always
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
                - SYS_BOOT
                - SYS_TIME
                - KILL
          env:
            - name: GREMLIN_ORG_ID
              value: <team-id>
            - name: GREMLIN_ORG_SECRET
              value: <secret-key>
            - name: GREMLIN_IDENTIFIER
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: docker-sock
              mountPath: /var/run/docker.sock
      volumes:
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
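
If you prefer the command line to the dashboard’s text box, the placeholders can be substituted and the manifest applied with kubectl. This is a sketch under assumptions: the file name gremlin-daemonset.yaml, the variables TEAM_ID and SECRET_KEY, and both helper names are hypothetical.

```shell
# Escape characters that are special in a sed replacement (\, / and &).
sed_escape() {
  printf '%s' "$1" | sed -e 's|[\\/&]|\\&|g'
}

# Hypothetical helper: fill in <team-id> and <secret-key> in a manifest
# read from stdin. $1 = Team ID, $2 = Secret Key.
render_gremlin_manifest() {
  sed -e "s/<team-id>/$(sed_escape "$1")/g" \
      -e "s/<secret-key>/$(sed_escape "$2")/g"
}

# Usage, assuming the manifest above was saved as gremlin-daemonset.yaml:
#   render_gremlin_manifest "$TEAM_ID" "$SECRET_KEY" < gremlin-daemonset.yaml \
#     | kubectl apply -f -
```

Piping the rendered manifest straight into kubectl apply avoids writing the secret key to a second file on disk.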

Step 4 - Deploy a Microservice Demo Application

The demo environment we are going to deploy on to our EKS cluster is the Hipster Shop: Cloud-Native Microservices Demo Application.

On your local machine, clone the repo:

git clone https://github.com/GoogleCloudPlatform/microservices-demo.git

Then, change directories to the directory we have just created:

cd microservices-demo

To deploy the application:

kubectl apply -f ./release/kubernetes-manifests.yaml

Wait until pods are in a ready state. To check the readiness run:

kubectl get pods
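
Rather than scanning the list by eye, the pods that are not ready yet can be filtered out of that output. A minimal sketch; the helper name not_ready_pods is hypothetical:

```shell
# Hypothetical helper: list pods whose READY column (for example "0/1")
# shows fewer ready containers than expected. Reads
# `kubectl get pods --no-headers` output on stdin.
not_ready_pods() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage:
#   kubectl get pods --no-headers | not_ready_pods
# Alternatively, block until every pod is ready or the timeout expires:
#   kubectl wait --for=condition=Ready pod --all --timeout=300s
```

An empty result means every pod in the current namespace reports all of its containers as ready.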

Grab the hostname of the load balancer that the frontend is exposed on:

kubectl get svc -o jsonpath='{.items[?(@.metadata.name == "frontend-external")].status.loadBalancer.ingress[0].hostname}'

The output is the URL you’ll visit using your web browser and it looks like this:

A7718c2117c2d11e98240024d0758e34-2062095095.us-west-2.elb.amazonaws.com

Visit the URL in your browser.

Step 5 - Run a Shutdown Container Attack using Gremlin

We are going to create our first Chaos Engineering experiment. We want to validate EKS reliability. Our hypothesis is, “When shutting down my cart service container, I will not suffer downtime and EKS will give me a new one.”

Going back to the Gremlin UI, select Attacks from the menu on the left and press the green “New Attack” button. This will be a container attack so we will select “Containers” to change the options.

We will be shutting down the “cartservice” container. Gremlin has imported the tags from Kubernetes and we can see them in the UI. We can find the container we want to target by typing “app:cartservice” in the search bar, then selecting the one container that runs that service.

Next, we will choose the Gremlin. This will be a State attack, so select “State” and choose “Shutdown” from the options. Set the delay to 0 and turn off the reboot option.

Head back over to the Kubernetes dashboard and select “Pods” on the left menu bar to display the pods’ state. Also, make sure to check out the demo app to test the user experience and see if your hypothesis is correct.
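
To test the “no downtime” part of the hypothesis with more than a manual page refresh, the frontend can be polled while the attack runs. This is a sketch, assuming curl is installed; the helper names are hypothetical and the hostname is a placeholder for the one printed in Step 4.

```shell
# Hypothetical helper: succeed (exit 0) when an HTTP status code counts as a
# failure for this experiment: any 5xx, or 000 (curl could not connect).
is_failure() {
  case "$1" in
    000|5??) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll the URL in $1 once per second, $2 times,
# and log every failing response with a timestamp.
probe_frontend() {
  i=0
  while [ "$i" -lt "$2" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    if is_failure "$code"; then
      echo "$(date '+%H:%M:%S') FAIL $code"
    fi
    i=$((i + 1))
    sleep 1
  done
}

# Usage, with the hostname printed in Step 4 (placeholder shown):
#   probe_frontend "http://<frontend-hostname>/" 60
```

Start the probe just before launching the attack so the log covers the moment the container is shut down.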

Experiment Results

Our hypothesis was, "When shutting down my cart service container, I will not suffer downtime and EKS will give me a new one."

The experiment did not confirm this. We saw that the Hipster Shop demo application did not handle the shutdown gracefully and instead returned a 500 internal server error. To mitigate this issue, we would first need to investigate why we saw the error by looking into the logs, where we can see, for example, the error "could not retrieve cart". When we run kubectl get pods, we see that there is only one cartservice pod running, so the service has no redundancy.

When we view cartservice.yaml we see that cart service uses redis but it does not use clustered redis: https://github.com/GoogleCloudPlatform/microservices-demo/blob/master/kubernetes-manifests/cartservice.yaml

Conclusion

Congrats! You’ve set up an AWS EKS cluster, deployed the Kubernetes Dashboard, deployed a microservice demo application, installed the Gremlin agent as a daemon set, and run your first Chaos Engineering attack to validate Kubernetes reliability! If you have any questions at all or are wondering what else you can do with this demo environment, feel free to DM me on the Chaos Slack: @anamedina.
