Chaos Engineering with Memcache and Kubernetes


Gremlin is a simple, safe and secure service for performing Chaos Engineering experiments through a SaaS-based platform. Memcache is a general-purpose distributed memory caching system. Datadog is a monitoring service for cloud-scale applications, providing monitoring of servers, databases, tools, and services, through a SaaS-based data analytics platform. Datadog provides an integration to monitor Memcache.

Chaos Engineering Hypothesis

For the purposes of this tutorial we will run Chaos Engineering experiments for Memcache on Kubernetes. We will use Gremlin to run an IO attack on our cluster that increases the number of reads. This will give us confidence in the reliability and resiliency of our Memcached cluster. We also recommend running additional experiments, such as shutting down Memcache instances and pods and ensuring this does not take down your database/storage layer.


To complete this tutorial you will need the following:

  • 4 cloud infrastructure hosts running Ubuntu 16.04 with 4GB RAM and private networking enabled
  • A Gremlin account (sign up here)
  • A Datadog account (sign up here)

You will need to install the following on each of your 4 cloud infrastructure hosts. This will enable you to run your Chaos Engineering experiments.

  • Memcache
  • Kubernetes
  • Helm
  • Docker
  • Gremlin
  • Datadog


This tutorial will walk you through the required steps to run the Memcache IO Chaos Engineering experiment.

  • Step 1 - Creating a Kubernetes cluster with 3 nodes
  • Step 2 - Installing Memcache
  • Step 3 - Installing Helm
  • Step 4 - Installing Gremlin
  • Step 5 - Installing Datadog
  • Step 6 - Performing Chaos Engineering experiments on Memcache
  • Step 7 - Installing mcrouter
  • Step 8 - Performing Chaos Engineering experiments on mcrouter

Step 1 - Creating a Kubernetes cluster with 3 nodes

We will start by creating four Ubuntu 16.04 servers: one Kubernetes master and three nodes. Each host needs a minimum of 4GB RAM.

Set your hostnames for your servers as follows:

  • Server 1 - Hostname: k8-01
  • Server 2 - Hostname: k8-02
  • Server 3 - Hostname: k8-03
  • Server 4 - Hostname: k8-04

Each server needs a role in the Kubernetes cluster. We will set up one server to act as the master and the other three as nodes:

  • k8-01 - role: master
  • k8-02 - role: node
  • k8-03 - role: node
  • k8-04 - role: node

Set up each server in the cluster to run Kubernetes

On each of the four Ubuntu 16.04 servers, run the following commands as root:

apt-get update && apt-get install -y apt-transport-https
curl -s | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl

Set Up the Kubernetes Master

On the k8-01 node, run the following command:

kubeadm init

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Your Kubernetes master has initialized successfully!

Join your nodes to your Kubernetes cluster

You can now join any number of machines by running the kubeadm join command on each node as root. kubeadm init prints this command, including a token and a CA certificate hash, at the end of its output for you to copy and run. An example of what this looks like is below:

kubeadm join --token 702ff6.bc7aacff7aacab17 --discovery-token-ca-cert-hash sha256:68bc22d2c631800fd358a6d7e3998e598deb2980ee613b3c2f1da8978960c8ab

When you join your k8-02, k8-03 and k8-04 nodes, you will see the following on each node:

This node has joined the cluster:
* Certificate signing request was sent to master and a response was received.
* The Kubelet was informed of the new secure connection details.

To check that all nodes are now joined to the master, run the following command on the Kubernetes master, k8-01:

kubectl get nodes

The result will look similar to this. The nodes report NotReady until a pod network add-on is installed, which you will do next:

NAME    STATUS     ROLES    AGE    VERSION
k8-01   NotReady   master   111s   v1.12.2
k8-03   NotReady   <none>   14s    v1.12.2
k8-04   NotReady   <none>   12s    v1.12.2
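If you want to script this check instead of reading the table by eye, a small parser for the default kubectl output is enough. The following is a minimal Python sketch. It assumes the default column layout shown above (NAME, STATUS, ROLES, AGE, VERSION) and that no column value contains spaces. For anything long-lived, `kubectl get nodes -o json` is a sturdier input. The function names are hypothetical.

```python
def parse_kubectl_table(text):
    """Split default `kubectl get` output into one dict per row, keyed by header.

    Assumes no column value contains spaces, which holds for the node
    columns shown in this tutorial.
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def not_ready_nodes(text):
    """Return the names of nodes whose STATUS is anything other than 'Ready'."""
    return [row["NAME"] for row in parse_kubectl_table(text) if row["STATUS"] != "Ready"]


sample = """\
NAME    STATUS     ROLES    AGE    VERSION
k8-01   NotReady   master   111s   v1.12.2
k8-03   NotReady   <none>   14s    v1.12.2
k8-04   NotReady   <none>   12s    v1.12.2
"""
print(not_ready_nodes(sample))  # ['k8-01', 'k8-03', 'k8-04']
```

Until the networking add-on below is running, every node appears in this list. Once the list is empty, all nodes report Ready.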

Set Up a Kubernetes Add-On for Networking Features and Policy

Kubernetes add-ons are pods and services that implement cluster features and extend the functionality of Kubernetes. You can install add-ons for a range of cluster features, including networking and visualization.

We are going to install the Weave Net add-on on the k8-01 master. Weave Net provides networking and network policy, continues working on both sides of a network partition, and does not require an external database.

Next you will deploy a pod network to the cluster. The options are listed at:

Installing the Weave Net Add-On

Get the Weave Net yaml:

curl -o weave.yaml

Inspect the yaml contents:

cat weave.yaml

On the k8-01 Kubernetes master node, run the following command:

kubectl apply -f weave.yaml

The result will look like this:

serviceaccount/weave-net created
daemonset.extensions/weave-net created

It may take a minute or two for DNS to be ready. Before moving on, check that the DNS (CoreDNS) pods are running with the following command:

kubectl get pods --all-namespaces

The successful result will look like this, every container should be running:

NAMESPACE     NAME                            READY   STATUS    RESTARTS   AGE
kube-system   coredns-576cbf47c7-gm6kt        1/1     Running   0          3m20s
kube-system   coredns-576cbf47c7-h5v5k        1/1     Running   0          3m20s
kube-system   etcd-k8-01                      1/1     Running   0          2m14s
kube-system   kube-apiserver-k8-01            1/1     Running   0          2m14s
kube-system   kube-controller-manager-k8-01   1/1     Running   0          2m18s
kube-system   kube-proxy-7m87q                1/1     Running   0          111s
kube-system   kube-proxy-mk9h9                1/1     Running   0          113s
kube-system   kube-proxy-wkxxm                1/1     Running   0          3m20s
kube-system   kube-scheduler-k8-01            1/1     Running   0          2m35s
kube-system   weave-net-lvp6x                 2/2     Running   0          34s
kube-system   weave-net-pjxk2                 2/2     Running   0          34s
kube-system   weave-net-qrrvl                 2/2     Running   0          34s
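The READY column shows ready containers over total containers in each pod. For example, the weave-net pods report 2/2 because each runs two containers. Below is a minimal Python sketch of the "every container should be running" check. It assumes the default `--all-namespaces` column layout above. The function name and the sample rows are hypothetical.

```python
def unhealthy_pods(text):
    """Return (namespace, name) for every pod not Running with all containers ready.

    Expects the default `kubectl get pods --all-namespaces` columns:
    NAMESPACE NAME READY STATUS RESTARTS AGE.
    """
    rows = [line.split() for line in text.strip().splitlines()[1:] if line.strip()]
    problems = []
    for namespace, name, ready, status, *_rest in rows:
        ready_count, total = (int(n) for n in ready.split("/"))
        if status != "Running" or ready_count != total:
            problems.append((namespace, name))
    return problems


# Illustrative sample, not real cluster output.
sample = """\
NAMESPACE     NAME                       READY   STATUS              RESTARTS   AGE
kube-system   coredns-576cbf47c7-gm6kt   1/1     Running             0          3m20s
kube-system   weave-net-lvp6x            0/2     ContainerCreating   0          10s
"""
print(unhealthy_pods(sample))  # [('kube-system', 'weave-net-lvp6x')]
```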

Congratulations, now your Kubernetes cluster running on Ubuntu 16.04 is up and ready for you to deploy a microservices application.

Step 2 - Deploying Memcache

First, download the Helm binary on your Kubernetes master, k8-01:


Create a Helm directory and extract the Helm archive into it:

mkdir helm-v2.6.0
tar zxfv helm-v2.6.0-linux-amd64.tar.gz -C helm-v2.6.0

Add the helm binary's directory to your PATH environment variable:

export PATH="$(echo ~)/helm-v2.6.0/linux-amd64:$PATH"

Create a service account with the cluster admin role for Tiller, the Helm server:

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller

Initialize Tiller in your cluster and update the list of available charts:

helm init --service-account tiller
helm repo update

Wait until the tiller-deploy pod is ready before proceeding. Use the following command to check its status:

kubectl -n kube-system get pods

You will see the following output:

NAME                            READY   STATUS    RESTARTS   AGE
coredns-576cbf47c7-gm6kt        1/1     Running   0          14m
coredns-576cbf47c7-h5v5k        1/1     Running   0          14m
etcd-k8-01                      1/1     Running   0          13m
kube-apiserver-k8-01            1/1     Running   0          13m
kube-controller-manager-k8-01   1/1     Running   0          13m
kube-proxy-7m87q                1/1     Running   0          12m
kube-proxy-mk9h9                1/1     Running   0          12m
kube-proxy-wkxxm                1/1     Running   0          14m
kube-scheduler-k8-01            1/1     Running   0          13m
tiller-deploy-9cfccbbcf-6f8j9   1/1     Running   0          93s
weave-net-lvp6x                 2/2     Running   0          11m
weave-net-pjxk2                 2/2     Running   0          11m
weave-net-qrrvl                 2/2     Running   0          11m
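Instead of rerunning the command by hand, you can poll until Tiller is ready. The following minimal Python sketch shows one way to do it. `wait_until` and `tiller_ready` are hypothetical helpers, and the timeout values are arbitrary. `tiller_ready` also assumes the default column layout shown above and a `kubectl` binary on your PATH.

```python
import subprocess
import time


def wait_until(check, timeout=120.0, interval=5.0):
    """Call check() until it returns True or `timeout` seconds elapse.

    Returns True on success, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def tiller_ready():
    """Is a tiller-deploy pod Running with all of its containers ready?"""
    out = subprocess.run(
        ["kubectl", "-n", "kube-system", "get", "pods"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines()[1:]:  # skip the header row
        fields = line.split()  # NAME READY STATUS RESTARTS AGE
        if fields and fields[0].startswith("tiller-deploy-"):
            ready, total = fields[1].split("/")
            return fields[2] == "Running" and ready == total
    return False
```

For example, `wait_until(tiller_ready, timeout=300)` returns True once the tiller-deploy pod reports 1/1 Running. It returns False if five minutes pass first.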

To check the logs for the Tiller pod, run the following command. Replace tiller-deploy-9cfccbbcf-kflph with your pod name, for example tiller-deploy-9cfccbbcf-6f8j9 in the output above:

kubectl logs --namespace kube-system tiller-deploy-9cfccbbcf-kflph

You will see the following output:

[main] 2018/11/20 20:00:41 Starting Tiller v2.6.0 (tls=false)
[main] 2018/11/20 20:00:41 GRPC listening on :44134
[main] 2018/11/20 20:00:41 Probes listening on :44135
[main] 2018/11/20 20:00:41 Storage driver is ConfigMap

Install a new Memcached Helm chart release with three replicas, one for each node:

helm install stable/memcached --name mycache --set replicaCount=3

You will see the following output:

NAME                  READY   STATUS    RESTARTS   AGE
mycache-memcached-0   1/1     Running   0          89s
mycache-memcached-1   1/1     Running   0          61s
mycache-memcached-2   0/1     Pending   0          48s

Execute the following command to see the running pods:

kubectl get pods

You should see the following:

NAME                  READY   STATUS    RESTARTS   AGE
mycache-memcached-0   1/1     Running   0          3m54s
mycache-memcached-1   1/1     Running   0          3m26s
mycache-memcached-2   0/1     Pending   0          3m13s

Discovering Memcached service endpoints

First, run the following command to retrieve the endpoints' IP addresses:

kubectl get endpoints mycache-memcached

The output should be similar to the following:

NAME                ENDPOINTS                         AGE
mycache-memcached   ,                                 4m10s

Test the deployment by opening a telnet session with one of the running Memcached servers on port 11211:

kubectl run -it --rm alpine --image=alpine:3.6 --restart=Never telnet mycache-memcached-0.mycache-memcached.default.svc.cluster.local 11211
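The hostname in this command follows the DNS scheme Kubernetes uses for StatefulSet pods behind a headless service: `<pod>.<service>.<namespace>.svc.cluster.local`. Each pod name is the StatefulSet name plus an ordinal (0, 1, 2). Below is a minimal Python sketch that builds these names. It assumes the StatefulSet and the service are both called mycache-memcached, as in the commands above, along with the default namespace and the default cluster.local cluster domain. The function name is hypothetical.

```python
def memcached_pod_hosts(statefulset, replicas, service=None,
                        namespace="default", domain="cluster.local", port=11211):
    """Return (host, port) pairs for the pods of a StatefulSet behind a headless service.

    Pod names are <statefulset>-<ordinal>; each pod's DNS name is
    <pod>.<service>.<namespace>.svc.<domain>.
    """
    service = service or statefulset
    return [
        (f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{domain}", port)
        for ordinal in range(replicas)
    ]


for host, port in memcached_pod_hosts("mycache-memcached", replicas=3):
    print(host, port)
```

A pod that is still Pending, such as mycache-memcached-2 above, has no IP address yet. Its name will not resolve until the pod is scheduled and running.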

At the telnet prompt, run these commands using the Memcached ASCII protocol:

set mykey 0 0 5
hello
get mykey
quit

The resulting session looks like this:

If you don't see a command prompt, try pressing enter.
set mykey 0 0 5
hello
STORED
get mykey
VALUE mykey 0 5
hello
END
quit
Connection closed by foreign host
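In the ASCII protocol, `set mykey 0 0 5` has four parts:

  • the key (mykey)
  • the flags (0)
  • the expiration time (0, meaning no expiry)
  • the length of the data block in bytes (5)

That byte count is why the next line is exactly `hello`. The server replies STORED. A `get` returns a `VALUE <key> <flags> <bytes>` line, then the data, then END. Every line ends in CRLF (\r\n). Below is a minimal Python sketch of that framing, which you could send over a plain TCP socket to port 11211. The helper names are hypothetical.

```python
def encode_set(key, value, flags=0, exptime=0):
    """Frame a memcached ASCII `set`: command line, data block, each ending in CRLF."""
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def encode_get(key):
    """Frame a single-key memcached ASCII `get`."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw):
    """Parse a complete single-key `get` reply into (flags, value), or None on a miss.

    Handles the plain `get` form (VALUE <key> <flags> <bytes>), not `gets`,
    which adds a CAS token to the VALUE line.
    """
    if raw == b"END\r\n":
        return None  # cache miss: the server sends only END
    header, rest = raw.split(b"\r\n", 1)
    _value, _key, flags, length = header.split()
    length = int(length)
    if rest[length:] != b"\r\nEND\r\n":
        raise ValueError("incomplete or malformed get response")
    return int(flags), rest[:length]


print(encode_set("mykey", "hello"))  # b'set mykey 0 0 5\r\nhello\r\n'
print(parse_get_response(b"VALUE mykey 0 5\r\nhello\r\nEND\r\n"))  # (0, b'hello')
```

A real client also has to keep reading from the socket until it sees the terminating END line. This sketch assumes the whole reply is already in memory.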

Implementing the service discovery logic

Next we will implement service discovery logic with Python. Run the following command to create a python pod in your Kubernetes cluster:

kubectl run -it --rm python --image=python:3.6-alpine --restart=Never sh

Install the pymemcache library:

pip install pymemcache

You will see the following output:

Collecting pymemcache
  Downloading ...
Collecting six (from pymemcache)
  Downloading ...
Installing collected packages: six, pymemcache
Successfully installed pymemcache-2.0.0 six-1.11.0

Start a Python interactive console by running the following command:


In the Python console, run these commands:

import socket
from pymemcache.client.hash import HashClient
_, _, ips = socket.gethostbyname_ex('mycache-memcached.default.svc.cluster.local')
servers = [(ip, 11211) for ip in ips]
client = HashClient(servers, use_pooling=True)
client.set('mykey', 'hello')
client.get('mykey')
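HashClient spreads keys across the servers it is given by hashing each key to pick a server. By default, pymemcache's HashClient uses rendezvous (highest random weight) hashing. With this scheme, when a Memcached pod disappears, only the keys that pod owned are remapped. That matters for the shutdown experiments suggested at the end of this tutorial. The following minimal sketch illustrates the technique only. It uses MD5 from the standard library rather than the hash pymemcache actually uses, and the pod IP addresses are made up.

```python
import hashlib


def rendezvous_pick(key, servers):
    """Return the server with the highest hash score for `key` (rendezvous hashing)."""
    def score(server):
        host, port = server
        digest = hashlib.md5(f"{host}:{port}-{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return max(servers, key=score)


# Hypothetical pod IPs, one per Memcached replica.
servers = [("10.44.0.1", 11211), ("10.44.0.2", 11211), ("10.44.0.3", 11211)]
keys = [f"key-{i}" for i in range(1000)]
before = {k: rendezvous_pick(k, servers) for k in keys}

# Simulate losing the third Memcached pod and see which keys move.
survivors = servers[:2]
after = {k: rendezvous_pick(k, survivors) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

# Only keys that lived on the removed server are remapped.
assert all(before[k] == servers[2] for k in moved)
print(f"{len(moved)} of {len(keys)} keys moved")
```

Because every key keeps its highest-scoring server as long as that server survives, roughly a third of the keys move here, and they all come from the removed pod.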

You will see the following output:


Exit the Python console:


Exit the pod's shell session by pressing Control+D. You will see the following:

/ # pod "python" deleted

Step 4 – Installing Gremlin for Chaos Engineering experiments

After you have created your Gremlin account (sign up here), you will need to find your Gremlin Daemon credentials. Log in to the Gremlin App using your Company name and sign-on credentials. These were emailed to you when you signed up to start using Gremlin.

Navigate to Team Settings and click on your Team. Here you will find your Secret Key, which we will use to create your Gremlin DaemonSet.


Next, create a gremlin.yaml file:

vim gremlin.yaml

Add the following to your gremlin.yaml file. Replace the GREMLIN_TEAM_ID and GREMLIN_TEAM_SECRET values with the Team ID and Secret Key from your company settings.

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: gremlin
  namespace: default
  labels:
    k8s-app: gremlin
    version: v1
spec:
  template:
    metadata:
      labels:
        k8s-app: gremlin
        version: v1
    spec:
      # If you want to enable host-level process-killing, add this flag:
      #hostPID: true
      # If you want to enable host-level network attacks, add this flag:
      #hostNetwork: true
      containers:
      - name: gremlin
        image: gremlin/gremlin
        args: [ "daemon" ]
        imagePullPolicy: Always
        securityContext:
          capabilities:
            add:
              - NET_ADMIN
              - SYS_BOOT
              - SYS_TIME
              - KILL
        env:
          - name: GREMLIN_TEAM_ID
            value: 3f242793-018a-5ad5-9211-fb958f8dc084
          - name: GREMLIN_TEAM_SECRET
            value: ce219f9a-f2d5-4ccc-a19f-9af2d5dcccb2
          - name: GREMLIN_IDENTIFIER
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
        volumeMounts:
          - name: docker-sock
            mountPath: /var/run/docker.sock
          - name: gremlin-state
            mountPath: /var/lib/gremlin
          - name: gremlin-logs
            mountPath: /var/log/gremlin
          - name: shutdown-trigger
            mountPath: /sysrq
      volumes:
        # Gremlin uses the Docker socket to discover eligible containers to attack,
        # and to launch Gremlin sidecar containers
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
        # The Gremlin daemon communicates with Gremlin sidecars via its state directory.
        # This should be shared with the Kubernetes host
        - name: gremlin-state
          hostPath:
            path: /var/lib/gremlin
        # The Gremlin daemon forwards logs from the Gremlin sidecars to the Gremlin control plane
        # These logs should be shared with the host
        - name: gremlin-logs
          hostPath:
            path: /var/log/gremlin
        # If you want to run shutdown attacks on the host, the Gremlin Daemon requires a /proc/sysrq-trigger:/sysrq mount
        - name: shutdown-trigger
          hostPath:
            path: /proc/sysrq-trigger
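Hard-coding the Team ID and Secret Key in gremlin.yaml makes them easy to leak, for example by committing the file. One option is to generate the manifest from environment variables instead, since `kubectl apply -f` accepts JSON as well as YAML. Below is a minimal Python sketch that reproduces the manifest above. It leaves out the commented-out hostPID and hostNetwork options and assumes your credentials are exported as GREMLIN_TEAM_ID and GREMLIN_TEAM_SECRET. The script and function names are hypothetical.

```python
import json
import os
import sys


def build_gremlin_daemonset(team_id, team_secret):
    """Return the Gremlin DaemonSet from this tutorial as a Python dict."""
    # (volume name, host path, mount path inside the container)
    mounts = [
        ("docker-sock", "/var/run/docker.sock", "/var/run/docker.sock"),
        ("gremlin-state", "/var/lib/gremlin", "/var/lib/gremlin"),
        ("gremlin-logs", "/var/log/gremlin", "/var/log/gremlin"),
        ("shutdown-trigger", "/proc/sysrq-trigger", "/sysrq"),
    ]
    labels = {"k8s-app": "gremlin", "version": "v1"}
    container = {
        "name": "gremlin",
        "image": "gremlin/gremlin",
        "args": ["daemon"],
        "imagePullPolicy": "Always",
        "securityContext": {
            "capabilities": {"add": ["NET_ADMIN", "SYS_BOOT", "SYS_TIME", "KILL"]}
        },
        "env": [
            {"name": "GREMLIN_TEAM_ID", "value": team_id},
            {"name": "GREMLIN_TEAM_SECRET", "value": team_secret},
            {"name": "GREMLIN_IDENTIFIER",
             "valueFrom": {"fieldRef": {"fieldPath": "spec.nodeName"}}},
        ],
        "volumeMounts": [{"name": n, "mountPath": m} for n, _, m in mounts],
    }
    return {
        "apiVersion": "extensions/v1beta1",
        "kind": "DaemonSet",
        "metadata": {"name": "gremlin", "namespace": "default", "labels": labels},
        "spec": {
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "volumes": [{"name": n, "hostPath": {"path": h}} for n, h, _ in mounts],
                },
            }
        },
    }


if __name__ == "__main__":
    team_id = os.environ.get("GREMLIN_TEAM_ID")
    team_secret = os.environ.get("GREMLIN_TEAM_SECRET")
    if team_id and team_secret:
        json.dump(build_gremlin_daemonset(team_id, team_secret), sys.stdout, indent=2)
    else:
        print("Set GREMLIN_TEAM_ID and GREMLIN_TEAM_SECRET first", file=sys.stderr)
```

To use it, run `python gremlin_manifest.py > gremlin.json` and then `kubectl apply -f gremlin.json`. A Kubernetes Secret referenced through `valueFrom.secretKeyRef` is another way to keep the values out of the manifest.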

Save this file and run the following command to create the DaemonSet:

kubectl apply -f gremlin.yaml

You will see the following result:

daemonset "gremlin" created

Next verify that the Gremlin daemonset has been created and the pods were created successfully:

kubectl get pods --namespace default

You will see the following:

NAME                  READY   STATUS    RESTARTS   AGE
gremlin-6brtp         1/1     Running   0          49s
gremlin-t5d2h         1/1     Running   0          49s
mycache-memcached-0   1/1     Running   0          15m
mycache-memcached-1   1/1     Running   0          14m
mycache-memcached-2   0/1     Pending   0          14m

Step 5 – Installing the Datadog agent using a Kubernetes DaemonSet

To install Datadog in a Kubernetes pod you can use the Datadog Kubernetes easy one-step install. It will take a few minutes for Datadog to spin up the Datadog container, collect metrics on your existing containers and display them in the Datadog App.

[Screenshot: Datadog API]

Copy the Kubernetes DaemonSet manifest, save it as datadog-agent.yaml and then run the following command:

kubectl apply -f datadog-agent.yaml

Next install the Memcached Datadog Integration by clicking Install Integration:

[Screenshot: Datadog integration memcache]

You will see the following notification in your event stream:

memcache integration installed

You can read more about setting up Memcache monitoring in Datadog.

Step 6 - Chaos Engineering experiments for Memcache with Gremlin

We will use the Gremlin Web App to create an IO attack on a specific Memcache pod. The purpose of this experiment is to ensure that we can identify an increase in IO for our Memcache cluster. We will also use this attack to understand how the pod and server handle an increase in IO.
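To turn "understand how the pod and server handle an increase in IO" into a pass/fail check, measure cache latency from a client twice: before the attack, and again while it runs. Then compare a high percentile of the two. Below is a minimal Python sketch. It assumes a client with pymemcache-style get and set methods, such as the HashClient from Step 2. The probe key, the sample size, the 2x threshold and the helper names are hypothetical choices, not Gremlin or Datadog settings.

```python
import statistics
import time


def sample_latencies(client, key="chaos-probe", n=200):
    """Time n get() calls against `client` and return latencies in milliseconds."""
    client.set(key, "probe")
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        client.get(key)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def p99(values):
    """99th percentile via statistics.quantiles (Python 3.8+, needs at least 2 values)."""
    return statistics.quantiles(values, n=100)[98]


def hypothesis_holds(baseline_ms, attack_ms, max_ratio=2.0):
    """Hypothesis: p99 latency during the IO attack stays within max_ratio x baseline."""
    return p99(attack_ms) <= max_ratio * p99(baseline_ms)
```

For example, collect `baseline = sample_latencies(client)` before unleashing the Gremlin and `attack = sample_latencies(client)` while it runs. Then check `hypothesis_holds(baseline, attack)` alongside the Datadog graphs.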

First click to create a new attack. Then click the container tab to view all the available containers you can run Chaos Engineering experiments on.

Select the mycache-memcached-0 pod:

[Screenshot: memcache 0]

Next, select the Resource Gremlin and then choose IO. Click to unleash the Gremlin.

[Screenshot: IO Gremlin]

You can now monitor your IO attack using Datadog.

[Screenshot: IO Attack Datadog]

Step 9 – Additional Chaos Engineering experiments to run on Memcache

There are many Chaos Engineering experiments you could possibly run on your Memcache infrastructure:

  • Shutdown Gremlin - will shutting down a memcache node cause unexpected issues?
  • Latency & Packet Loss Gremlins - will they impact the ability to use the Memcache API endpoints?
  • Disk Gremlin - will filling up the disk crash the host?

We encourage you to run these Chaos Engineering experiments and share your findings! To get access to Gremlin, sign up here.
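For the Shutdown Gremlin experiment, the question from the hypothesis is whether losing a Memcache pod takes down your database/storage layer. A common application-side pattern is cache-aside with a fallback: treat cache errors as misses and read from the database. Below is a minimal Python sketch. `cache` is any object with pymemcache-style get and set methods, for example a HashClient. `cached_get` and `load_from_db` are hypothetical names, with `load_from_db` standing in for your storage layer.

```python
import logging

log = logging.getLogger("cache")


def cached_get(cache, key, load_from_db, ttl=300):
    """Cache-aside read that degrades to the database if Memcache is unavailable."""
    try:
        value = cache.get(key)
    except Exception as exc:  # e.g. connection refused while a pod is shut down
        log.warning("cache get failed for %s: %s", key, exc)
        value = None
    if value is not None:
        return value
    value = load_from_db(key)
    try:
        cache.set(key, value, expire=ttl)  # repopulate; failure here is non-fatal
    except Exception as exc:
        log.warning("cache set failed for %s: %s", key, exc)
    return value
```

Catching every exception keeps the sketch short. In practice, catch your client's specific errors. pymemcache clients also accept an `ignore_exc` option that treats read errors as cache misses. During the experiment, watch your database load: every key that misses becomes a database read until the cache warms up again.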


This tutorial has explored how to install Memcache and Gremlin with Kubernetes for your Chaos Engineering experiments. We then ran an IO Chaos Engineering experiment on Memcache using the Gremlin IO attack.
