Gremlin is a simple, safe and secure service for performing Chaos Engineering experiments through a SaaS-based platform. Memcache is a general-purpose distributed memory caching system. Datadog is a monitoring service for cloud-scale applications, providing monitoring of servers, databases, tools, and services, through a SaaS-based data analytics platform. Datadog provides an integration to monitor Memcache.
For the purposes of this tutorial we will run Chaos Engineering experiments for Memcache on Kubernetes. We will use Gremlin to run an IO attack on our cluster that increases the number of reads. This will give us confidence in the reliability and resiliency of our Memcached cluster. Additional recommended experiments include shutting down Memcache instances and pods and ensuring this does not take down your database/storage layer.
To complete this tutorial you will need the following:
You will need to install the following on each of your 4 cloud infrastructure hosts. This will enable you to run your Chaos Engineering experiments.
This tutorial will walk you through the required steps to run the Memcache IO Chaos Engineering experiment.
We will start by creating four Ubuntu 16.04 servers. Call them kube-01, kube-02, kube-03 and kube-04. Each host needs a minimum of 4GB RAM.
Set your hostnames for your servers as follows:
Kubernetes will need to assign specialized roles to each server. We will set up one server to act as the master:
On each of the Ubuntu 16.04 servers run the following commands as root:
apt-get update && apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl docker.io
On the kube-01 node run the following command:
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Your Kubernetes master has initialized successfully!
You can now join any number of machines by running the kubeadm join command on each node as root. The exact command is printed in your terminal for you to copy and run. An example of what this looks like is below:
kubeadm join --token 702ff6.bc7aacff7aacab17 188.8.131.52:6443 --discovery-token-ca-cert-hash sha256:68bc22d2c631800fd358a6d7e3998e598deb2980ee613b3c2f1da8978960c8ab
When you join each worker node (kube-02 onward), you will see the following on that node:
This node has joined the cluster:
* Certificate signing request was sent to master and a response was received.
* The Kubelet was informed of the new secure connection details.
To check that all nodes are now joined to the master run the following command on the Kubernetes master kube-01:
kubectl get nodes
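The same check can be scripted. The following is a minimal Python sketch, assuming you have captured the text output of kubectl get nodes yourself; the helper name not_ready_nodes and the sample output (node versions, ages) are illustrative and were not taken from this tutorial's cluster.

```python
def not_ready_nodes(kubectl_output):
    """Return the names of nodes whose STATUS column is not exactly 'Ready'."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    waiting = []
    for line in lines[1:]:
        fields = line.split()
        # Anything other than a plain "Ready" (for example "NotReady" or
        # "Ready,SchedulingDisabled") is reported as not ready.
        if fields[status_col] != "Ready":
            waiting.append(fields[name_col])
    return waiting


# Hypothetical sample output; your node names, ages and versions will differ.
SAMPLE_NODES = """\
NAME      STATUS     ROLES    AGE   VERSION
kube-01   Ready      master   10m   v1.12.2
kube-02   Ready      <none>   5m    v1.12.2
kube-03   NotReady   <none>   1m    v1.12.2
"""

print(not_ready_nodes(SAMPLE_NODES))
```

An empty list means every node that has joined reports Ready.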
Kubernetes Add-Ons are pods and services that implement cluster features. Pods extend the functionality of Kubernetes. You can install addons for a range of cluster features including Networking and Visualization.
We are going to install the Weave Net Add-On on the kube-01 master which provides networking and network policy. It will continue working on both sides of a network partition and does not require an external database.
Next, you will deploy a pod network to the cluster. The options are listed at: https://kubernetes.io/docs/concepts/cluster-administration/addons/
Get the Weave Net yaml:
curl -o weave.yaml https://cloud.weave.works/k8s/v1.8/net.yaml
Inspect the yaml contents:
On the kube-01 Kubernetes master node run the following commands:
kubectl apply -f weave.yaml
The result will look like this:
serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.extensions/weave-net created
It may take a minute or two for DNS to be ready. Continue to check for DNS to be ready before moving on by running the following command:
kubectl get pods --all-namespaces
A successful result will look like this, with every container running:
NAMESPACE     NAME                            READY   STATUS    RESTARTS   AGE
kube-system   coredns-576cbf47c7-gm6kt        1/1     Running   0          3m20s
kube-system   coredns-576cbf47c7-h5v5k        1/1     Running   0          3m20s
kube-system   etcd-k8-01                      1/1     Running   0          2m14s
kube-system   kube-apiserver-k8-01            1/1     Running   0          2m14s
kube-system   kube-controller-manager-k8-01   1/1     Running   0          2m18s
kube-system   kube-proxy-7m87q                1/1     Running   0          111s
kube-system   kube-proxy-mk9h9                1/1     Running   0          113s
kube-system   kube-proxy-wkxxm                1/1     Running   0          3m20s
kube-system   kube-scheduler-k8-01            1/1     Running   0          2m35s
kube-system   weave-net-lvp6x                 2/2     Running   0          34s
kube-system   weave-net-pjxk2                 2/2     Running   0          34s
kube-system   weave-net-qrrvl                 2/2     Running   0          34s
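Rather than reading the table by eye, you could filter it for pods that are not yet fully up. This is a rough Python sketch under the assumption that you pass in the text printed by kubectl get pods --all-namespaces; the function name pods_not_ready is hypothetical, and the last sample row (a pod still starting) is invented for illustration.

```python
def pods_not_ready(kubectl_output):
    """Return (namespace, name) for pods that are not Running with all containers ready."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    if not lines:
        return []
    columns = {title: index for index, title in enumerate(lines[0].split())}
    waiting = []
    for line in lines[1:]:
        fields = line.split()
        ready, total = fields[columns["READY"]].split("/")
        if fields[columns["STATUS"]] != "Running" or ready != total:
            # The NAMESPACE column only exists with --all-namespaces.
            namespace = fields[columns["NAMESPACE"]] if "NAMESPACE" in columns else None
            waiting.append((namespace, fields[columns["NAME"]]))
    return waiting


# The first two rows come from the output above; the third is a made-up
# example of a pod that has not finished starting.
SAMPLE_PODS = """\
NAMESPACE     NAME                       READY   STATUS              RESTARTS   AGE
kube-system   coredns-576cbf47c7-gm6kt   1/1     Running             0          3m20s
kube-system   weave-net-lvp6x            2/2     Running             0          34s
kube-system   weave-net-example          1/2     ContainerCreating   0          5s
"""

print(pods_not_ready(SAMPLE_PODS))
```

When the returned list is empty, DNS and networking pods are up and you can move on.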
Congratulations, now your Kubernetes cluster running on Ubuntu 16.04 is up and ready for you to deploy a microservices application.
First download the helm binary on your Kubernetes master, kube-01:
Create a helm directory and unzip the helm binary to your local system:
mkdir helm-v2.6.0
tar zxfv helm-v2.6.0-linux-amd64.tar.gz -C helm-v2.6.0
Add the helm binary’s directory to your PATH environment variable:
export PATH="$(echo ~)/helm-v2.6.0/linux-amd64:$PATH"
Create a service account with the cluster admin role for Tiller, the Helm server:
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
Initialize Tiller in your cluster, and update information of available charts:
helm init --service-account tiller
helm repo update
You will need to wait until the tiller deploy pod is ready before proceeding. Use the following command to check for when the tiller deploy pod is ready:
kubectl -n kube-system get pods
You will see the following output:
NAME                            READY   STATUS    RESTARTS   AGE
coredns-576cbf47c7-gm6kt        1/1     Running   0          14m
coredns-576cbf47c7-h5v5k        1/1     Running   0          14m
etcd-k8-01                      1/1     Running   0          13m
kube-apiserver-k8-01            1/1     Running   0          13m
kube-controller-manager-k8-01   1/1     Running   0          13m
kube-proxy-7m87q                1/1     Running   0          12m
kube-proxy-mk9h9                1/1     Running   0          12m
kube-proxy-wkxxm                1/1     Running   0          14m
kube-scheduler-k8-01            1/1     Running   0          13m
tiller-deploy-9cfccbbcf-6f8j9   1/1     Running   0          93s
weave-net-lvp6x                 2/2     Running   0          11m
weave-net-pjxk2                 2/2     Running   0          11m
weave-net-qrrvl                 2/2     Running   0          11m
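If you would rather not re-run the command by hand, the wait can be wrapped in a small polling loop. The sketch below is one possible approach, not part of Helm or kubectl: wait_until and tiller_is_running are hypothetical helper names, and the timeout and interval defaults are arbitrary. It assumes kubectl is on your PATH and Python 3.7 or later.

```python
import subprocess
import time


def wait_until(check, timeout=300.0, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def tiller_is_running():
    """Return True if a tiller-deploy pod in kube-system reports Running."""
    result = subprocess.run(
        ["kubectl", "-n", "kube-system", "get", "pods"],
        capture_output=True, text=True, check=True,
    )
    for line in result.stdout.splitlines():
        fields = line.split()
        # Columns without --all-namespaces: NAME READY STATUS RESTARTS AGE
        if len(fields) >= 3 and fields[0].startswith("tiller-deploy-"):
            if fields[2] == "Running":
                return True
    return False
```

Calling wait_until(tiller_is_running) on the master would then block until the pod is up or five minutes have passed, returning True or False accordingly.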
Check the logs for the tiller pod by running the following command, replacing tiller-deploy-9cfccbbcf-kflph with your pod name:
kubectl logs --namespace kube-system tiller-deploy-9cfccbbcf-kflph
You will see the following output:
[main] 2018/11/20 20:00:41 Starting Tiller v2.6.0 (tls=false)
[main] 2018/11/20 20:00:41 GRPC listening on :44134
[main] 2018/11/20 20:00:41 Probes listening on :44135
[main] 2018/11/20 20:00:41 Storage driver is ConfigMap
Install a new Memcached Helm chart release with three replicas, one for each node:
helm install stable/memcached --name mycache --set replicaCount=3
You will see the following output:
NAME                  READY   STATUS    RESTARTS   AGE
mycache-memcached-0   1/1     Running   0          89s
mycache-memcached-1   1/1     Running   0          61s
mycache-memcached-2   0/1     Pending   0          48s
Execute the following command to see the running pods:
kubectl get pods
You should see the following:
NAME                  READY   STATUS    RESTARTS   AGE
mycache-memcached-0   1/1     Running   0          3m54s
mycache-memcached-1   1/1     Running   0          3m26s
mycache-memcached-2   0/1     Pending   0          3m13s
First, run the following command to retrieve the endpoints’ IP addresses:
kubectl get endpoints mycache-memcached
The output should be similar to the following:
NAME                ENDPOINTS                         AGE
mycache-memcached   10.40.0.1:11211,10.46.0.4:11211   4m10s
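Note that only two endpoints are listed, which matches the two Running pods; the Pending pod has no endpoint yet. If you want to reuse these addresses in a script, the ENDPOINTS column can be split into host and port pairs. This is a small illustrative sketch; parse_endpoints is a hypothetical helper, not part of kubectl.

```python
def parse_endpoints(endpoints_field):
    """Split a comma-separated 'host:port' list into (host, port) tuples."""
    servers = []
    for item in endpoints_field.split(","):
        item = item.strip()
        if not item:
            continue
        # rpartition splits on the last colon, so the port is always the tail.
        host, _, port = item.rpartition(":")
        servers.append((host, int(port)))
    return servers


print(parse_endpoints("10.40.0.1:11211,10.46.0.4:11211"))
# [('10.40.0.1', 11211), ('10.46.0.4', 11211)]
```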
Test the deployment by opening a telnet session with one of the running Memcached servers on port 11211:
kubectl run -it --rm alpine --image=alpine:3.6 --restart=Never telnet mycache-memcached-0.mycache-memcached.default.svc.cluster.local 11211
At the telnet prompt, run these commands using the Memcached ASCII protocol:
set mykey 0 0 5
hello
get mykey
quit
The resulting session, including the server's responses (STORED, VALUE, END), looks like this:
If you do not see a command prompt, try pressing enter.
set mykey 0 0 5
hello
STORED
get mykey
VALUE mykey 0 5
hello
END
quit
Connection closed by foreign host
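To make the exchange above easier to read: in set mykey 0 0 5, the two zeros are the flags and the expiry time, and 5 is the length of the value in bytes. The following Python sketch builds and parses those same messages. It is an illustration of the ASCII protocol only; build_set and parse_get are hypothetical names, and a real client such as pymemcache handles many more cases.

```python
def build_set(key, value, flags=0, exptime=0):
    """Build the bytes for a 'set' command: header line, data block, CRLF."""
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def parse_get(response):
    """Parse a 'get' response into a {key: bytes} dictionary.

    Each hit is a 'VALUE <key> <flags> <bytes>' line followed by the data
    block; the response ends with an 'END' line.
    """
    values = {}
    rest = response
    while not rest.startswith(b"END"):
        header, _, rest = rest.partition(b"\r\n")
        _, key, _flags, length = header.split()
        length = int(length)
        values[key.decode("ascii")] = rest[:length]
        rest = rest[length + 2:]  # skip the data block and its trailing CRLF
    return values


print(build_set("mykey", "hello"))
print(parse_get(b"VALUE mykey 0 5\r\nhello\r\nEND\r\n"))
```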
Next we will implement service discovery logic with Python. Run the following command to create a python pod in your Kubernetes cluster:
kubectl run -it --rm python --image=python:3.6-alpine --restart=Never sh
Install the pymemcache library:
pip install pymemcache
You will see the following output:
Collecting pymemcache
  Downloading https://files.pythonhosted.org/packages/91/14/f4fb51de1a27b12df6af42e6ff794a13409bdca6c8880e562f7486e78b5b/pymemcache-2.0.0-py2.py3-none-any.whl
Collecting six (from pymemcache)
  Downloading https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Installing collected packages: six, pymemcache
Successfully installed pymemcache-2.0.0 six-1.11.0
Start a Python interactive console by running the following command:
In the Python console, run these commands:
import socket
from pymemcache.client.hash import HashClient
_, _, ips = socket.gethostbyname_ex('mycache-memcached.default.svc.cluster.local')
servers = [(ip, 11211) for ip in ips]
client = HashClient(servers, use_pooling=True)
client.set('mykey', 'hello')
client.get('mykey')
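The commands above look up every pod IP behind the service and hand the list to a client that chooses a server per key. One common way to make that choice is rendezvous (highest random weight) hashing. The sketch below illustrates the general technique only and should not be read as pymemcache's actual implementation; pick_server is a hypothetical name, and the third IP address is invented.

```python
import hashlib


def pick_server(key, servers):
    """Return the (host, port) with the highest hash score for this key."""
    def score(server):
        host, port = server
        digest = hashlib.sha256(f"{host}:{port}|{key}".encode("utf-8")).digest()
        return int.from_bytes(digest, "big")
    return max(servers, key=score)


# The first two addresses are the endpoints shown earlier; the third is made up.
servers = [("10.40.0.1", 11211), ("10.46.0.4", 11211), ("10.44.0.2", 11211)]

owner = pick_server("mykey", servers)
print(owner)

# Removing a server only moves the keys that server owned; every other key
# keeps its owner, because its highest-scoring server is still in the list.
remaining = [s for s in servers if s != servers[2]]
if owner != servers[2]:
    assert pick_server("mykey", remaining) == owner
```

This property is useful for the shutdown experiments mentioned in this tutorial: when one Memcached pod disappears, only the keys it held need to be refetched from the storage layer.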
You will see the following output:
Exit the Python console:
Exit the pod’s shell session by pressing Control+D. You will see the following:
/ # pod "python" deleted
After you have created your Gremlin account you will need to find your Gremlin Daemon credentials. Log in to the Gremlin App using your Company name and sign-on credentials. These were emailed to you when you signed up to start using Gremlin. Navigate to Company Teams Settings and click on your Team. Click the blue Download button to get your Team Certificate. The downloaded certificate.zip contains both a public-key certificate and a matching private key.
Unzip the certificate.zip and save it to your gremlin folder on your desktop. Rename your certificate and key files to gremlin.cert and gremlin.key.
Next create your secret as follows:
kubectl create secret generic gremlin-team-cert --from-file=./gremlin.cert --from-file=./gremlin.key
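It can help to know what this command stores. A generic secret created with --from-file holds one entry per file, keyed by the file name, with the contents base64-encoded. The Python sketch below builds an equivalent manifest as a dictionary for illustration; secret_manifest is a hypothetical helper, it takes the file contents as bytes instead of reading paths, and the placeholder contents are not real certificate data.

```python
import base64


def secret_manifest(name, files):
    """Build a Secret manifest dict from a {file_name: raw_bytes} mapping."""
    data = {
        file_name: base64.b64encode(content).decode("ascii")
        for file_name, content in files.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",  # the type used for generic secrets
        "data": data,
    }


# Placeholder bytes stand in for the real certificate and key contents.
manifest = secret_manifest(
    "gremlin-team-cert",
    {"gremlin.cert": b"placeholder-cert", "gremlin.key": b"placeholder-key"},
)
print(sorted(manifest["data"]))
```

This is why the files are renamed to gremlin.cert and gremlin.key before creating the secret: the file names become the keys inside it.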
The simplest way to install the Gremlin client on your Kubernetes cluster is to use Helm. If you do not already have Helm installed, follow the Helm installation steps earlier in this tutorial. Once Helm is installed and configured, the next steps are to add the Gremlin repo and install the client.
To run the Helm install, you will need your Gremlin Team ID. It can be found in the Gremlin app on the Team Settings page, where you downloaded your certs earlier. Click on the name of your team in the list. The ID you’re looking for is found under Configuration as Team ID.
Export your Team ID as an environment variable, replacing YOUR_TEAM_ID with the Team ID you obtained from the Gremlin UI:
export GREMLIN_TEAM_ID="YOUR_TEAM_ID"
Next, export your cluster ID, which is just a friendly name for your Kubernetes cluster. It can be whatever you want.
export GREMLIN_CLUSTER_ID="Your cluster id"
Now add the Gremlin Helm repo, and install Gremlin:
helm repo add gremlin https://helm.gremlin.com
helm install gremlin/gremlin \
  --namespace gremlin \
  --name gremlin \
  --set gremlin.teamID=$GREMLIN_TEAM_ID \
  --set gremlin.clusterID=$GREMLIN_CLUSTER_ID
For more information on the Gremlin Helm chart, including more configuration options, check out the chart on Github.
To install Datadog in a Kubernetes pod you can use the Datadog Kubernetes easy one-step install. It will take a few minutes for Datadog to spin up the Datadog container, collect metrics on your existing containers and display them in the Datadog App.
You will simply copy the Kubernetes DaemonSet, save it as datadog-agent.yaml and then run the following command:
kubectl apply -f datadog-agent.yaml
Next install the Memcached Datadog Integration by clicking Install Integration:
You will see the following notification in your event stream:
You can read more about setting up Memcache monitoring in Datadog.
We will use the Gremlin Web App to create an IO attack on the memcache pods. The purpose of this experiment will be to ensure that we are able to identify an increase in IO for our memcache cluster. We will also use this attack to understand how the pod and server handle an increase in IO.
First click Attacks in the left navigation bar and then New Attack. Then click the Kubernetes tab to view all the available Kubernetes objects that you can run Chaos Engineering experiments on.
Scroll down and expand the StatefulSets section, and select memcached.
Next, select the Resource Gremlin and then choose IO. Scroll down and click the Unleash Gremlin button.
You can now monitor your IO attack using Datadog.
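Alongside the Datadog dashboards, you may want a client-side view of how Memcached responds while the attack runs. The following is a minimal sketch, assuming a client object with set and get methods such as the pymemcache HashClient created earlier; measure_round_trips and the probe key names are hypothetical, and the sample count is arbitrary.

```python
import statistics
import time


def measure_round_trips(client, samples=100, clock=time.perf_counter):
    """Time set+get round trips; return (median_seconds, max_seconds)."""
    durations = []
    for i in range(samples):
        key = f"probe-{i}"
        start = clock()
        client.set(key, "x")
        client.get(key)
        durations.append(clock() - start)
    return statistics.median(durations), max(durations)


class InMemoryClient:
    """Stand-in with the same set/get shape, used here only to demo the helper."""

    def __init__(self):
        self.store = {}

    def set(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)


median, worst = measure_round_trips(InMemoryClient(), samples=10)
print(f"median={median:.6f}s max={worst:.6f}s")
```

Running the same measurement before and during the IO attack gives two pairs of numbers to compare against what Datadog reports.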
There are many Chaos Engineering experiments you could possibly run on your Memcache infrastructure:
This tutorial has explored how to install Memcache and Gremlin with Kubernetes for your Chaos Engineering experiments. We then ran an IO Chaos Engineering experiment on Memcache using the Gremlin IO attack.
Share your results and swap best practices with 5,000+ engineers practicing Chaos Engineering in the Chaos Engineering Slack.