Chaos Monkey Alternatives

Kubernetes

Last Updated October 17, 2018

Kube Monkey

Kube-monkey is an open-source implementation of Chaos Monkey for Kubernetes clusters, written in Go. Like the original Chaos Monkey, Kube-monkey performs just one task: it randomly deletes Kubernetes pods within the cluster as a means of injecting failure into the system and testing the stability of the remaining pods. It follows pseudo-random rules: at a pre-defined hour on weekdays it builds a schedule of random pod targets, and each target is then attacked and killed at a random time during that same day. The time range in which kills may occur is configurable.
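
The scheduling behavior described above can be sketched in a few lines of Python. This is only an illustration, not Kube-monkey's actual Go code: the function name `build_schedule`, the 10:00 to 16:00 default window, and the rule that an app is picked with probability 1/mtbf on a given weekday are all assumptions made for the example.

```python
import random
from datetime import date, datetime, time, timedelta


def build_schedule(apps, today, start_hour=10, end_hour=16, rng=None):
    """Return {app_name: kill_time} for the apps picked for termination today.

    `apps` maps an app name to its mean time between failures, in days.
    """
    rng = rng or random.Random()
    if today.weekday() >= 5:  # Saturday is 5, Sunday is 6: no chaos on weekends
        return {}
    window_start = datetime.combine(today, time(hour=start_hour))
    window_seconds = (end_hour - start_hour) * 3600
    schedule = {}
    for name, mtbf_days in apps.items():
        # Assumption: each app is picked with probability 1/mtbf per weekday.
        if rng.random() < 1.0 / mtbf_days:
            offset = rng.randrange(window_seconds)
            schedule[name] = window_start + timedelta(seconds=offset)
    return schedule
```

Calling `build_schedule({"monkey-victim": 2}, date.today())` returns either an empty dictionary or a single kill time inside the configured window, which is the "random target at a random time" behavior in miniature.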

Kube-monkey will only terminate pods that have explicitly opted in by specifying certain Kube-monkey metadata labels. The following illustrates the basic labels that can be specified to allow Kube-monkey to kill pods within the application.

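# Note: extensions/v1beta1 Deployments were removed in Kubernetes 1.16;
# newer clusters need apps/v1, which also requires a spec.selector.
apiVersion: extensions/v1beta1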
kind: Deployment
metadata:
  name: monkey-victim
  namespace: app-namespace
  labels:
    kube-monkey/enabled: enabled
    kube-monkey/identifier: monkey-victim
    kube-monkey/mtbf: '2'
    kube-monkey/kill-mode: "fixed"
    kube-monkey/kill-value: '1'  # label values must be strings, so quote the number
spec:
  template:
    metadata:
      labels:
        kube-monkey/enabled: enabled
        kube-monkey/identifier: monkey-victim
# ...
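
To make the opt-in rule concrete, here is a small Python sketch of how a tool might read these labels. It is not Kube-monkey's code: the helper name `pods_to_kill` is hypothetical, only the `fixed` kill mode from the example is handled, and capping the result at the number of running pods is an assumption.

```python
def pods_to_kill(labels, running_pods):
    """Return how many pods may be killed for a workload with these labels."""
    if labels.get("kube-monkey/enabled") != "enabled":
        return 0  # the workload has not opted in, so it is never touched
    if labels.get("kube-monkey/kill-mode") != "fixed":
        raise ValueError("this sketch only handles kill-mode 'fixed'")
    # Label values are strings, so the count has to be parsed.
    wanted = int(labels["kube-monkey/kill-value"])
    # Assumption: never report more victims than there are running pods.
    return min(wanted, running_pods)
```

With the labels from the manifest above and three running pods, the function returns 1. A workload without the `kube-monkey/enabled` label yields 0.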

Check out the GitHub repository for more information on installing and using Kube-monkey.

Engineering Chaos In Kubernetes with Gremlin

Gremlin's Failure as a Service simplifies your Chaos Engineering workflow for Kubernetes by making it safe and effortless to execute Chaos Experiments across all nodes. As a distributed architecture, Kubernetes is particularly sensitive to instability and unexpected failures. Gremlin can perform a variety of attacks on your Kubernetes clusters, including overloading CPU, memory, disk, and IO; killing nodes; modifying network traffic; and much more.

Check out this tutorial over on our community site to get started!

Kubernetes Pod Chaos Monkey

Kubernetes Pod Chaos Monkey is a Chaos Monkey-style tool for Kubernetes. The code itself is a local shell script that issues kubectl commands to periodically locate and then delete Kubernetes pods. It targets a cluster based on the configurable NAMESPACE and attempts to destroy a pod every DELAY seconds (defaulting to 30).

Since Kubernetes Pod Chaos Monkey is essentially a simple shell script, it can be modified quite easily.
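
For readers who prefer Python to shell, the same loop can be sketched as follows. This is a rough equivalent written for illustration, not the project's script: the function names are hypothetical, and the `run` parameter exists only so the kubectl calls can be replaced in tests.

```python
import random
import subprocess
import time


def list_pods_cmd(namespace):
    # Prints the pod names in the namespace, separated by spaces.
    return ["kubectl", "--namespace", namespace, "get", "pods",
            "-o", "jsonpath={.items[*].metadata.name}"]


def delete_pod_cmd(namespace, pod):
    return ["kubectl", "--namespace", namespace, "delete", "pod", pod]


def kill_random_pod(namespace, run=subprocess.run, rng=random):
    """Delete one randomly chosen pod and return its name (None if no pods)."""
    result = run(list_pods_cmd(namespace), capture_output=True, text=True, check=True)
    pods = result.stdout.split()
    if not pods:
        return None
    victim = rng.choice(pods)
    run(delete_pod_cmd(namespace, victim), check=True)
    return victim


def main(namespace="default", delay=30):
    # Mirrors the NAMESPACE and DELAY settings described above.
    while True:
        print("deleted pod:", kill_random_pod(namespace))
        time.sleep(delay)
```

Calling `main("my-namespace")` starts the loop against whichever cluster kubectl is currently configured for, so it should only be pointed at a cluster where pod loss is acceptable.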

The Chaos Toolkit

The Chaos Toolkit is an open-source, extensible tool written in Python. It uses platform-specific drivers to connect to your Kubernetes cluster and execute Chaos Experiments. Every experiment performed by Chaos Toolkit is written in JSON using a robust API. Experiments are made up of a few key elements that are executed sequentially, which allows the experiment to bail out if any step in the process fails.

  • Steady State Hypothesis: This element defines the normal or "steady" state of the system before the Method element is applied. Here we've defined a basic experiment with a steady state hypothesis titled "Service should have nodes."

    {
      "version": "1.0.0",
      "title": "Gremlin EKS App",
      "description": "Gremlin EKS App",
      "tags": [
          "service",
          "kubernetes"
      ],
      "steady-state-hypothesis": {
          "title": "Service should have nodes.",
          "probes": [
              {
                  "type": "probe",
                  "name": "nodes_found",
                  "tolerance": true,
                  "provider": {
                      "type": "python",
                      "module": "chaosk8s.node.probes",
                      "func": "get_nodes",
                      "arguments": {
                          "label_selector": "eks-gremlin-chaos"
                      }
                  }
              }
          ]
      }
    }
  • Probe: A Probe is an element that collects system information, such as checking the health status of a node. Here we define a Probe element, which we've added to our steady state Probes list above, that calls the get_nodes function and retrieves the list of nodes for the specified label-selector.

    {
        "type": "probe",
        "name": "nodes_found",
        "tolerance": true,
        "provider": {
            "type": "python",
            "module": "chaosk8s.node.probes",
            "func": "get_nodes",
            "arguments": {
                "label_selector": "eks-gremlin-chaos"
            }
        }
    }
  • Action: An Action element performs an operation against the system, such as draining or deleting a node. In the example we call the delete_nodes function, passing the label selector argument and setting all to true so that we delete every node matching the selector.

    {
        "type": "action",
        "name": "delete_all_nodes",
        "provider": {
            "type": "python",
            "module": "chaosk8s.node.actions",
            "func": "delete_nodes",
            "arguments": {
                "all": true,
                "label_selector": "eks-gremlin-chaos"
            }
        }
    }
  • Method: A Method element defines the series of Probe and Action elements that make up the experiment. Here we first use the nodes_found Probe to make sure nodes exist, then execute the delete_all_nodes Action to delete all matching nodes, and finally perform another explicit Probe to verify that no nodes remain.

    "method": [
        {
            "ref": "nodes_found"
        },
        {
            "type": "action",
            "name": "delete_all_nodes",
            "provider": {
                "type": "python",
                "module": "chaosk8s.node.actions",
                "func": "delete_nodes",
                "arguments": {
                    "all": true,
                    "label_selector": "eks-gremlin-chaos"
                }
            }
        },
        {
            "type": "probe",
            "name": "nodes_not_found",
            "tolerance": false,
            "provider": {
                "type": "python",
                "module": "chaosk8s.node.probes",
                "func": "get_nodes",
                "arguments": {
                    "label_selector": "eks-gremlin-chaos"
                }
            }
        }
    ]
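
The pieces above can be assembled into one experiment file. The Python sketch below builds the experiment as a dictionary and checks that every "ref" in the method points at a named activity. The helper names (`node_probe`, `delete_all_nodes_action`, `build_experiment`, `unresolved_refs`) are hypothetical, and the action is assumed to take the same `label_selector` keyword as the probe.

```python
import json

SELECTOR = "eks-gremlin-chaos"


def node_probe(name, tolerance):
    """Build a get_nodes probe like the ones shown above."""
    return {
        "type": "probe",
        "name": name,
        "tolerance": tolerance,
        "provider": {
            "type": "python",
            "module": "chaosk8s.node.probes",
            "func": "get_nodes",
            "arguments": {"label_selector": SELECTOR},
        },
    }


def delete_all_nodes_action():
    return {
        "type": "action",
        "name": "delete_all_nodes",
        "provider": {
            "type": "python",
            "module": "chaosk8s.node.actions",
            "func": "delete_nodes",
            # Assumption: same keyword spelling as the probe's argument.
            "arguments": {"all": True, "label_selector": SELECTOR},
        },
    }


def build_experiment():
    return {
        "version": "1.0.0",
        "title": "Gremlin EKS App",
        "description": "Gremlin EKS App",
        "tags": ["service", "kubernetes"],
        "steady-state-hypothesis": {
            "title": "Service should have nodes.",
            "probes": [node_probe("nodes_found", True)],
        },
        "method": [
            {"ref": "nodes_found"},
            delete_all_nodes_action(),
            node_probe("nodes_not_found", False),
        ],
    }


def unresolved_refs(experiment):
    """Return the refs in the method that match no named activity."""
    activities = experiment["steady-state-hypothesis"]["probes"] + experiment["method"]
    declared = {a["name"] for a in activities if "name" in a}
    return [a["ref"] for a in experiment["method"]
            if "ref" in a and a["ref"] not in declared]
```

Writing the result with `json.dump(build_experiment(), open("experiment.json", "w"), indent=4)` produces a file that can be passed to `chaos run experiment.json`. Because this experiment deletes nodes, it belongs on a disposable cluster.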

Those are the basics needed to begin experimenting with the Chaos Toolkit. Chaos Toolkit also has a fault injection plugin for Gremlin, so you can easily perform attacks while utilizing the safety and security of the Gremlin platform.