Chaos Monkey Tutorial

A Step-by-Step Guide to Creating Failure on AWS

  • 11 min read
  • Last Updated October 17, 2018

This chapter will provide a step-by-step guide for setting up and using Chaos Monkey with AWS. We also examine the scenarios where Chaos Monkey is the right solution, and its limitations since it only handles random instance terminations.

How to Quickly Deploy Spinnaker for Chaos Monkey

Modern Chaos Monkey requires the use of Spinnaker, which is an open-source, multi-cloud continuous delivery platform developed by Netflix. Spinnaker allows for automated deployments across multiple cloud platforms (such as AWS, Azure, Google Cloud Platform, and more). Spinnaker can also be used to deploy across multiple accounts and regions, often using pipelines that define a series of events that should occur every time a new version is released. Spinnaker is a powerful tool, but since both Spinnaker and Chaos Monkey were developed by and for Netflix's own architecture, you'll need to do the extra legwork to configure Spinnaker to work within your application and infrastructure.

In this first section we'll explore the fastest and simplest way to get Spinnaker up and running, which will then allow you to move onto installing and then using.

We'll be deploying Spinnaker on AWS, and the easiest method for doing so is to use the CloudFormation Quick Start template.

The AWS Spinnaker Quick Start will create a simple architecture for you containing two subnets (one public and one private) in a Virtual Private Cloud (VPC). The public subnet contains a Bastion host instance designed to be strictly accessible, with just port 22 open for SSH access. The Bastion host will then allow a pass through connection to the private subnet that is running Spinnaker.


*AWS Spinnaker Quick Start Architecture - Courtesy of AWS*

This quick start process will take about 10 - 15 minutes and is mostly automated.

Creating the Spinnaker Stack

  • (Optional) If necessary, visit to sign up for or login to your AWS account.

  • (Optional) You'll need at least one AWS EC2 Key Pair for securely connecting via SSH.

    • If you don't have a KeyPair already start by opening the AWS Console and navigate to EC2 > NETWORK & SECURITY > Key Pairs.
    • Click Create Key Pair and enter an identifying name in the Key pair name field.
    • Click Create to download the private .pem key file to your local system.
    • Save this key to an appropriate location (typically your local user ~/.ssh directory).
  • After you've signed into the AWS console visit this page, which should contain a link to the quickstart-spinnakercf.template.

  • Click Next.

  • (Optional) If you haven't already done so, you'll need to create at least one AWS Access Key.

  • Select the KeyName of the key pair you previously created.

  • Input a secure password in the Password field.

  • (Optional) Modify the IP address range in the SSHLocation field to indicate what IP range is allowed to SSH into the Bastion host. For example, if your public IP address is you might enter into this field. If you aren't sure, you can enter to allow any IP address to connect, though this is obviously less secure.

  • Click Next.

  • (Optional) Select an IAM Role with proper CloudFormation permissions necessary to deploy a stack. If you aren't sure, leave this blank and deployment will use your account's permissions.

  • Modify any other fields on this screen you wish, then click Next to continue.

  • Check the I acknowledge that AWS CloudFormation might create IAM resources with custom names. checkbox and click Create to generate the stack.

  • Once the Spinnaker stack has a CREATE_COMPLETE Status, select the Outputs tab, which has some auto-generated strings you'll need to paste in your terminal in the next section.

Connecting to the Bastion Host

  • Copy the Value of the SSHString1 field from the stack Outputs tab above.

  • Execute the SSHString1 value in your terminal and enter yes when prompted to continue connecting to this host.

    1ssh -A -L 9000:localhost:9000 -L 8084:localhost:8084 -L 8087:localhost:8087 ec2-user@
  • You should now be connected as the ec2-user to the Bastion instance. Before you can connect to the Spinnaker instance you'll probably need to copy your .pem file to the Spinnaker instance's ~/.ssh directory.

    • Once the key is copied, make sure you set proper permissions otherwise SSH will complain.

      1chmod 400 ~/.ssh/my_key.pem

Connecting to the Spinnaker Host

  • To connect to the Spinnaker instance copy and paste the SSHString2 Value into the terminal.

    1ssh –L 9000:localhost:9000 -L 8084:localhost:8084 -L 8087:localhost:8087 ubuntu@ -i ~/.ssh/my_key.pem
  • You should now be connected to the SpinnakerWebServer!

Configuring Spinnaker

The Spinnaker architecture is composed of a collection of microservices that each handle various aspects of the entire service. For example, Deck is the web interface you'll spend most time interacting with, Gate is the API gateway that handles most communication between microservices, and CloudDriver is the service that communicates and configures all cloud providers Spinnaker is working with.

Since so much of Spinnaker is blown out into smaller microservices, configuring Spinnaker can require messing with a few different files. If there's an issue you'll likely have to look through individual logs for each different service, depending on the problem.

Spinnaker is configured through /opt/spinnaker/config/spinnaker.yml file. However, this file will be overwritten by Halyard or other changes, so for user-generated configuration you should actually modify the /opt/spinnaker/config/spinnaker-local.yml file. Here's a basic example of what that file looks like.

1# /opt/spinnaker/config/spinnaker-local.yml
4 spinnaker:
5 timezone: 'America/Los_Angeles'
8 aws:
9 # For more information on configuring Amazon Web Services (aws), see
10 #
12 enabled: ${SPINNAKER_AWS_ENABLED:false}
13 defaultRegion: ${SPINNAKER_AWS_DEFAULT_REGION:us-west-2}
14 defaultIAMRole: Spinnaker-BaseIAMRole-GAT2AISI7TMJ
15 primaryCredentials:
16 name: default
17 # Store actual credentials in $HOME/.aws/credentials. See spinnaker.yml
18 # for more information, including alternatives.
20 # {{name}} will be interpolated with the aws account name (e.g. "my-aws-account-keypair").
21 defaultKeyPairTemplate: '{{name}}-keypair'
22 # ...

Standalone Spinnaker installations (such as the one created via the AWS Spinnaker Quick Start) are configured directly through the spinnaker.yml and spinnaker-local.yml override configuration files.

Creating an Application

In this section we'll manually create a Spinnaker application containing a pipeline that first bakes a virtual machine image and then deploys that image to a cluster.

  • Open the Spinnaker web UI (Deck) and click Actions > Create Application.
  • Input bookstore in the Name field.
  • Input your own email address in the Owner Email field.
  • (Optional) If you've enabled Chaos Monkey in Spinnaker you can opt to enable Chaos Monkey by checking the Chaos Monkey > Enabled box.
  • Input My bookstore application in the Description field.
  • Under Instance Health, tick the Consider only cloud provider health when executing tasks checkbox.
  • Click Create to add your new application.


Adding a Firewall

  • Navigate to the bookstore application, INFRASTRUCTURE > FIREWALLS, and click Create Firewall.
  • Input dev in the Detail field.
  • Input Bookstore dev environment in the Description field.
  • Within the VPC dropdown select SpinnakerVPC.
  • Under the Ingress header click Add new Firewall Rule. Set the following Firewall Rule settings.
    • Firewall: default
    • Protocol: TCP
    • Start Port: 80
    • End Port: 80
  • Click the Create button to finalize the firewall settings.


Adding a Load Balancer

  • Navigate to the bookstore application, INFRASTRUCTURE > LOAD BALANCERS, and click Create Load Balancer.
  • Select Classic (Legacy) and click Configure Load Balancer.
  • Input test in the Stack field.
  • In the VPC Subnet dropdown select internal (vpc-...).
  • In the Firewalls dropdown select bookstore--dev (...).
  • Click Create to generate the load balancer.


Creating a Pipeline in Spinnaker

The final step is to add a pipeline, which is where we tell Spinnaker what it should actually "do"! In this case we'll tell it to bake a virtual machine image containing Redis, then to deploy that image to our waiting EC2 instance.

  • Navigate to the bookstore application, PIPELINES and click Create Pipeline.
  • Select Pipeline in the Type dropdown.
  • Input Bookstore Dev to Test in the Pipeline Name field.
  • Click Create.

Adding a Bake Stage

  • Click the Add stage button.
  • Under Type select Bake.
  • Input redis-server in the Package field.
  • Select trusty (v14.04) in the Base OS dropdown.
  • Click Save Changes to finalize the stage.


Adding a Deploy Stage

  • Click the Add stage button.
  • Under Type select Deploy.
  • Click the Add server group button to begin creating a new server group.

Adding a Server Group

  • Select internal (vpc-...) in the VPC Subnet dropdown.

  • Input dev in the Stack field.

  • Under Load Balancers > Classic Load Balancers select the bookstore-dev load balancer we created.

  • Under Firewalls > Firewalls select the bookstore--dev firewall we also created.

  • Under Instance Type select the Custom Type of instance you think you'll need. For this example we'll go with something small and cheap, such as t2.large.

  • Input 3 in the Capacity > Number of Instances field.

  • Under Advanced Settings > Key Name select the key pair name you used when deploying your Spinnaker CloudFormation stack.

  • In the Advanced Settings > IAM Instance Profile field input the Instance Profile ARN value of the BaseIAMRole found in the AWS > IAM > Roles > BaseIAMRole dialog (e.g. arn:aws:iam::0123456789012:instance-profile/BaseIAMRole).

  • We also need to ensure the user/Spinnaker-SpinnakerUser that was generated has permissions to perform to pass the role/BasIAMRole role during deployment.

    • Navigate to AWS > IAM > Users > Spinnaker-SpinnakerUser-### > Permissions.

    • Expand Spinnakerpassrole policy and click Edit Policy.

    • Select the JSON tab and you'll see the auto-generated Spinnaker-BaseIAMRole listed in Resources.

    • Convert the Resource key value to an array so you can add a second value. Insert the ARN for the role/BaseIAMRole of your account (the account number will match the number above).

      2 "Version": "2012-10-17",
      3 "Statement": [
      4 {
      5 "Sid": "VisualEditor0",
      6 "Effect": "Allow",
      7 "Action": "iam:PassRole",
      8 "Resource": [
      9 "arn:aws:iam::123456789012:role/Spinnaker-BaseIAMRole-6D9LJ9HS4PZ7",
      10 "arn:aws:iam::123456789012:role/BaseIAMRole"
      11 ]
      12 }
      13 ]
    • Click Review Policy and Save Changes.

  • Click the Add button to create the deployment cluster configuration.

  • Finally, click Save Changes again at the bottom of the Pipelines interface to save the full Configuration > Bake > Deploy pipeline.

  • You should now have a Bookstore Dev to Test two-stage pipeline ready to go!


Executing the Pipeline

  • Navigate to the bookstore application, select Pipelines, and click Start Manual Execution next to the Bookstore Dev to Test pipeline.
  • Click Run to begin manual execution.
  • After waiting a few moments, assuming none of the potential setbacks below bite you, you'll shortly see output indicating the bookstore-dev-v000 Server Group has been successfully created. Browse over to AWS > EC2 and you'll see your three new instances launched!

To resize this Server Group use the Resize Server Group dialog in Spinnaker. Alternatively, you can find additional options under Server Group Actions, such as Disable or Destroy to stop or terminate instances, respectively.

Troubleshooting Pipeline Executions

While a lot can go wrong, below are a few potential issues you may encounter running through this tutorial, depending on changes to software versions, default configurations, and the like between present and time of writing.

1sudo nano /opt/rosco/config/packer/aws-ebs.json
1{ "builders": { "aws_ena_support": "{% raw %}{{user `aws_ena_support`}}{% endraw %}", }, }
1sudo nano /opt/rosco/config/packer/aws-ebs.json
1{ "builders": { "spot_price": "{% raw %}{{user `aws_spot_price`}}{% endraw %}", "spot_price_auto_product": "{% raw %}{{user `aws_spot_price_auto_product`}}{% endraw %}", }, }
1sudo curl --output /opt/rosco/config/packer/

How to Install Chaos Monkey

Before you can use Chaos Monkey you'll need to have Spinnaker deployed and running. We've created a handful of step-by-step tutorials for deploying Spinnaker, depending on the environment and level of control you're looking for.

Installing MySQL

Chaos Monkey requires MySQL, so make sure it's installed on your local system.

  • Download the latest mysql-apt.deb file from the official website, which we'll use to install MySQL

    1curl -OL
  • Install mysql-server by using the dpkg command.

    1sudo dpkg -i mysql-apt-config_0.8.10-1_all.deb
  • In the UI that appears press enter to change the MySQL Server & Cluster version to mysql-5.7. Leave the other options as default and move down to Ok and press Enter to finalize your choice.


  • Now use sudo apt-get update to update the MySQL packages related to the version we selected (mysql-5.7, in this case).

    1sudo apt-get update
  • Install mysql-server from the packages we just retrieved. You'll be prompted to enter a root password.

    1sudo apt-get install mysql-server
  • You're all set. Check that MySQL server is running with systemctl.

    1systemctl status mysql
  • (Optional) You may also wish to issue the mysql_secure_installation command, which will walk you through a few security-related prompts. Typically, the defaults are just fine.

Setup MySQL for Chaos Monkey

We now need to add a MySQL table for Chaos Monkey to use and create an associated user with appropriate permissions.

  • Launch the mysql CLI as the root user.

    1mysql -u root -p
  • Create a chaosmonkey database for Chaos Monkey to use.

    1CREATE DATABASE chaosmonkey;
  • Add a chaosmonkey MySQL user.

    1CREATE USER 'chaosmonkey'@'localhost' IDENTIFIED BY 'password';
  • Grant all privileges in the chaosmonkey database to the new chaosmonkey user.

    1GRANT ALL PRIVILEGES ON chaosmonkey.* TO 'chaosmonkey'@'localhost';
  • Finally, save all changes made to the system.


Installing Chaos Monkey

  • (Optional) Install go if you don't have it on your local machine already.

    • Go to this download page and download the latest binary appropriate to your environment.

      1curl -O
    • Extract the archive to the /usr/local directory.

      1sudo tar -C /usr/local -xzf go1.11.linux-amd64.tar.gz
    • Add /usr/local/go/bin to your $PATH environment variable.

      1export PATH=$PATH:/usr/local/go/bin
      2echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
  • (Optional) Check if the $GOPATH and $GOBIN variables are set with echo $GOPATH and echo $GOBIN. If not, export them and add them to your bash profile.

    1export GOPATH=$HOME/go
    2echo 'export GOPATH=$HOME/go' >> ~/.bashrc
    3export GOBIN=$HOME/go/bin
    4echo 'export GOBIN=$HOME/go/bin' >> ~/.bashrc
    5export PATH=$PATH:$GOBIN
    6echo 'export PATH=$PATH:$GOBIN' >> ~/.bashrc
  • Install the latest Chaos Monkey binary.

    1go get

Configure Spinnaker for Chaos Monkey

Spinnaker includes the Chaos Monkey feature as an option, but it is disabled by default.

  • (Optional) If necessary, enable the Chaos Monkey feature in your Spinnaker deployment.

    • On a Halyard-based Spinnaker deployment you must enable the Chaos Monkey feature via the Halyard --chaos flag.
      1hal config features edit --chaos true
    • On a quick start Spinnaker deployment you'll need to manually enable the Chaos Monkey feature flag within the /opt/deck/html/settings.js file. Make sure the var chaosEnabled is set to true, then save and reload Spinnaker.
      1sudo nano /opt/deck/html/settings.js
      1// var chaosEnabled = ${services.chaos.enabled};
      2var chaosEnabled = true
  • Navigate to Applications > (APPLICATION_NAME) > CONFIG and select CHAOS MONKEY in the side navigation.


  • Check the Enabled box to enable Chaos Monkey.

  • The UI provides useful information for what every option does, but the most important options are the mean and min times between instance termination. If your setup includes multiple clusters or stacks, altering the grouping may also make sense. Finally, you can add exceptions as necessary, which acts as a kind of exclude list of instances that will be ignored by Chaos Monkey, so you can keep the most critical services up and running.

  • Once your changes are made click the Save Changes button.

How to Configure Chaos Monkey

  • Start by creating the chaosmonkey.toml, which Chaos Monkey will try to find in all of the following locations, until a configuration file is found:

    • (current directory)
    • /apps/chaosmonkey
    • /etc
    • /etc/chaosmonkey
  • Add the following basic configuration structure to your chaosmonkey.toml file, replacing appropriate <DATABASE_> configuration values with your own settings.

    2enabled = true
    3schedule_enabled = true
    4leashed = false
    5accounts = ["aws-primary"]
    7start_hour = 9 # time during day when starts terminating
    8end_hour = 15 # time during day when stops terminating
    10# location of command Chaos Monkey uses for doing terminations
    11term_path = "/apps/chaosmonkey/"
    13# cron file that Chaos Monkey writes to each day for scheduling kills
    14cron_path = "/etc/cron.d/chaosmonkey-schedule"
    17host = "localhost"
    18name = "<DATABASE_NAME>"
    19user = "<DATABASE_USER>"
    20encrypted_password = "<DATABASE_USER_PASSWORD>"
    23endpoint = "http://localhost:8084"
  • With Chaos Monkey configured it's time to migrate it to the MySQL

    1chaosmonkey migrate
    2[16264] 2018/09/04 14:11:16 Successfully applied database migrations. Number of migrations applied: 1
    3[16264] 2018/09/04 14:11:16 database migration applied successfully
1mysql_tzinfo_to_sql /usr/share/zoneinfo/ | mysql -u root mysql -p

How to Use Chaos Monkey

Using the chaosmonkey command line tool is fairly simple. Start by making sure it can connect to your spinnaker instance with chaosmonkey config spinnaker.

1chaosmonkey config spinnaker
3 Enabled: (bool) true,
4 RegionsAreIndependent: (bool) true,
5 MeanTimeBetweenKillsInWorkDays: (int) 2,
6 MinTimeBetweenKillsInWorkDays: (int) 1,
7 Grouping: (chaosmonkey.Group) cluster,
8 Exceptions: ([]chaosmonkey.Exception) {
9 },
10 Whitelist: (*[]chaosmonkey.Exception)(<nil>)
1kubectl get nodes --watch
2# OUTPUT Ready <none> 3d v1.10.3 | Ready <none> 3d v1.10.3 | Ready <none> 3d v1.10.3

To manually terminate an instance with Chaos Monkey use the chaosmonkey terminate command.

1chaosmonkey terminate <app> <account> [--region=<region>] [--stack=<stack>] [--cluster=<cluster>] [--leashed]

For this example our application is spinnaker and our account is aws-primary, so using just those two values and leaving the rest default should work.

1chaosmonkey terminate spinnaker aws-primary
2[11533] 2018/09/08 18:39:26 Picked: {spinnaker aws-primary us-west-2 eks spinnaker-eks-nodes-NodeGroup-KLBYTZDP0F89 spinnaker-eks-nodes-NodeGroup-KLBYTZDP0F89 i-054152fc4ed41d7b7 aws}

Now look at the AWS EC2 console (or at the terminal window running kubectl get nodes --watch) and after a moment you'll see one of the instances has been terminated.

bash Ready <none> 3d v1.10.3 Ready <none> 3d v1.10.3 NotReady <none> 3d v1.10.3 Ready <none> 3d v1.10.3 Ready <none> 3d v1.10.3

If you quickly open up the Spinnaker Deck web interface you'll see only two of the three instances in the cluster are active, as we see in kubectl above. However, wait a few more moments and Spinnaker will notice the loss of an instance, recognize it has been stopped/terminated due to an EC2 health check, and will immediately propagate a new instance to replace it, thus ensuring the server group's desired capacity remains at 3 instances.

For Kubernetes Spinnaker deployments, a kubectl get nodes --watch output confirms these changes (in this case, the new local instance was added).

bash Ready <none> 3d v1.10.3 Ready <none> 3d v1.10.3 NotReady <none> 10s v1.10.3 Ready <none> 3d v1.10.3 Ready <none> 3d v1.10.3 Ready <none> 20s v1.10.3

Spinnaker also tracks this information. Navigating to the your Spinnaker application INFRASTRUCTURE > CLUSTERS > spinnaker-eks-nodes-NodeGroup > CAPACITY and click View Scaling Activities to see the Spinnaker scaling activities log for this node group. In this case we see the successful activities that lead to the health check failure and new instance start.


How to Schedule Chaos Monkey Terminations

Before we get to scheduling anything you'll want to copy the chaosmonkey executable to the /apps/chaosmonkey directory. While you can leave it in the default $GOBIN directory, it'll be easier to use with cron jobs and other system commands if it's in a global location.

1sudo cp ~/go/bin/chaosmonkey /apps/chaosmonkey/

Now that we've confirmed we can manually terminate instances via Chaos Monkey you may want to setup an automated system for doing so. The primary way to do this is to create a series of scripts that regenerate a unique crontab job that is scheduled to execute on a specific date and time. This cron job is created every day (or however often you like), and the execution time is randomized based on the start_hour, end_hour, and time_zone settings in the chaosmonkey.toml configuration. We'll be using four files for this: Two crontab files and two bash scripts.

  • Start by creating the four files we'll be using for this.

    1sudo touch /apps/chaosmonkey/
    2sudo touch /apps/chaosmonkey/
    3sudo touch /etc/cron.d/chaosmonkey-schedule
    4sudo touch /etc/cron.d/chaosmonkey-daily-scheduler
  • Now set executable permissions for the two bash scripts so the cron (root) user can execute them.

    1sudo chmod a+rx /apps/chaosmonkey/
    2sudo chmod a+rx /apps/chaosmonkey/
  • Now we'll add some commands to each script in the order they're expected to call one another. First, the /etc/cron.d/chaosmonkey-daily-scheduler is executed once a day at a time you specify. This will call the /apps/chaosmonkey/ script, which will perform the actual scheduling for termination. Paste the following into /etc/cron.d/chaosmonkey-daily-scheduler (as with any cron job you can freely edit the schedule to determine when the cron job should be executed).

    1# min hour dom month day user command
    20 12 * * * root /apps/chaosmonkey/
  • The /apps/chaosmonkey/ script should perform the actual chaosmonkey schedule command, so paste the following into /apps/chaosmonkey/

    2/apps/chaosmonkey/chaosmonkey schedule >> /var/log/chaosmonkey-schedule.log 2>&1
  • When the chaosmonkey schedule command is called by the /apps/chaosmonkey/ script it will automatically write to the /etc/cron.d/chaosmonkey-schedule file with a randomized timestamp for execution based on the Chaos Monkey configuration. Here's an example of what the generated /etc/cron.d/chaosmonkey-schedule looks like.

    1# /etc/cron.d/chaosmonkey-schedule
    29 16 9 9 0 root /apps/chaosmonkey/ spinnaker aws-primary --cluster=spinnaker-eks-nodes-NodeGroup-KLBYTZDP0F89 --region=us-west-2
  • Lastly, the /apps/chaosmonkey/ script that is called by the generated /etc/cron.d/chaosmonkey-schedule cron job should issue the chaosmonkey terminate command and output the result to the log. Paste the following into /apps/chaosmonkey/

    2/apps/chaosmonkey/chaosmonkey terminate "$@" >> /var/log/chaosmonkey-terminate.log 2>&1

Next Steps

You're all set now! You should have a functional Spinnaker deployment with Chaos Monkey enabled, which will perform a cron job once a day to terminate random instances based on your configuration!

Chaos Monkey is just the tip of the Chaos Engineering iceberg, and there are a lot more failure modes you can experiment with to learn about your system.

The rest of this guide will cover the other tools in The Simian Army family, along with an in-depth look at the Chaos Monkey Alternatives. We built Gremlin to provide a production-ready framework to safely, securely, and easily simulate real outages with an ever-growing library of experiments.

© 2023 Gremlin Inc.All rights reserved.Privacy Policy
Download PDF