Chaos Engineering with DocumentDB

Tammy Butow
Principal SRE
Last Updated: February 6, 2019
Chaos Engineering

Gremlin is a simple, safe and secure service for performing Chaos Engineering experiments through a SaaS-based platform. Amazon DocumentDB is a MongoDB-compatible database. Datadog is a monitoring service for cloud-scale applications, providing monitoring of servers, databases, tools, and services, through a SaaS-based data analytics platform.

This tutorial shows:

  • How to create an AWS VPC for your DocumentDB cluster
  • How to create a DocumentDB Cluster in your VPC
  • How to create a Bastion Host in the same VPC
  • How to SSH into your Bastion Host and install the MongoDB Shell
  • How to install Gremlin on your Bastion Host to practice Chaos Engineering
  • How to connect to your DocumentDB cluster
  • Chaos Engineering: Run a blackhole attack using Gremlin

Chaos Engineering Hypothesis

For the purposes of this tutorial we will run Chaos Engineering experiments on the DocumentDB cluster and individual instances. We will focus on network related Chaos Engineering attacks.


Step 1 - Create a VPC for Amazon DocumentDB

In this step, you’ll set up a VPC for your Amazon DocumentDB cluster.

Navigate to the AWS VPC console for the EU West 1 region.

Make sure you have a default VPC for your cluster in eu-west-1; if not, click to create a default VPC. The default VPC we will be using for this tutorial is vpc-2e9e4349. Use the default security group for your default VPC.

default vpc

The default VPC for the region will automatically have three subnets, each in a different Availability Zone, for your use.
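If you prefer to confirm the subnet layout from a terminal, the following is a minimal sketch. It assumes the AWS CLI is installed and configured for eu-west-1; `list_subnet_azs` is a hypothetical helper name, and the VPC ID is the example used in this tutorial.

```shell
# List each subnet in a VPC together with its Availability Zone.
# $1 = VPC ID (vpc-2e9e4349 is the example ID from this tutorial)
list_subnet_azs() {
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$1" \
    --query 'Subnets[].[SubnetId,AvailabilityZone]' \
    --output text
}

# Usage:
#   list_subnet_azs vpc-2e9e4349
```

Each output line pairs a subnet ID with its Availability Zone, so three lines with three distinct zones is what you would expect for a default VPC in this region.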


You will need to ensure the security group you are using for your VPC allows access to your DocumentDB cluster on port 27017 (the default). To do this, visit Security Groups in the VPC Dashboard.

You will need to ensure you have the following inbound rules as shown below:

security groups

You will also need to ensure you have the following outbound rules as shown below:

documentdb security outbound
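The inbound rule can also be added from the command line. This is a sketch under assumptions: it assumes the AWS CLI is installed and configured, and that the rule you want is "allow TCP 27017 from members of the same security group", which may differ from the exact rules you choose in the console. `allow_docdb_port` is a hypothetical helper name.

```shell
# Allow TCP 27017 between members of one security group.
# Assumption: a self-referencing rule is what you want; adjust the
# source if your bastion host uses a different security group.
# $1 = security group ID (for example the default group of your VPC)
allow_docdb_port() {
  sg_id="$1"
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" \
    --protocol tcp \
    --port 27017 \
    --source-group "$sg_id"
}

# Usage:
#   allow_docdb_port sg-b20113cb
```

Restricting the source to the security group itself, rather than an open CIDR range, keeps the database port reachable only from hosts in that group, such as the bastion host created later.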

Step 2 - Create an Amazon DocumentDB Cluster in your VPC

Now create a DocumentDB cluster. First, click the Create button:

documentdb create cluster

When creating your cluster, you will need to use specific settings to ensure you can connect to and use your Amazon DocumentDB cluster.

  • VPC - This is the default VPC in eu-west-1 from Step 1
  • User - You will need to create a username for your cluster
  • Password - You will need to create a password for your cluster
  • Security group - Use the default security group for eu-west-1 as mentioned earlier. You will need to update the rules for your security group. Ensure you can access port 27017, which is the default port for DocumentDB.

When you create the AWS DocumentDB cluster it will automatically create three instances for you.

documentdb cluster

Step 3 - Create a Bastion Host in the same VPC

In this step you will create a bastion host in the same Availability Zone as your DocumentDB writer. The first instance in the list will be your DocumentDB writer. You can identify which Availability Zone it is in by clicking on the instance and finding the Availability Zone information. For example, our DocumentDB writer is in eu-west-1b.
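The writer's Availability Zone can also be looked up with the AWS CLI. The sketch below assumes the CLI is installed and configured for your region; `writer_az` is a hypothetical helper name and the cluster identifier is whatever you chose when creating the cluster.

```shell
# Print the Availability Zone of the cluster's writer instance.
# $1 = cluster identifier (your own value)
writer_az() {
  cluster_id="$1"
  # Ask the cluster which member is currently the writer.
  writer_id="$(aws docdb describe-db-clusters \
    --db-cluster-identifier "$cluster_id" \
    --query 'DBClusters[0].DBClusterMembers[?IsClusterWriter].DBInstanceIdentifier' \
    --output text)" || return 1
  [ -n "$writer_id" ] || return 1
  # Look up where that instance runs.
  aws docdb describe-db-instances \
    --db-instance-identifier "$writer_id" \
    --query 'DBInstances[0].AvailabilityZone' \
    --output text
}

# Usage:
#   writer_az my-docdb-cluster
```

Because the writer role can move after a failover, it is worth re-running a lookup like this before each experiment rather than relying on the order of instances in the console.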

Navigate to the EC2 console and click to create a new instance. Use an Ubuntu t2.micro as the EC2 instance for your bastion host. You will need to make sure it is in the same VPC (e.g. vpc-2e9e4349) and has the same security group (e.g. sg-b20113cb, the default).

You will need to update the rules for your default security group to enable it to work with your Amazon DocumentDB cluster.

documentdb instance

When you create your instance, use an existing key pair or generate a new one.

documentdb key pair

Step 4 - SSH into your Bastion Host and install the MongoDB Shell

In this step you will SSH into the bastion host and install the MongoDB shell.

Use the following command to SSH into your bastion host, replacing the instance and key with your own:


ssh -i "chaoseu.pem"

Next, get your bastion host ready to connect to DocumentDB by installing the MongoDB shell:


sudo apt-get update
sudo apt-key adv --keyserver hkp:// --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5
echo "deb [ arch=amd64,arm64 ] xenial/mongodb-org/3.6 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.6.list
sudo apt-get update
sudo apt-get install -y mongodb-org-shell

Lastly, download the RDS combined CA bundle (rds-combined-ca-bundle.pem) to your bastion host. The MongoDB shell needs this file to connect to your DocumentDB cluster over SSL.



Step 5 - Install Gremlin on your Bastion Host to practice Chaos Engineering

In this step we will install Gremlin on our bastion host so we can run Chaos Engineering attacks. This will enable us to trigger attacks on our Amazon DocumentDB cluster.

First, ssh into your server and add the Gremlin Debian repository:


echo "deb release non-free" | sudo tee /etc/apt/sources.list.d/gremlin.list

Import the repo’s GPG key:


sudo apt-key adv --keyserver --recv-keys C81FC2F43A48B25808F9583BDFF170F324D41134 9CDB294B29A5B1E2E00C24C022E8EF3461A50EF6

Then install the Gremlin daemon and CLI:


sudo apt-get update && sudo apt-get install -y gremlind gremlin

The Gremlin daemon (gremlind) connects to the Gremlin backend and waits for attack orders from you. When it receives attack orders, it uses the CLI (gremlin) to run the attack.

Run gremlin init to configure the Gremlin daemon:


gremlin init

You will be prompted to enter your Gremlin Team ID and Secret which you can find in the Gremlin UI under Team Settings.
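Before moving on, it can help to confirm that everything from Steps 4 and 5 is in place. The following is a small sketch, not part of the Gremlin or MongoDB tooling: `have_cmd` and `preflight` are hypothetical helper names, and the default file name is the CA bundle name used in this tutorial.

```shell
# Return 0 if a command is on PATH, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Check the tools and files the remaining steps rely on.
# $1 = path to the CA bundle (defaults to the file name used in this tutorial)
preflight() {
  ca_file="${1:-rds-combined-ca-bundle.pem}"
  status=0
  for cmd in mongo gremlin; do
    if have_cmd "$cmd"; then
      echo "ok: $cmd found"
    else
      echo "missing: $cmd is not on PATH"
      status=1
    fi
  done
  # -s is true only when the file exists and is not empty.
  if [ -s "$ca_file" ]; then
    echo "ok: $ca_file present"
  else
    echo "missing: $ca_file not found or empty"
    status=1
  fi
  return "$status"
}

# Usage (run from the directory that holds the CA bundle):
#   preflight
```

The function returns a non-zero status if anything is missing, so it can gate a longer script with `preflight || exit 1`.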

Step 6 - Connect to your DocumentDB cluster

In this step you will connect to your Amazon DocumentDB cluster using your bastion host. You can find the command you will need to run in the DocumentDB console.


mongo --ssl --host --sslCAFile rds-combined-ca-bundle.pem --username tammy --password 

If successful, you will see the following result:


MongoDB shell version v3.6.10
connecting to: mongodb://
session: session { "id" : UUID("ee936a56-530d-4551-92e9-b629c7e7ad2b") }
MongoDB server version: 3.6.0
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
Try the support group
>

Next type help at the MongoDB shell prompt and it will return the following:

db.help()                    help on db methods
db.mycoll.help()             help on collection methods
sh.help()                    sharding helpers
rs.help()                    replica set helpers
help admin                   administrative help
help connect                 connecting to a db help
help keys                    key shortcuts
help misc                    misc things to know
help mr                      mapreduce

show dbs                     show database names
show collections             show collections in current database
show users                   show users in current database
show profile                 show most recent system.profile entries with time >= 1ms
show logs                    show the accessible logger names
show log [name]              prints out the last segment of log in memory, 'global' is default
use <db_name>                set current database
db.foo.find()                list objects in collection foo
db.foo.find( { a : 1 } )     list objects in foo where a == 1
it                           result of the last line evaluated; use to further iterate
DBQuery.shellBatchSize = x   set default number of items to display on shell
exit                         quit the mongo shell

Now we are going to load some sample data. At the prompt, type the following:


db.inventory.insertMany([
   // MongoDB adds the _id field with an ObjectId if _id is not present
   { item: "journal", qty: 25, status: "A",
       size: { h: 14, w: 21, uom: "cm" }, tags: [ "blank", "red" ] },
   { item: "notebook", qty: 50, status: "A",
       size: { h: 8.5, w: 11, uom: "in" }, tags: [ "red", "blank" ] },
   { item: "paper", qty: 100, status: "D",
       size: { h: 8.5, w: 11, uom: "in" }, tags: [ "red", "blank", "plain" ] },
   { item: "planner", qty: 75, status: "D",
       size: { h: 22.85, w: 30, uom: "cm" }, tags: [ "blank", "red" ] },
   { item: "postcard", qty: 45, status: "A",
       size: { h: 10, w: 15.25, uom: "cm" }, tags: [ "blue" ] }
]);

You will see the following result if successful:


{
    "acknowledged" : true,
    "insertedIds" : [
        ObjectId("5c46564ee867ed2238962d54"),
        ObjectId("5c46564ee867ed2238962d55"),
        ObjectId("5c46564ee867ed2238962d56"),
        ObjectId("5c46564ee867ed2238962d57"),
        ObjectId("5c46564ee867ed2238962d58")
    ]
}

To retrieve the data you inserted, run the following command at the MongoDB shell prompt:


rs0:PRIMARY> db.inventory.find( {} )

You will see the following result if successful:


{ "_id" : ObjectId("5c46564ee867ed2238962d54"), "item" : "journal", "qty" : 25, "status" : "A", "size" : { "h" : 14, "w" : 21, "uom" : "cm" }, "tags" : [ "blank", "red" ] }
{ "_id" : ObjectId("5c46564ee867ed2238962d55"), "item" : "notebook", "qty" : 50, "status" : "A", "size" : { "h" : 8.5, "w" : 11, "uom" : "in" }, "tags" : [ "red", "blank" ] }
{ "_id" : ObjectId("5c46564ee867ed2238962d56"), "item" : "paper", "qty" : 100, "status" : "D", "size" : { "h" : 8.5, "w" : 11, "uom" : "in" }, "tags" : [ "red", "blank", "plain" ] }
{ "_id" : ObjectId("5c46564ee867ed2238962d57"), "item" : "planner", "qty" : 75, "status" : "D", "size" : { "h" : 22.85, "w" : 30, "uom" : "cm" }, "tags" : [ "blank", "red" ] }
{ "_id" : ObjectId("5c46564ee867ed2238962d58"), "item" : "postcard", "qty" : 45, "status" : "A", "size" : { "h" : 10, "w" : 15.25, "uom" : "cm" }, "tags" : [ "blue" ] }

You can browse more MongoDB tutorials here:

Step 7 - Now You Are Ready to Practice Chaos Engineering

It is possible to run many Chaos Engineering experiments to learn more about the reliability and durability of Amazon DocumentDB. First we must decide where to start.

Amazon DocumentDB makes several promises, including:

  • “On instance failure, Amazon DocumentDB automates failover to one of up to 15 Amazon DocumentDB replicas that you create in other Availability Zones. If no replicas have been provisioned and a failure occurs, Amazon DocumentDB tries to create a new Amazon DocumentDB instance automatically.”
  • “You can add replicas in minutes regardless of the storage volume size.”
  • “The backup capability in Amazon DocumentDB enables point-in-time recovery for your cluster. This feature allows you to restore your cluster to any second during your retention period, up to the last 5 minutes.”
  • “Process millions of user requests per second with millisecond latency.”

We can use Gremlin to practice Chaos Engineering. Gremlin will enable us to schedule Chaos Engineering attacks. It also has built-in automated integrations for Slack and Datadog.

Step 8 - Chaos Engineering: Run a blackhole attack using Gremlin

First, let’s set up Gremlin to do Chaos Engineering for our DocumentDB cluster.

To perform our first network Chaos Engineering attack, we will inject a failure while attempting to return results from the primary DocumentDB instance.

Ensure you are connected to your DocumentDB instance.

Identify the instance endpoint in the Amazon DocumentDB console.


Now you can run the Gremlin Blackhole Attack using the Gremlin UI. Navigate to New Attack and enter the endpoint as the hostname:

gremlin documentdb

While the Gremlin blackhole attack is running, attempt to retrieve the data you inserted by running the following command at the MongoDB shell prompt:


rs0:PRIMARY> db.inventory.find( {} )

You will notice that it will no longer return the results from DocumentDB.
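An interactive shell may simply hang during the attack, which makes the result hard to record. One option is to run the query non-interactively under a time limit. This is a sketch, assuming GNU coreutils `timeout` is available (as on Ubuntu); `run_with_deadline` is a hypothetical helper, and `DOCDB_HOST`, `DOCDB_USER` and `DOCDB_PASS` are hypothetical environment variables holding your own values.

```shell
# Run a command with a time limit and report how it ended.
# $1 = limit in seconds, remaining arguments = command to run
run_with_deadline() {
  limit="$1"
  shift
  rc=0
  timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "ok"
  elif [ "$rc" -eq 124 ]; then
    # GNU timeout exits with 124 when the limit is reached.
    echo "timed out after ${limit}s"
  else
    echo "failed (exit $rc)"
  fi
  return "$rc"
}

# Usage during the attack:
#   run_with_deadline 10 mongo --ssl --host "$DOCDB_HOST" \
#     --sslCAFile rds-combined-ca-bundle.pem \
#     --username "$DOCDB_USER" --password "$DOCDB_PASS" \
#     --quiet --eval 'db.inventory.find({}).count()'
```

Depending on where the connection stalls, you may see either a timeout or a non-zero exit from the shell; both indicate that the query did not return results while the blackhole was active.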

By practicing Chaos Engineering in this way, we can answer many questions, for example:

  • Are we monitoring for networking incidents?
  • Can we accurately determine that this one Amazon DocumentDB instance is experiencing networking issues?
  • Can we determine that the networking incident is a blackhole?
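As a starting point for the monitoring questions above, a simple TCP probe run from the bastion host can show when the endpoint stops being reachable. This is only a sketch and not a replacement for a monitoring service: `port_open` and `probe_once` are hypothetical helper names, `DOCDB_HOST` is a hypothetical variable for your instance endpoint, and the probe relies on bash's /dev/tcp feature and on `timeout` from GNU coreutils.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's /dev/tcp redirection, so bash must be installed.
# $1 = host, $2 = port
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" >/dev/null 2>&1
}

# Print one timestamped status line for host:port.
probe_once() {
  if port_open "$1" "$2"; then
    state="up"
  else
    state="down"
  fi
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $1:$2 $state"
}

# Usage (stop with Ctrl-C):
#   while true; do probe_once "$DOCDB_HOST" 27017; sleep 5; done
```

Running the loop before, during, and after the attack gives a timestamped record of when connectivity was lost and restored, which you can compare against what your monitoring reported.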


This tutorial has explored how to perform Chaos Engineering experiments on Amazon DocumentDB using Gremlin. We learned how to use Gremlin to practice network Chaos Engineering and identified important questions to ask about network monitoring and incident management.
