Chaos Engineering with DocumentDB
Gremlin is a simple, safe and secure service for performing Chaos Engineering experiments through a SaaS-based platform. Amazon DocumentDB is a MongoDB-compatible database. Datadog is a monitoring service for cloud-scale applications, providing monitoring of servers, databases, tools, and services, through a SaaS-based data analytics platform.
This tutorial shows:
- How to create an AWS VPC for your DocumentDB cluster
- How to create a DocumentDB Cluster in your VPC
- How to create a Bastion Host in the same VPC
- How to SSH into your Bastion Host and install the MongoDB Shell
- How to install Gremlin on your Bastion Host to practice Chaos Engineering
- How to connect to your DocumentDB cluster
- How to run a blackhole attack using Gremlin
Chaos Engineering Hypothesis
For the purposes of this tutorial we will run Chaos Engineering experiments on the DocumentDB cluster and its individual instances. We will focus on network-related Chaos Engineering attacks.
Step 1 - Create a VPC for Amazon DocumentDB
In this step, you'll set up a VPC for your Amazon DocumentDB cluster.
Navigate to the AWS VPC console for the EU West 1 region.
Make sure you have a default VPC for your cluster in eu-west-1; if not, click to create a default VPC. The default VPC used in this tutorial is vpc-2e9e4349. Use the default security group for your default VPC.
The default VPC for the region will automatically have three subnets, each in a different Availability Zone, for your use.
You will need to ensure the security group you are using for your VPC allows access to your DocumentDB cluster on port 27017 (the default). To do this, visit Security Groups in the VPC Dashboard.
Check that the security group's inbound rules allow TCP traffic on port 27017.
Also check that its outbound rules allow traffic to the cluster on that port.
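One way to sanity-check a set of inbound rules is to test them against the port you need. The sketch below is only an illustration: the rule dictionaries loosely mirror the shape of EC2 security group permissions (IpProtocol, FromPort, ToPort), while the function name allows_tcp_port and the sample rules are assumptions made up for this example, not the rules from this tutorial's account.

```python
def allows_tcp_port(rules, port):
    """Return True if any rule in `rules` permits TCP traffic on `port`."""
    for rule in rules:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # In EC2 rule data, "-1" means all protocols and all ports.
            return True
        if protocol != "tcp":
            continue
        if rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535):
            return True
    return False


# Hypothetical inbound rules: SSH plus the DocumentDB port.
sample_inbound_rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22},
    {"IpProtocol": "tcp", "FromPort": 27017, "ToPort": 27017},
]

print(allows_tcp_port(sample_inbound_rules, 27017))
print(allows_tcp_port(sample_inbound_rules, 3306))
```

The check ignores the traffic source (CIDR range or security group), so a real review should also confirm that the rule covers your bastion host.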
Step 2 - Create an Amazon DocumentDB Cluster in your VPC
Next, create the DocumentDB cluster. Start by clicking the create button in the Amazon DocumentDB console.
When creating your cluster you will need to use specific settings to ensure you can connect to it later:
- VPC - This is the default VPC in eu-west-1 from Step 1
- User - You will need to create a username for your cluster
- Password - You will need to create a password for your cluster
- Security group - Use the default security group for eu-west-1 as mentioned earlier. You will need to update the rules for your security group so that port 27017, the default port for DocumentDB, is reachable.
When you create the Amazon DocumentDB cluster, it will automatically create three instances for you.
Step 3 - Create a Bastion Host in the same VPC
In this step you will create a bastion host in the same Availability Zone as your DocDB writer. The first instance in the list will be your DocDB writer. You can identify which Availability Zone it is in by clicking on the instance and finding the Availability Zone information. For example, our DocDB writer is in eu-west-1b.
Navigate to the EC2 console and click to create a new instance. Use an Ubuntu t2.micro as the EC2 instance for your bastion host. Make sure it is in the same VPC (e.g. vpc-2e9e4349) and has the same security group (e.g. sg-b20113cb, the default).
You will need to update the rules for your default security group to enable it to work with your Amazon DocumentDB cluster.
When you create your instance, use an existing key pair or generate a new one.
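Placing the bastion host in the writer's Availability Zone comes down to choosing a subnet in that zone at launch time. The following sketch shows that selection on plain data; the subnet IDs are hypothetical placeholders, and the helper names region_of and subnets_in_zone are assumptions for illustration.

```python
def region_of(availability_zone):
    """Drop the trailing zone letter, assuming standard names like 'eu-west-1b'."""
    return availability_zone[:-1]


def subnets_in_zone(subnets, availability_zone):
    """Return the IDs of the subnets located in the given Availability Zone."""
    return [s["SubnetId"] for s in subnets if s["AvailabilityZone"] == availability_zone]


# Hypothetical subnets of a default VPC, one per Availability Zone.
default_vpc_subnets = [
    {"SubnetId": "subnet-aaaa1111", "AvailabilityZone": "eu-west-1a"},
    {"SubnetId": "subnet-bbbb2222", "AvailabilityZone": "eu-west-1b"},
    {"SubnetId": "subnet-cccc3333", "AvailabilityZone": "eu-west-1c"},
]

writer_zone = "eu-west-1b"  # the zone of the example DocDB writer
print(region_of(writer_zone))
print(subnets_in_zone(default_vpc_subnets, writer_zone))
```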
Step 4 - SSH into your Bastion Host and install the MongoDB Shell
In this step you will SSH into the bastion host and install the MongoDB shell.
Use the following commands to SSH into your bastion host replacing the instance and key with your own:
Next you will need to get your bastion host ready to connect to MongoDB-compatible databases by installing the MongoDB shell:
Lastly, you will need to download the RDS combined CA bundle. You will need it to connect to your DocumentDB cluster over TLS:
Step 5 - Install Gremlin on your Bastion Host to practice Chaos Engineering
In this step we will install Gremlin on our bastion host so we can run Chaos Engineering attacks. This will enable us to trigger attacks on our Amazon DocumentDB cluster.
First, SSH into your bastion host and add the Gremlin Debian repository:
Import the repo’s GPG key:
Then install the Gremlin daemon and CLI:
The Gremlin daemon (gremlind) connects to the Gremlin backend and waits for attack orders from you. When it receives attack orders, it uses the CLI (gremlin) to run the attack.
Run gremlin init to configure the Gremlin daemon:
You will be prompted to enter your Gremlin Team ID and Secret, which you can find in the Gremlin UI under Team Settings.
Step 6 - Connect to your DocumentDB cluster
In this step you will connect to your Amazon DocumentDB cluster using your bastion host. You can find the command you will need to run in the DocumentDB console.
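As a rough guide to what that command looks like, the sketch below assembles a legacy mongo shell invocation that connects over TLS using the RDS CA bundle. The endpoint and username are hypothetical placeholders, the helper name mongo_shell_command is an assumption, and the exact command shown in your DocumentDB console should take precedence over this one.

```python
import shlex


def mongo_shell_command(endpoint, username, ca_file="rds-combined-ca-bundle.pem", port=27017):
    """Build the argument list for a TLS connection with the legacy mongo shell."""
    return [
        "mongo",
        "--ssl",
        "--host", f"{endpoint}:{port}",
        "--sslCAFile", ca_file,
        "--username", username,
        "--password",  # left without a value so the shell prompts for it
    ]


# Hypothetical cluster endpoint and user; replace with your own values.
args = mongo_shell_command(
    "my-cluster.cluster-example.eu-west-1.docdb.amazonaws.com",
    "myuser",
)
print(" ".join(shlex.quote(a) for a in args))
```

Building the command as a list and quoting each argument keeps unusual characters in usernames or file paths from being misread by the shell.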
If successful, you will see the following result:
Next type help at the MongoDB shell prompt and it will return the following:
Now we are going to load in some sample data. At the prompt, type the following:
You will see the following result if successful:
To retrieve the data you inserted, run the following command at the MongoDB shell prompt:
You will see the following result if successful:
You can browse more MongoDB tutorials here: https://docs.mongodb.com/manual/tutorial/getting-started/
Step 7 - Now You Are Ready to Practice Chaos Engineering
It is possible to run many Chaos Engineering experiments to learn more about the reliability and durability of Amazon DocumentDB. First we must decide where to start.
Amazon DocumentDB makes many promises, including:
- “On instance failure, Amazon DocumentDB automates failover to one of up to 15 Amazon DocumentDB replicas that you create in other Availability Zones. If no replicas have been provisioned and a failure occurs, Amazon DocumentDB tries to create a new Amazon DocumentDB instance automatically.”
- “You can add replicas in minutes regardless of the storage volume size.”
- “The backup capability in Amazon DocumentDB enables point-in-time recovery for your cluster. This feature allows you to restore your cluster to any second during your retention period, up to the last 5 minutes.”
- “Process millions of user requests per second with millisecond latency.”
We can use Gremlin to practice Chaos Engineering. Gremlin will enable us to schedule Chaos Engineering attacks. It also has built-in automated integrations for Slack and Datadog.
Step 8 - Chaos Engineering: Run a blackhole attack using Gremlin
First let’s start by setting up Gremlin to do Chaos Engineering for our DocDB cluster.
To perform our first network Chaos Engineering attack, we will inject failure while attempting to return results from the primary DocumentDB instance.
Ensure you are connected to your DocumentDB cluster from the MongoDB shell:
Identify the instance endpoint from the Amazon DocumentDB instance console, for example:
Now you can run the Gremlin Blackhole Attack using the Gremlin UI. Navigate to New Attack and enter the endpoint as the hostname:
While the Gremlin Blackhole attack is running, attempt to retrieve the data you inserted by running the following command at the MongoDB shell prompt:
You will notice that the command no longer returns results from DocumentDB.
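To observe the effect from outside the MongoDB shell, a simple TCP probe run on the bastion host can help. The sketch below is a minimal example under stated assumptions: the function name probe_tcp and the 3-second timeout are illustrative choices, and it relies on the expectation that a blackhole drops packets, so a new connection attempt should time out rather than be actively refused.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'timeout', 'refused', or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No reply at all within the timeout: consistent with dropped packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # Anything else, such as a DNS failure or an unreachable network.
        return "error"


if __name__ == "__main__":
    # Local demonstration against a throwaway listener; for the experiment,
    # probe your DocumentDB instance endpoint on port 27017 instead.
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    print(probe_tcp("127.0.0.1", listener.getsockname()[1]))
    listener.close()
```

Running the probe before, during, and after the attack gives a simple record of when the endpoint stopped and resumed accepting connections.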
By practicing Chaos Engineering in this way we can answer many questions, for example:
- Are we monitoring for networking incidents?
- Can we accurately determine that this one Amazon DocumentDB instance is experiencing networking issues?
- Can we determine that the networking incident is a blackhole?
This tutorial has explored how to perform Chaos Engineering experiments on Amazon DocumentDB using Gremlin. We saw how Gremlin can be used to practice network Chaos Engineering and identified important questions to ask about network monitoring and incident management.