How to Set Up Chaos Engineering in your Continuous Delivery pipeline with Gremlin and Jenkins

To maximize the benefits of Chaos Engineering, it’s important to run experiments in your dev/test environments, with each build, and in production. We encourage teams to start in test environments and then progress deliberately to small experiments in production. A crucial step in this progression is testing each build.

Many operations teams today use Continuous Deployment (CD) pipelines to run a repeatable, automated sequence of steps. A pipeline lets you stand up an environment, perform validations, and optionally tear the environment down to return to a clean slate, the same way every time. Those validations should include chaos experiments, to verify that each code push is reliable before it reaches customers.

In this tutorial, we create stages in a Jenkins pipeline to inject a controlled amount of failure with Gremlin. We then add a final stage that allows you to optionally halt the attack from the pipeline, rather than having to wait the full duration of the attack.

Prerequisites

Before you begin this tutorial, you’ll need the following:

  • Docker, for easily deploying Jenkins from a container image
  • Gremlin deployed on the host where Jenkins will run

Step 1 - Get Jenkins Up and Running

In this step, you’ll stand up an instance of Jenkins using the official Docker image. If you already have a Jenkins environment, skip to Step 2 - Retrieve and Add a Gremlin API Key to Jenkins, because the pipeline in Step 3 depends on the credential created there.

At the command line, enter the following to initialize a Jenkins instance using Docker.

bash
docker run -p 8080:8080 -p 50000:50000 jenkins/jenkins:lts-alpine

Navigate to http://localhost:8080 in your browser to confirm Jenkins is working. If this is your first time setting up Jenkins, you will need to enter the initial admin password, which Jenkins prints in the container’s startup log, and choose which plugins to install. For this tutorial, the defaults will work fine. Then add an admin user.
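If the startup log has already scrolled away, the one-time setup password can also be read from the file where Jenkins stores it inside the container. The following is a minimal sketch, not part of the original tutorial: the container name jenkins-chaos is hypothetical (you would set it by adding --name jenkins-chaos to the docker run command), and the DOCKER variable is an illustrative convenience for substituting a compatible CLI.

```shell
#!/bin/sh
# Sketch: read the one-time Jenkins setup password from a running container.
# Assumption: the container was started with a known name, e.g.
#   docker run --name jenkins-chaos -p 8080:8080 -p 50000:50000 jenkins/jenkins:lts-alpine
# ("jenkins-chaos" is a hypothetical name chosen for this example.)

# Container CLI to use; overridable for illustration (defaults to docker).
DOCKER="${DOCKER:-docker}"

# Print the initial admin password stored under the Jenkins home directory.
# $1: name or ID of the running Jenkins container.
jenkins_initial_password() {
  "$DOCKER" exec "$1" cat /var/jenkins_home/secrets/initialAdminPassword
}

# Usage (requires the container to be running):
#   jenkins_initial_password jenkins-chaos
```

Jenkins removes the need for this password once the first admin user exists, so the helper is only useful during initial setup.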

Step 2 - Retrieve and Add a Gremlin API Key to Jenkins

In this step, you’ll add your Gremlin API Key to the Jenkins instance. First, get a key: in the Gremlin web app, head to Team > API Key and create a “New API Key”. Save that key for the next part.

Open the following in your browser:

http://localhost:8080/credentials/store/system/domain/_/newCredentials

Or navigate to Jenkins > Credentials > System > Global credentials (unrestricted). Set the Kind to Secret text and the Scope to Global. Then enter your Gremlin API Key in the Secret field and gremlin-api-key as the ID; the pipeline in Step 3 looks up the credential by this ID. Click OK to save.

Step 3 - Create your Chaos Deployment Pipeline

In this step, you’ll create a Jenkins pipeline that uses the Gremlin API to run chaos experiments in a test environment.

On the Jenkins home screen, go to “New Item”. Give it a name, like “Chaos Pipeline”. Select Pipeline and click OK. Copy and paste the following pipeline code, replacing the example host 128.199.2.166 with your own Gremlin tagged host.

groovy
pipeline {
  agent none
  environment {
    ATTACK_ID = ''
    GREMLIN_API_KEY = credentials('gremlin-api-key')
  }
  parameters {
    string(name: 'CPU_LENGTH', defaultValue: '30', description: 'Duration of CPU attack')
    string(name: 'CPU_CORE', defaultValue: '1', description: 'Number of cores to impact')
    string(name: 'TARGET_IDENTIFIER', defaultValue: '128.199.2.166', description: 'Host to target')
  }
  stages {
    stage('Initiate Environment') {
      steps{
        echo "first spin up your chaos environment"
      }
    }
    stage('Install App into Environment') {
      steps{
        echo "then install your app into said environment"
      }
    }
    stage('Failure Injection') {
      agent any
      steps {
        script {
          ATTACK_ID = sh (
            script: "curl -s -H 'Content-Type: application/json' -H 'Authorization: Key ${GREMLIN_API_KEY}' https://api.gremlin.com/v1/attacks/new --data '{ \"command\": { \"type\": \"cpu\", \"args\": [\"-c\", \"$CPU_CORE\", \"-l\", \"$CPU_LENGTH\"] },\"target\": { \"type\": \"Exact\", \"hosts\" : { \"ids\": [\"$TARGET_IDENTIFIER\"] } } }' --compressed",
            returnStdout: true
          ).trim()
          echo "see your attack at https://app.gremlin.com/attacks/${ATTACK_ID}"
        }
      }
    }
    stage('Observe and Halt') {
      agent any
      input {
        message 'Do you want to halt attack?'
        parameters {
          choice(choices: ['yes' , 'no'], name: 'HALT', description: '')
        }
      }
      steps {
        script {
          if (env.HALT=='yes') {
            sh "curl -s -X DELETE https://api.gremlin.com/v1/attacks/${ATTACK_ID} -H 'Authorization: Key ${GREMLIN_API_KEY}' --compressed"
          }
        }
      }
    }
  }
}

This example includes a CPU attack with defaults of 30 seconds, 1 CPU core and Exact targeting. You can replace this attack with any attack or scenario of your choosing using the Gremlin API.
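Before wiring a different attack into Jenkins, it can help to try the two API calls from a terminal. The sketch below mirrors the curl commands in the Failure Injection and Observe and Halt stages. The function names and the GREMLIN_API variable are hypothetical, introduced only for this example, and it assumes you have exported GREMLIN_API_KEY in your shell.

```shell
#!/bin/sh
# Sketch: the pipeline's two Gremlin API calls as shell functions.
# Assumption: GREMLIN_API_KEY is exported in the current shell.
# Function names and GREMLIN_API are hypothetical, for illustration only.

GREMLIN_API="https://api.gremlin.com/v1"

# Build the JSON body for a CPU attack.
# $1: number of cores, $2: length in seconds, $3: target host identifier.
cpu_attack_payload() {
  printf '{ "command": { "type": "cpu", "args": ["-c", "%s", "-l", "%s"] }, "target": { "type": "Exact", "hosts": { "ids": ["%s"] } } }' "$1" "$2" "$3"
}

# Start the attack and print the API response.
# The pipeline treats the trimmed response as the attack ID.
launch_cpu_attack() {
  curl -s -H 'Content-Type: application/json' \
       -H "Authorization: Key ${GREMLIN_API_KEY}" \
       "${GREMLIN_API}/attacks/new" \
       --data "$(cpu_attack_payload "$1" "$2" "$3")" --compressed
}

# Halt a running attack.
# $1: attack ID returned by launch_cpu_attack.
halt_attack() {
  curl -s -X DELETE "${GREMLIN_API}/attacks/$1" \
       -H "Authorization: Key ${GREMLIN_API_KEY}" --compressed
}

# Usage (makes real API calls, so run only against a host you intend to attack):
#   ATTACK_ID=$(launch_cpu_attack 1 30 128.199.2.166)
#   halt_attack "$ATTACK_ID"
```

Keeping the payload in its own function makes it easy to inspect the JSON on its own, or to swap in a different attack type, without sending anything to the API.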

Next, run the pipeline by selecting “Build with Parameters”, then “Build”. Jenkins will fly through the first two stages, kick off the CPU attack, and then wait at the Observe and Halt stage for you to choose whether to halt it.

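The console output will look similar to the following. This sample comes from a run with different parameter values (2 cores, 600 seconds, and a host ID as the target), so your curl lines will differ: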

Started by user hml
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] withCredentials
Masking supported pattern matches of $GREMLIN_API_KEY
[Pipeline] {
[Pipeline] withEnv
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Initiate Environment)
[Pipeline] echo
first spin up your chaos environment
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Install App into Environment)
[Pipeline] echo
then install your app into said environment
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Failure Injection)
[Pipeline] node
Running on Jenkins in /var/jenkins_home/workspace/cpu-attack
[Pipeline] {
[Pipeline] script
[Pipeline] {
[Pipeline] sh
+ curl -s -H 'Content-Type: application/json' -H 'X-Gremlin-Agent: jenkins' -H 'Authorization: Key ****' https://api.gremlin.com/v1/attacks/new --data '{ "command": { "type": "cpu", "args": ["-c", "2", "-l", "600"] },"target": { "type": "Exact", "hosts" : { "ids": ["i-068d40f628bb969cc"] } } }' --compressed
[Pipeline] echo
see your attack at https://app.gremlin.com/attacks/29b48b49-8d87-11e9-9e94-02420f6e9654
[Pipeline] }
[Pipeline] // script
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Observe and Halt)
[Pipeline] input
Input requested
Approved by hml
[Pipeline] withEnv
[Pipeline] {
[Pipeline] node
Running on Jenkins in /var/jenkins_home/workspace/cpu-attack
[Pipeline] {
[Pipeline] script
[Pipeline] {
[Pipeline] sh
+ curl -s -X DELETE https://api.gremlin.com/v1/attacks/29b48b49-8d87-11e9-9e94-02420f6e9654 -H 'X-Gremlin-Agent: jenkins' -H 'Authorization: Key ****' --compressed
[Pipeline] }
[Pipeline] // script
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] End of Pipeline
Finished: SUCCESS
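In the sample output above, the attack URL is built directly from whatever the API returned. If the request fails, for example because of a bad API key, that response would be an error body rather than an ID, and the halt call would target a meaningless URL. A small guard along the following lines could catch that case. It is a sketch that assumes attack IDs are UUID-shaped, which is inferred only from the sample output and not from the API documentation; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: check that an API response looks like an attack ID before using it.
# Assumption: attack IDs are UUID-shaped (8-4-4-4-12 lowercase hex digits),
# inferred from the sample console output, not confirmed by the API docs.

# Succeeds (exit status 0) only if $1 is a single UUID-shaped string.
is_attack_id() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Usage inside a shell step, after capturing the curl response in RESPONSE:
#   is_attack_id "$RESPONSE" || { echo "unexpected response: $RESPONSE" >&2; exit 1; }
```

Failing the build at this point keeps a rejected request from being reported as a successful attack.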

Conclusion

This tutorial is a starting point for using Chaos Engineering tests effectively in your CI/CD pipeline. You can add multiple scenarios with automatic halting using Status Checks, or run chaos experiments while running integration tests to ensure your system works as intended in degraded environments. We used Jenkins in our example, but you can do the same thing using Spinnaker or any other tool that allows you to design and run similar tests in your pipeline.
