How to install Gremlin on ECS

Philip Gebhardt
Software Engineer
Last Updated:
August 28, 2023
Categories:
Chaos Engineering
Learn how to install and use Gremlin on Amazon Elastic Container Service (ECS).

This advanced installation guide walks you through installing Gremlin Docker containers in your ECS environment and verifying that you can run a CPU attack against the freshly installed Gremlin agents. In the verification steps, we will create a container that runs htop and exposes it as a web interface on port 8888, which lets us watch changes in real time as a simple CPU attack runs against the container.

Prerequisites

  • A functional ECS cluster, built in the region of your choice, backed by EC2 instances. Fargate-backed ECS is not currently supported.
  • Private subnets in the ECS VPC that route through a NAT gateway. Gremlin will be deployed in those private subnets.
  • Certificate-based authentication should be used, and the certificates should be made available to the Gremlin daemon in /var/lib/gremlin.

Additionally, to get the most from this installation guide, you should already be familiar with running Gremlin as a container. You can reference Install Gremlin in a Docker Container for help getting started with Gremlin and Docker.

Step 1: Create the Task Definition

  1. Copy the provided JSON task definition (see Gremlin Task Definition JSON at the end of this guide) into a text editor and supply values for my-team-id, my-team-secret, and my-aws-account. Additionally, review the task definition's limits, CPU architecture, and awslogs-region values to ensure they match your target environment.
  2. In the AWS management console navigate to Task Definitions in the ECS service, and choose Create New Task Definition
  3. Select EC2 for the launch type compatibility and click Next Step
  4. Scroll down to the bottom of the page and click the button Configure via JSON
  5. Paste the edited task definition into the JSON text field and click the Save button
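
The placeholder substitution in step 1 can also be scripted before pasting. The following is a minimal sketch, assuming the task definition from this guide is available as a string; the function name fill_task_definition and the sample values are hypothetical and not part of Gremlin or AWS tooling.

```python
import json

# Placeholders used in the task definition at the end of this guide.
PLACEHOLDERS = ("my-team-id", "my-team-secret", "my-aws-account")


def fill_task_definition(template: str, values: dict) -> dict:
    """Replace the guide's placeholders and return the parsed task definition."""
    missing = [name for name in PLACEHOLDERS if name not in values]
    if missing:
        raise ValueError("missing values for: " + ", ".join(missing))
    filled = template
    for name in PLACEHOLDERS:
        filled = filled.replace(name, values[name])
    # Parsing confirms the edited text is still valid JSON before it is pasted.
    return json.loads(filled)


if __name__ == "__main__":
    # Hypothetical, shortened template used only for illustration.
    sample = '{"executionRoleArn": "arn:aws:iam::my-aws-account:role/ecsTaskExecutionRole"}'
    result = fill_task_definition(
        sample,
        {"my-team-id": "team-id", "my-team-secret": "secret", "my-aws-account": "123456789012"},
    )
    print(json.dumps(result, indent=3))
```

Parsing the result with json.loads is a cheap way to catch a stray quote or comma before the console rejects the definition.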

Step 2: Create the Daemon Service Definition

  1. In the AWS management console navigate to Clusters in the ECS service
  2. Select the cluster you want to deploy Gremlin into
  3. On the Services tab, click the Create button

On the Configure service page, set the parameters as follows:

  1. Select the Launch Type compute option.
  2. Select the EC2 launch type.
  3. Select the Service application type.
  4. Task Definition -> Family: gremlin
  5. Task Definition -> Revision: latest
  6. Service type: DAEMON
  7. The rest of the defaults are acceptable, click Create
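
For readers who prefer to script the service instead of using the console, the same settings can be expressed as request parameters. This is only a sketch: the field names follow the ECS CreateService API, while the helper name daemon_service_request and the choice of "gremlin" as the service name are assumptions. Sending the request (for example through an AWS SDK) is not shown.

```python
def daemon_service_request(cluster: str, family: str = "gremlin") -> dict:
    """Build CreateService-style parameters for a daemon service on EC2."""
    return {
        "cluster": cluster,
        # Assumption: the service is named after the task definition family.
        "serviceName": family,
        # A family name without a revision selects the latest ACTIVE revision.
        "taskDefinition": family,
        "launchType": "EC2",
        # DAEMON runs one task per container instance, so no desiredCount is set.
        "schedulingStrategy": "DAEMON",
    }


if __name__ == "__main__":
    print(daemon_service_request("my-cluster"))  # "my-cluster" is a placeholder
```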

Step 3: Verify the Installation

  1. In the AWS management console navigate to the Clusters in the ECS service
  2. Select the cluster you just deployed Gremlin into
  3. On the Services tab, you should now see the Gremlin service
  4. Verify that Desired tasks matches the number of ECS hosts in your cluster
  5. Verify that Running tasks matches the number of Desired tasks. Note that it can take several minutes for the ECS scheduler to launch Gremlin to full capacity
  6. Once the Gremlin service is running at full capacity, navigate to https://app.gremlin.com/clients/infrastructure
  7. You can search via the tag platform=ecs to verify that the Gremlin control plane can see the freshly launched ECS daemons
  8. Navigate to https://app.gremlin.com/attacks/new and click on the Containers tab
  9. Verify that you are seeing the application containers and tags currently running on your ECS cluster being imported into the Gremlin control plane
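
The count checks in steps 4 and 5 can be sketched in code as well. The example below assumes you have one service entry from the JSON printed by aws ecs describe-services (which includes desiredCount and runningCount); the helper name daemon_ready and the sample numbers are hypothetical.

```python
def daemon_ready(service: dict, host_count: int) -> bool:
    """Return True when the daemon service covers every ECS host.

    `service` is one entry from the "services" list in describe-services output.
    """
    desired = service["desiredCount"]
    running = service["runningCount"]
    # Step 4: desired tasks should equal the number of hosts.
    # Step 5: running tasks should equal desired tasks.
    return desired == host_count and running == desired


if __name__ == "__main__":
    # Hypothetical sample: 3 hosts, scheduler still launching the last task.
    still_starting = {"serviceName": "gremlin", "desiredCount": 3, "runningCount": 2}
    print(daemon_ready(still_starting, host_count=3))
```

Because the scheduler can take several minutes to reach full capacity, a False result shortly after creating the service is not necessarily a failure.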

Step 4: Create an HTOP Elastic Container Registry (ECR) repository with an image

This will create a docker container that exposes htop via shellinaboxd on port 8888. htop is an interactive process viewer for Unix systems. We'll use htop as the target for an attack in step 8. Using htop isn't a requirement for installing Gremlin in Docker, but for this tutorial, using it makes it easier to see the impact of our attacks.

  1. In the AWS management console navigate to Repositories in the ECS service
  2. If you don't already have a repository, click Get started at the top; otherwise click New repository
  3. In Repository name type in htop, then click Next step
  4. Take note of the endpoint to push your docker image to, then click Done
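  5. SSH into an instance in your AWS environment that has the AWS command line tools and Docker installed (e.g., a jump box)
  6. Authenticate the Docker client against ECR: sudo $(aws ecr get-login --no-include-email --region us-east-1). The get-login command exists only in AWS CLI v1; with AWS CLI v2, use instead: aws ecr get-login-password --region us-east-1 | sudo docker login --username AWS --password-stdin <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com
  7. Create and change into the directory ~/docker-htop: mkdir -p ~/docker-htop; cd ~/docker-htop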
  8. Create the docker file:
    1. cat <<< 'FROM alpine:latest
      RUN apk --no-cache add --update htop && rm -rf /var/cache/apk/*
      RUN apk --no-cache add --repository http://dl-cdn.alpinelinux.org/alpine/edge/testing shellinabox
      ENTRYPOINT ["shellinaboxd", "-t", "-p8888", "-s/:nobody:nogroup:/:htop"]' > Dockerfile
  9. Create the docker image: sudo docker build -t htop .
  10. Tag the image. To push to the repository, you'll need the endpoint details from creating the repository: sudo docker tag htop:latest <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest
  11. Push the image to ECR. Again, you'll need the endpoint details from creating the repository: sudo docker push <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest
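
The image endpoint used in the tag and push commands follows a fixed pattern built from the account ID, region, and repository name. As a small sketch (the helper name ecr_image_uri is hypothetical, and the account ID shown is a dummy value), it can be assembled like this:

```python
import re


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the ECR image reference used by docker tag and docker push."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


if __name__ == "__main__":
    # Dummy account ID; substitute your own for <<ACCNTID>>.
    print(ecr_image_uri("123456789012", "us-east-1", "htop"))
```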

Step 5: Create the HTOP Task Definition

  1. In the AWS management console navigate to Task Definitions in the ECS service, and choose Create New Task Definition
  2. Select EC2 for the launch type compatibility and click Next Step
  3. On the Configure task and container definitions page, set the parameters as follows:
    1. Task Definition Name: htop
    2. Task Role: Leave blank
    3. Network Mode: Leave as <default>
    4. Task execution role: Leave as none
    5. Task memory (MiB): 128
    6. Task CPU (unit): 128
    7. Click Add container, and in the Add container modal enter the following information, leaving defaults unless otherwise specified:
      1. Container name: htop
      2. Image: <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest
      3. Private repository authentication: Leave unchecked
      4. Memory Limits (MiB): Hard limit 128
      5. Port mappings: Host port: 8888; Container port: 8888
      6. Scroll down to the Docker Labels section and enter appropriate key-value tags; at a minimum, we suggest app:htop
      7. Click the Add button
    8. Scroll down and click the Create button
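
The console settings above correspond roughly to the following task definition. This is a sketch under the assumption that the fields map one-to-one onto ECS task definition JSON; the helper name htop_task_definition is hypothetical, and the image value must be replaced with your own repository endpoint.

```python
import json


def htop_task_definition(image: str) -> dict:
    """Mirror the console settings from this step as a task definition dict."""
    return {
        "family": "htop",
        "requiresCompatibilities": ["EC2"],
        "cpu": "128",      # Task CPU (unit)
        "memory": "128",   # Task memory (MiB)
        "containerDefinitions": [
            {
                "name": "htop",
                "image": image,
                "memory": 128,  # hard limit in MiB
                "portMappings": [{"hostPort": 8888, "containerPort": 8888}],
                # Docker label used later to find the container in Gremlin.
                "dockerLabels": {"app": "htop"},
            }
        ],
    }


if __name__ == "__main__":
    # Dummy account ID in the image reference; substitute your own.
    image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/htop:latest"
    print(json.dumps(htop_task_definition(image), indent=3))
```

The printed JSON could be pasted through Configure via JSON in the same way as the Gremlin task definition in Step 1.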

Step 6: Create a service definition for HTOP

  1. In the AWS management console navigate to the Clusters in the ECS service
  2. Select the cluster you just deployed Gremlin into
  3. On the Services tab, click the Create button
  4. On the Configure service page, set the parameters as follows:
    1. Launch type: EC2
    2. Task Definition -> Family: htop
    3. Task Definition -> Revision: latest
    4. Cluster -> The cluster you wish to deploy into
    5. Service name: htop
    6. Service type: REPLICA
    7. Number of tasks: 1
  5. Click Next step to bring you to the Set Auto Scaling page, and Next step again
  6. Review the service details to ensure accuracy, and if everything looks good click Create Service

Step 7: Open HTOP

  1. In the AWS management console navigate to the Clusters in the ECS service
  2. Select the cluster you just deployed Gremlin into
  3. On the Services tab, click the htop service we just created
  4. Click on the Tasks tab
  5. Click on the task-ID for the running HTOP task
  6. Expand the htop container by clicking on the arrow next to the container name htop
  7. In the Network bindings section, click on the provided External Link
    1. If the external link does not work, you may need to go into the security group associated with your ECS cluster and open port 8888
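
If the external link does not load, a quick reachability check can help tell a blocked security group apart from a problem with the container itself. The sketch below uses only the Python standard library; the helper name port_is_open and the example address are hypothetical.

```python
import socket


def port_is_open(host: str, port: int = 8888, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address from the documentation range; use your ECS host's address.
    print(port_is_open("203.0.113.10", 8888))
```

A timeout here usually points at the security group or routing, whereas an immediate refusal suggests the port is reachable but nothing is listening on it.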

Congratulations, you should now see the htop interface in your web browser. Leave this open, as we'll be referring to it in the next step.

Step 8: Run a test attack

  1. In a new browser window, open the link to the Gremlin Attacks UI: https://app.gremlin.com/attacks
  2. Click New Attack
  3. Select the Containers tab
  4. Select the HTOP container we created as your target by clicking the checkbox next to its container ID. You should be able to find it by the Docker label we added, app:htop
  5. Click Choose a Gremlin
  6. Select the Resource category and CPU attack
  7. The default values should be fine. Click Unleash Gremlin to launch the attack
  8. Observe in the open htop browser window that you can see the increased CPU load on the docker container.

Additional ECS Configurations

Now that you've run a basic attack in ECS, there are some advanced configuration options you should be aware of:

  • networkMode - This option determines which network space the Gremlin task can affect. The task definition in this guide sets it to host, so the task shares the host's network. Other options are awsvpc (the task can only affect its own awsvpc interface), bridge, and none. For more information, please consult the AWS guide on Network mode.
  • pidMode - This parameter lets the containers in a task share a process ID namespace with either the host or the other containers in the task. By default, it is not set. Setting it to host, as the task definition in this guide does, can be useful when performing process killer attacks. For more information, please consult the AWS guide on PID mode.
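
As a sketch of how these two settings might be changed in a task definition before registering a new revision, the helper below validates the values against the modes ECS accepts. The function name with_modes is hypothetical, and whether a given combination suits your attacks is for you to decide.

```python
from typing import Optional

VALID_NETWORK_MODES = {"none", "bridge", "awsvpc", "host"}
VALID_PID_MODES = {"host", "task"}


def with_modes(task_definition: dict, network_mode: str, pid_mode: Optional[str] = None) -> dict:
    """Return a copy of the task definition with networkMode and pidMode applied.

    Passing pid_mode=None removes pidMode, leaving the ECS default in effect.
    """
    if network_mode not in VALID_NETWORK_MODES:
        raise ValueError(f"unsupported networkMode: {network_mode}")
    if pid_mode is not None and pid_mode not in VALID_PID_MODES:
        raise ValueError(f"unsupported pidMode: {pid_mode}")
    updated = dict(task_definition)  # shallow copy; only top-level keys change
    updated["networkMode"] = network_mode
    if pid_mode is None:
        updated.pop("pidMode", None)
    else:
        updated["pidMode"] = pid_mode
    return updated


if __name__ == "__main__":
    base = {"family": "gremlin", "networkMode": "host", "pidMode": "host"}
    print(with_modes(base, "bridge", "host"))
```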

Conclusion

You now have Gremlin up and running in your ECS environment and have validated its functionality against a running htop container. For security, you should remove the htop container from your running cluster, as it provides an unsecured view of metrics from your running environment.

Feel free to expand this to other ECS environments and have fun running Chaos Experiments!

Gremlin Task Definition JSON

The following task definition assumes Docker is the container driver.

If using containerd

  • Replace /run/docker/runtime-runc/moby with /run/containerd/runc/k8s.io
  • Replace /var/run/docker.sock with /run/containerd/containerd.sock

If using CRI-O

  • Replace /run/docker/runtime-runc/moby with /run/runc
  • Replace /var/run/docker.sock with /run/crio/crio.sock
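
The replacements listed above can be applied mechanically. The following is a minimal sketch, assuming the Docker-based task definition below is available as a string and that every occurrence of each path (both containerPath and sourcePath) should be replaced; the helper name switch_runtime is hypothetical.

```python
import json

# (runc state directory, runtime socket) for each container runtime in this guide.
RUNTIME_PATHS = {
    "docker": ("/run/docker/runtime-runc/moby", "/var/run/docker.sock"),
    "containerd": ("/run/containerd/runc/k8s.io", "/run/containerd/containerd.sock"),
    "crio": ("/run/runc", "/run/crio/crio.sock"),
}


def switch_runtime(task_definition_json: str, runtime: str) -> str:
    """Rewrite the Docker paths in the task definition for another runtime."""
    if runtime not in RUNTIME_PATHS:
        raise ValueError(f"unknown runtime: {runtime}")
    docker_runc, docker_sock = RUNTIME_PATHS["docker"]
    runc, sock = RUNTIME_PATHS[runtime]
    updated = task_definition_json.replace(docker_runc, runc).replace(docker_sock, sock)
    json.loads(updated)  # fail early if the result is not valid JSON
    return updated


if __name__ == "__main__":
    snippet = '{"sourcePath": "/var/run/docker.sock"}'
    print(switch_runtime(snippet, "crio"))
```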

JSON

{
   "family": "gremlin",
   "containerDefinitions": [
      {
         "name": "gremlin",
         "image": "gremlin/gremlin:latest",
         "cpu": 0,
         "portMappings": [],
         "essential": true,
         "entryPoint": [
            "/entrypoint.sh"
         ],
         "command": [
            "daemon"
         ],
         "environment": [
            {
               "name": "GREMLIN_TEAM_SECRET",
               "value": "my-team-secret"
            },
            {
               "name": "GREMLIN_TEAM_ID",
               "value": "my-team-id"
            },
            {
               "name": "GREMLIN_CLIENT_TAGS",
               "value": "platform=ecs"
            }
         ],
         "mountPoints": [
            {
               "sourceVolume": "runtime-runc",
               "containerPath": "/run/docker/runtime-runc/moby",
               "readOnly": false
            },
            {
               "sourceVolume": "runtime-socket",
               "containerPath": "/var/run/docker.sock",
               "readOnly": false
            },
            {
               "sourceVolume": "cgroup-root",
               "containerPath": "/sys/fs/cgroup",
               "readOnly": false
            },
            {
               "sourceVolume": "gremlin-state",
               "containerPath": "/var/lib/gremlin",
               "readOnly": false
            },
            {
               "sourceVolume": "gremlin-logs",
               "containerPath": "/var/log/gremlin",
               "readOnly": false
            }
         ],
         "volumesFrom": [],
         "linuxParameters": {
            "capabilities": {
               "add": [
                  "KILL",
                  "NET_ADMIN",
                  "SYS_BOOT",
                  "SYS_TIME",
                  "SYS_ADMIN",
                  "SYS_PTRACE"
               ]
            }
         },
         "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
               "awslogs-create-group": "true",
               "awslogs-group": "/ecs/gremlin",
               "awslogs-region": "us-west-1",
               "awslogs-stream-prefix": "ecs"
            }
         }
      }
   ],
   "executionRoleArn": "arn:aws:iam::my-aws-account:role/ecsTaskExecutionRole",
   "networkMode": "host",
   "volumes": [
      {
         "name": "runtime-runc",
         "host": {
            "sourcePath": "/run/docker/runtime-runc/moby"
         }
      },
      {
         "name": "runtime-socket",
         "host": {
            "sourcePath": "/var/run/docker.sock"
         }
      },
      {
         "name": "cgroup-root",
         "host": {
            "sourcePath": "/sys/fs/cgroup"
         }
      },
      {
         "name": "gremlin-state",
         "host": {
            "sourcePath": "/var/lib/gremlin"
         }
      },
      {
         "name": "gremlin-logs",
         "host": {
            "sourcePath": "/var/log/gremlin"
         }
      }
   ],
   "requiresCompatibilities": [
      "EC2"
   ],
   "cpu": "1024",
   "memory": "1024",
   "pidMode": "host",
   "runtimePlatform": {
      "cpuArchitecture": "X86_64",
      "operatingSystemFamily": "LINUX"
   },
   "tags": [
      {
         "key": "ecs:taskDefinition:createdFrom",
         "value": "ecs-console-v2"
      }
   ]
}