How to install Gremlin on ECS

How to install and use Gremlin with ECS

This advanced installation guide walks you through installing Gremlin Docker containers in your ECS environment and verifying that you can run a CPU attack against the freshly installed Gremlin agents. In the verification steps we will create a container that runs htop, exposed as a web interface on port 8888, so that we can watch changes in real time as a simple CPU attack runs against the container.

Prerequisites

  • Functional ECS cluster, built in the region of your choice, using EC2 backing instances. Fargate-backed ECS is not currently supported.
  • Private subnets in the ECS VPC that route through a NAT gateway (NAT-GW). Gremlin will be deployed in those private subnets.
  • Certificate based authentication should be used, and the certificates should be made available to the Gremlin daemon in /var/lib/gremlin.

Additionally, to get the most from this installation guide you should already be familiar with running Gremlin as a container. You can reference Install Gremlin in a Docker Container for help getting started with Gremlin and Docker.
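The certificate prerequisite could be met on each ECS host along the following lines. This is only a minimal sketch: stage_gremlin_certs is a hypothetical helper name, the file names team_pub.pem and team_priv.pem are taken from the task definition at the end of this guide, and the file permissions are an assumption that the daemon runs as root inside its container.

```shell
# Hypothetical helper: copy the team certificate and private key into the
# host directory that the Gremlin task definition mounts into the container.
#   $1 = directory holding team_pub.pem and team_priv.pem
#   $2 = destination directory (defaults to /var/lib/gremlin)
stage_gremlin_certs() {
  src_dir="$1"
  dest_dir="${2:-/var/lib/gremlin}"
  mkdir -p "$dest_dir" || return 1
  cp "$src_dir/team_pub.pem" "$src_dir/team_priv.pem" "$dest_dir/" || return 1
  # Assumption: owner-only access is enough because the daemon runs as root.
  chmod 600 "$dest_dir/team_pub.pem" "$dest_dir/team_priv.pem"
}

# Example (run as root on each ECS host):
#   stage_gremlin_certs /tmp/gremlin-certs
```

The destination is a parameter so that the helper can be tried against a scratch directory before touching /var/lib/gremlin.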

Step 1: Create the Task Definition

  1. In the AWS management console navigate to Task Definitions in the ECS service, and choose Create New Task Definition
  2. Select EC2 for the launch type compatibility and click Next Step
  3. Scroll down to the bottom of the page and click the button Configure via JSON
  4. Copy the JSON task definition provided in the Gremlin Task Definition JSON section at the end of this guide into the JSON text field and click the Save button
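If you prefer the AWS CLI to the console, the same registration might look roughly like this. It is a sketch under assumptions: register_gremlin_task_definition is a hypothetical helper name, the JSON has been saved to a local file of your choosing, and the CLI may reject fields whose value is null, in which case those fields would need to be removed from the file first.

```shell
# Hypothetical helper: register a task definition from a local JSON file.
#   $1 = path to a file containing the Gremlin task definition JSON
register_gremlin_task_definition() {
  json_file="$1"
  if [ ! -f "$json_file" ]; then
    echo "task definition file not found: $json_file" >&2
    return 1
  fi
  # --cli-input-json reads the whole request body from the given file.
  aws ecs register-task-definition --cli-input-json "file://$json_file"
}

# Example:
#   register_gremlin_task_definition ./gremlin-task-definition.json
```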

Step 2: Create the Daemon Service Definition

  1. In the AWS management console navigate to Clusters in the ECS service
  2. Select the cluster you want to deploy Gremlin into
  3. On the Services tab, click the Create button
  4. On the Configure service page, set the parameters as follows:

    1. Launch type: EC2
    2. Task Definition -> Family: Gremlin
    3. Task Definition -> Revision: latest
    4. Cluster -> The cluster you wish to deploy into
    5. Service name: Gremlin
    6. Service type: DAEMON
  5. The rest of the defaults are acceptable; click the Next step button
  6. On the Configure network page, set the parameters as follows:

    1. Cluster VPC: The appropriate VPC for your ECS cluster
    2. Subnets: Appropriate subnets that route through a NAT-GW per the prerequisites
    3. Security groups: Security groups appropriate to your policies that allow egress SSL/TLS traffic to api.gremlin.com
    4. Auto-assign public IP: DISABLED
    5. Load balancer type: None
    6. Enable service discovery integration: Unchecked
  7. Click Next step to bring you to the Set Auto Scaling page, and Next step again
  8. Review the service details to ensure accuracy, and if everything looks good click Create Service
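The console settings above map onto a single AWS CLI call. The following is a sketch rather than the definitive procedure: create_gremlin_daemon_service is a hypothetical helper name, and the cluster name, subnet IDs, and security group IDs are values you supply.

```shell
# Hypothetical helper: create the Gremlin daemon service.
#   $1 = cluster name
#   $2 = comma-separated private subnet IDs
#   $3 = comma-separated security group IDs
create_gremlin_daemon_service() {
  cluster="$1"
  subnets="$2"
  security_groups="$3"
  # DAEMON scheduling places one Gremlin task on every EC2 host in the cluster.
  aws ecs create-service \
    --cluster "$cluster" \
    --service-name Gremlin \
    --task-definition Gremlin \
    --launch-type EC2 \
    --scheduling-strategy DAEMON \
    --network-configuration \
    "awsvpcConfiguration={subnets=[$subnets],securityGroups=[$security_groups],assignPublicIp=DISABLED}"
}

# Example:
#   create_gremlin_daemon_service my-cluster subnet-aaa,subnet-bbb sg-ccc
```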

Step 3: Verify Installation

  1. In the AWS management console navigate to Clusters in the ECS service
  2. Select the cluster you just deployed Gremlin into
  3. On the Services tab, you should now see the Gremlin service
  4. Verify that Desired tasks matches the number of ECS hosts in your cluster
  5. Verify that Running tasks matches the number of Desired tasks. Note that it can take several minutes for the ECS scheduler to launch Gremlin to full capacity
  6. Once the Gremlin service is running at full capacity, navigate to https://app.gremlin.com/clients/infrastructure
  7. You can search via the tag orchestration:ecs to verify that the Gremlin control plane can see the freshly launched ECS daemons
  8. Navigate to https://app.gremlin.com/attacks/new and click on the Containers tab
  9. Verify that you are seeing the application containers and tags currently running on your ECS cluster being imported into the Gremlin control plane
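The comparison of desired and running tasks can also be scripted. A minimal sketch, assuming the service is named Gremlin as in Step 2; gremlin_tasks_ready is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only when the Gremlin service is running as
# many tasks as it wants to run.
#   $1 = cluster name
gremlin_tasks_ready() {
  cluster="$1"
  counts=$(aws ecs describe-services \
    --cluster "$cluster" --services Gremlin \
    --query 'services[0].[desiredCount,runningCount]' \
    --output text) || return 1
  # Text output is "desired<TAB>running"; split it into $1 and $2.
  set -- $counts
  echo "desired=$1 running=$2"
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example: poll until the scheduler has caught up.
#   until gremlin_tasks_ready my-cluster; do sleep 15; done
```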

Step 4: Create an htop Elastic Container Registry (ECR) repository and image

This step creates a Docker image that exposes htop via shellinaboxd on port 8888.

  1. In the AWS management console navigate to Repositories in the ECS service
  2. If you don't already have a repository, click Get started at the top; otherwise click New repository
  3. In Repository name type in htop, then click Next step
  4. Take note of the endpoint to push your docker image to, then click Done
  5. SSH into an instance in your AWS environment that has the AWS command line tools and Docker installed (e.g., a jump box)
  6. Authenticate the Docker client against ECR: sudo $(aws ecr get-login --no-include-email --region us-east-1) (AWS CLI v2 removed get-login; use aws ecr get-login-password there instead)
  7. Create and change into the ~/docker-htop directory: mkdir -p ~/docker-htop && cd ~/docker-htop
  8. Create the Dockerfile:

    cat <<< 'FROM alpine:latest
    RUN apk --no-cache add --update htop && rm -rf /var/cache/apk/*
    RUN apk --no-cache add --repository http://dl-cdn.alpinelinux.org/alpine/edge/testing shellinabox
    ENTRYPOINT ["shellinaboxd", "-t", "-p8888", "-s/:nobody:nogroup:/:htop"]' > Dockerfile
  9. Create the Docker image: sudo docker build -t htop .
  10. Tag the image for the repository, using the endpoint details you noted when creating it: sudo docker tag htop:latest <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest
  11. Push the image to ECR, again using the endpoint details from the repository: sudo docker push <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest
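On hosts with AWS CLI v2, where aws ecr get-login is no longer available, the login, build, tag, and push steps might be combined as below. This is a sketch under assumptions: push_htop_image is a hypothetical helper name, it is run from the directory containing the Dockerfile, and the docker commands may need a sudo prefix as in the steps above.

```shell
# Hypothetical helper: log in to ECR, then build, tag, and push the htop image.
#   $1 = AWS account ID
#   $2 = region (defaults to us-east-1, as used in the steps above)
push_htop_image() {
  account_id="$1"
  region="${2:-us-east-1}"
  registry="$account_id.dkr.ecr.$region.amazonaws.com"
  # get-login-password prints a token that docker login reads from stdin.
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$registry" || return 1
  docker build -t htop . || return 1
  docker tag htop:latest "$registry/htop:latest" || return 1
  docker push "$registry/htop:latest"
}

# Example:
#   cd ~/docker-htop && push_htop_image 123456789012
```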

Step 6: Create the HTOP Task Definition

  1. In the AWS management console navigate to Task Definitions in the ECS service, and choose Create New Task Definition
  2. Select EC2 for the launch type compatibility and click Next Step
  3. On the Configure task and container definitions page, set the parameters as follows:

    1. Task Definition Name: htop
    2. Task Role: Leave blank
    3. Network Mode: Leave as <default>
    4. Task execution role: Leave as none
    5. Task memory (MiB): 128
    6. Task CPU (unit): 128
    7. Click Add container, and in the Add container modal enter the following information, leaving defaults unless otherwise specified:
    8. Container name: htop
    9. Image: <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest
    10. Private repository authentication: Leave unchecked
    11. Memory Limits (MiB): Hard limit 128
    12. Port mappings: Host port: 8888; Container port: 8888
    13. Scroll down to the Docker Labels section and enter appropriate key-value tags; at a minimum we suggest app:htop
    14. Click the Add button
    15. Scroll down and click the Create button
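For reference, the same htop task definition could be registered from the command line. This sketch mirrors the console values above (128 CPU units, 128 MiB, port 8888, label app:htop); register_htop_task_definition is a hypothetical helper name, and the image URI is whatever you pushed to ECR.

```shell
# Hypothetical helper: register the htop task definition.
#   $1 = full image URI, e.g. the ECR endpoint followed by /htop:latest
register_htop_task_definition() {
  image="$1"
  # Container settings mirror the Add container modal described above.
  container_defs=$(cat <<EOF
[{
  "name": "htop",
  "image": "$image",
  "memory": 128,
  "essential": true,
  "portMappings": [{"hostPort": 8888, "containerPort": 8888}],
  "dockerLabels": {"app": "htop"}
}]
EOF
)
  aws ecs register-task-definition \
    --family htop \
    --requires-compatibilities EC2 \
    --cpu 128 --memory 128 \
    --container-definitions "$container_defs"
}
```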

Step 7: Create a service definition for HTOP

  1. In the AWS management console navigate to Clusters in the ECS service
  2. Select the cluster you just deployed Gremlin into
  3. On the Services tab, click the Create button
  4. On the Configure service page, set the parameters as follows:

    1. Launch type: EC2
    2. Task Definition -> Family: htop
    3. Task Definition -> Revision: latest
    4. Cluster -> The cluster you wish to deploy into
    5. Service name: htop
    6. Service type: REPLICA
    7. Number of tasks: 1
  5. Click Next step to bring you to the Set Auto Scaling page, and Next step again
  6. Review the service details to ensure accuracy, and if everything looks good click Create Service
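A CLI sketch of the same service creation follows. create_htop_service is a hypothetical helper name; no network configuration is passed because the htop task definition leaves the network mode at its default.

```shell
# Hypothetical helper: run a single htop task as a REPLICA service.
#   $1 = cluster name
create_htop_service() {
  cluster="$1"
  aws ecs create-service \
    --cluster "$cluster" \
    --service-name htop \
    --task-definition htop \
    --launch-type EC2 \
    --scheduling-strategy REPLICA \
    --desired-count 1
}

# Example:
#   create_htop_service my-cluster
```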

Step 8: Open HTOP

  1. In the AWS management console navigate to Clusters in the ECS service
  2. Select the cluster you just deployed Gremlin into
  3. On the Services tab, click the htop service we just created
  4. Click on the Tasks tab
  5. Click on the task-ID for the running HTOP task
  6. Expand the htop container by clicking on the arrow next to the container name htop
  7. In the Network bindings section, click on the provided External Link

    1. If the external link does not work, you may need to go into the security group associated with your ECS cluster and open port 8888
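Opening port 8888 can also be done from the CLI. The sketch below is one cautious way to do it: open_htop_port is a hypothetical helper name, the security group ID and source CIDR are values you supply, and the refusal of 0.0.0.0/0 is a design choice made here because htop is an unauthenticated view into the host.

```shell
# Hypothetical helper: allow inbound TCP 8888 from a specific CIDR only.
#   $1 = security group ID of the ECS hosts
#   $2 = source CIDR, e.g. your workstation address with /32
open_htop_port() {
  group_id="$1"
  cidr="$2"
  if [ "$cidr" = "0.0.0.0/0" ]; then
    echo "refusing to open port 8888 to the whole internet" >&2
    return 1
  fi
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" --protocol tcp --port 8888 --cidr "$cidr"
}

# Example:
#   open_htop_port sg-0123456789abcdef0 203.0.113.10/32
```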

Congratulations, you should now see the htop interface in your web browser. Leave this open, as we'll be referring back to it in the next step.

Step 9: Run a test attack

  1. In a new browser window, open the link to the Gremlin Attacks UI: https://app.gremlin.com/attacks
  2. Click New Attack
  3. Select the Containers tab
  4. Select the htop container we created as your target by clicking the checkbox next to the container ID. You should be able to find it by the Docker label we added, app:htop
  5. Click Choose a Gremlin
  6. Select the Resource category and CPU attack
  7. Default values should be fine; click Unleash Gremlin to launch the attack
  8. In the open htop browser window, observe the increased CPU load on the Docker container

Conclusion

You now have Gremlin up and running in your ECS environment, and have validated its functionality against a running htop container. For security, you should remove the htop container from your running cluster, as it is an unsecured metrics view into your running environment.
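Removing the htop service might look like this from the CLI. It is a sketch that assumes the service name htop from the steps above; remove_htop_service is a hypothetical helper name.

```shell
# Hypothetical helper: stop and delete the htop verification service.
#   $1 = cluster name
remove_htop_service() {
  cluster="$1"
  # Scale to zero first so the running task is stopped before deletion.
  aws ecs update-service --cluster "$cluster" --service htop --desired-count 0 || return 1
  aws ecs delete-service --cluster "$cluster" --service htop
}

# Example:
#   remove_htop_service my-cluster
```

If you opened port 8888 in a security group for this test, remember to close it again as well.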

Feel free to expand this to other ECS environments and have fun running Chaos Experiments!

Gremlin Task Definition JSON

{
  "family": "Gremlin",
  "taskRoleArn": null,
  "executionRoleArn": null,
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/Gremlin",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "entryPoint": [
        "/entrypoint.sh"
      ],
      "portMappings": [],
      "command": [
        "daemon"
      ],
      "linuxParameters": {
        "capabilities": {
          "add": [
            "NET_ADMIN",
            "SYS_BOOT",
            "SYS_TIME",
            "KILL"
          ],
          "drop": null
        },
        "sharedMemorySize": null,
        "tmpfs": null,
        "devices": null,
        "initProcessEnabled": null
      },
      "cpu": 128,
      "environment": [
        {
          "name": "GREMLIN_CLIENT_TAGS",
          "value": "orchestration=ecs,owner=<Your_Name>"
        },
        {
          "name": "GREMLIN_TEAM_CERTIFICATE_OR_FILE",
          "value": "file:///var/lib/gremlin/team_pub.pem"
        },
        {
          "name": "GREMLIN_TEAM_PRIVATE_KEY_OR_FILE",
          "value": "file:///var/lib/gremlin/team_priv.pem"
        }
      ],
      "ulimits": null,
      "dnsServers": [],
      "mountPoints": [
        {
          "readOnly": null,
          "containerPath": "/var/lib/gremlin",
          "sourceVolume": "var-lib-gremlin"
        },
        {
          "readOnly": null,
          "containerPath": "/var/log/gremlin",
          "sourceVolume": "var-log-gremlin"
        },
        {
          "readOnly": true,
          "containerPath": "/var/run/docker.sock",
          "sourceVolume": "var-run-docker-sock"
        }
      ],
      "workingDirectory": null,
      "dockerSecurityOptions": [],
      "memory": null,
      "memoryReservation": 128,
      "volumesFrom": [],
      "image": "gremlin/gremlin",
      "disableNetworking": null,
      "interactive": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "pseudoTerminal": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "systemControls": null,
      "privileged": false,
      "name": "Gremlin"
    }
  ],
  "volumes": [
    {
      "name": "var-lib-gremlin",
      "host": {
        "sourcePath": "/var/lib/gremlin"
      },
      "dockerVolumeConfiguration": null
    },
    {
      "name": "var-log-gremlin",
      "host": {
        "sourcePath": "/var/log/gremlin"
      },
      "dockerVolumeConfiguration": null
    },
    {
      "name": "var-run-docker-sock",
      "host": {
        "sourcePath": "/var/run/docker.sock"
      },
      "dockerVolumeConfiguration": null
    }
  ],
  "requiresCompatibilities": [
    "EC2"
  ],
  "cpu": "128",
  "memory": "128"
}
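Before registering this JSON, it may be worth checking that the owner placeholder in GREMLIN_CLIENT_TAGS has been replaced. A small sketch, assuming the JSON has been saved to a local file; check_gremlin_task_definition is a hypothetical helper name.

```shell
# Hypothetical helper: fail if the saved task definition still contains the
# <Your_Name> placeholder or is not the Gremlin task family.
#   $1 = path to the saved task definition JSON
check_gremlin_task_definition() {
  json_file="$1"
  if grep -q '<Your_Name>' "$json_file"; then
    echo "replace the <Your_Name> placeholder in GREMLIN_CLIENT_TAGS" >&2
    return 1
  fi
  grep -q '"family": "Gremlin"' "$json_file"
}

# Example:
#   check_gremlin_task_definition ./gremlin-task-definition.json
```

The awslogs-region value (us-east-1 in the JSON above) should likewise be adjusted if your cluster runs in another region.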
