
This advanced installation guide will walk you through installing Gremlin docker containers in your ECS environment, and verifying that you can run a CPU attack against the freshly installed Gremlin agents. In the verification steps we will be creating a container to run htop exposed as a web interface via port 8888, which will allow us to visualize changes in real time as a simple CPU attack is run against the container.
/var/lib/gremlin
Additionally, to get the most from this installation guide you should already be familiar with running Gremlin as a container. You can reference Install Gremlin in a Docker Container for help getting started with Gremlin and Docker.
In the task definition at the end of this guide, replace the placeholder values my-team-id, my-team-secret, and my-aws-account. Additionally, review the Task Definition's limits and CPU architecture values to ensure they match your target environment.
In the AWS management console navigate to Task Definitions in the ECS service, and choose Create New Task Definition
Select EC2 for the launch type compatibility and click Next Step
Click Configure via JSON, enter the task definition, and click the Save button
In the AWS management console navigate to Clusters in the ECS service
On the Services tab, click the Create button
On the Configure service page, set the parameters as follows:
Select the Launch Type compute option.
Select the EC2 launch type.
Select the Service application type.
Task Definition -> Family: gremlin
Task Definition -> Revision: latest
Service type: DAEMON
Click Create
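The console steps above can also be approximated from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured and that the task definition from the end of this guide has been saved to a local file. The function names, the file name, and the cluster name passed in are hypothetical; the substituted values must not contain the characters |, & or \, because they are passed straight to sed.

```shell
# Sketch only: function names, file names and the cluster name are placeholders.

# Print the task definition with the three placeholder values filled in.
# $1: team ID, $2: team secret, $3: AWS account ID, $4: task definition JSON file
fill_gremlin_placeholders() {
  sed -e "s|my-team-id|$1|g" \
      -e "s|my-team-secret|$2|g" \
      -e "s|my-aws-account|$3|g" "$4"
}

# Register the filled-in task definition with ECS.
# $1: path to the task definition JSON file
register_gremlin_task_definition() {
  aws ecs register-task-definition --cli-input-json "file://$1"
}

# Create the Gremlin service with the DAEMON scheduling strategy, so that
# one task is placed on every EC2 container instance in the cluster.
# Naming only the family ("gremlin") selects its latest active revision.
# $1: cluster name
create_gremlin_daemon_service() {
  aws ecs create-service \
    --cluster "$1" \
    --service-name gremlin \
    --task-definition gremlin \
    --launch-type EC2 \
    --scheduling-strategy DAEMON
}

# Succeed only when the service reports a non-zero desired count and the
# running count equals it. The body runs in a subshell, so the variables
# and positional parameters set here do not leak into the caller.
# $1: cluster name
gremlin_service_at_capacity() (
  counts=$(aws ecs describe-services \
    --cluster "$1" \
    --services gremlin \
    --query 'services[0].[desiredCount,runningCount]' \
    --output text) || exit 1
  # With --output text the two numbers are separated by whitespace.
  set -- $counts
  [ "$#" -eq 2 ] && [ "$1" -gt 0 ] 2>/dev/null && [ "$1" -eq "$2" ] 2>/dev/null
)
```

A possible sequence would be to redirect the output of fill_gremlin_placeholders into a new file, pass that file to register_gremlin_task_definition, call create_gremlin_daemon_service with your cluster name, and then poll gremlin_service_at_capacity until it succeeds.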
In the AWS management console navigate to Clusters in the ECS service
On the Services tab, you should now see the Gremlin service
Verify that Desired tasks matches the number of ECS hosts in your cluster
Verify that Running tasks matches the number of Desired tasks. Note that it can take several minutes for the ECS scheduler to launch Gremlin to full capacity
Once the Gremlin service is running at full capacity, navigate to https://app.gremlin.com/clients/infrastructureplatform=ecs to verify that the Gremlin control plane can see the freshly launched ECS daemons
Select the Containers tab
The following steps create a docker container that exposes htop via shellinaboxd on port 8888. htop is an interactive process viewer for Unix systems. We'll use htop as the target for an attack in step 8. Using htop isn't a requirement for installing Gremlin in Docker, but for this tutorial, using it makes it easier to see the impact of our attacks.
In the AWS management console navigate to Repositories in the ECS service
If you don't already have a repository, click Get started at the top; otherwise click New repository
In Repository name type in htop, then click Next step
Take note of the endpoint to push your docker image to, then click Done
SSH into an instance in your AWS environment with the AWS command line tools and docker installed (e.g. a jump box)
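Authenticate the docker client against ECR: sudo $(aws ecr get-login --no-include-email --region us-east-1). Note that aws ecr get-login is only available in AWS CLI v1; with AWS CLI v2, pipe the output of aws ecr get-login-password --region us-east-1 to docker login --username AWS --password-stdin with your registry endpoint instead.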
Create and change directory to ~/docker-htop: mkdir -p ~/docker-htop; cd ~/docker-htop
Create the docker file
cat <<< 'FROM alpine:latest
RUN apk --no-cache add --update htop && rm -rf /var/cache/apk/*
RUN apk --no-cache add --repository http://dl-cdn.alpinelinux.org/alpine/edge/testing shellinabox
ENTRYPOINT ["shellinaboxd", "-t", "-p8888", "-s/:nobody:nogroup:/:htop"]' > Dockerfile
Create the docker image: sudo docker build -t htop .
Tag the image to push to the repository; you'll need the endpoint details from creating the repository
sudo docker tag htop:latest <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest
Push the image to ECR; again, you'll need the endpoint details from creating the repository
sudo docker push <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest
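The authenticate, build, tag, and push steps above could be collected into one script. This is a sketch under stated assumptions: it uses the get-login-password form of ECR authentication from AWS CLI v2, it assumes the htop repository already exists, and the function names are hypothetical. The account ID and region are passed in rather than hard-coded.

```shell
# Sketch only: function names are placeholders chosen for this example.

# Print the ECR registry host name for an account and region.
# $1: AWS account ID, $2: region
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Log in to ECR, then build, tag and push the htop image.
# Run this from the directory that contains the Dockerfile.
# The body runs in a subshell so that "registry" stays local.
# $1: AWS account ID, $2: region
publish_htop_image() (
  registry=$(ecr_registry "$1" "$2")
  aws ecr get-login-password --region "$2" |
    sudo docker login --username AWS --password-stdin "$registry" &&
    sudo docker build -t htop . &&
    sudo docker tag htop:latest "$registry/htop:latest" &&
    sudo docker push "$registry/htop:latest"
)
```

For example, publish_htop_image followed by your account ID and us-east-1 would push to the same endpoint shape used in the commands above.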
In the AWS management console navigate to Task Definitions in the ECS service, and choose Create New Task Definition
Select EC2 for the launch type compatibility and click Next Step
On the Configure task and container definitions page, set the parameters as follows:
Task Definition Name: htop
Task Role: Leave blank
Network Mode: Leave as <default>
Task execution role: Leave as none
Task memory (MiB): 128
Task CPU (unit): 128
Click Add container, and in the Add container modal enter the following information, leaving defaults unless otherwise specified:
Container name: htop
Image: <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest
Private repository authentication: Leave unchecked
Memory Limits (MiB): Hard limit 128
Port mappings: Host port: 8888; Container port: 8888
Scroll down to the Docker Labels section and enter appropriate key-value tags; at a minimum we suggest app:htop
Click the Add button
Scroll down and click the Create button
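The console settings above correspond roughly to a task definition document. The sketch below generates one and registers it with the AWS CLI. It assumes that app:htop means a Docker label with key app and value htop, and that fields left at their console defaults can be omitted; the function names and the output file name are hypothetical.

```shell
# Sketch only: function names and the output file name are placeholders.

# Print a task definition matching the values entered in the console.
# $1: full image URI (the ECR endpoint noted earlier, ending in /htop:latest)
htop_task_definition() {
  cat <<EOF
{
  "family": "htop",
  "requiresCompatibilities": ["EC2"],
  "cpu": "128",
  "memory": "128",
  "containerDefinitions": [
    {
      "name": "htop",
      "image": "$1",
      "essential": true,
      "memory": 128,
      "portMappings": [
        { "hostPort": 8888, "containerPort": 8888, "protocol": "tcp" }
      ],
      "dockerLabels": { "app": "htop" }
    }
  ]
}
EOF
}

# Write the document to a file and register it with ECS.
# $1: full image URI
register_htop_task_definition() {
  htop_task_definition "$1" > htop-task-definition.json &&
    aws ecs register-task-definition \
      --cli-input-json file://htop-task-definition.json
}
```

Generating the JSON separately from registering it makes it possible to review the document before anything is sent to AWS.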
In the AWS management console navigate to Clusters in the ECS service
Select the cluster you just deployed Gremlin into
On the Services tab, click the Create button
On the Configure service page, set the parameters as follows:
Launch type: EC2
Task Definition -> Family: htop
Task Definition -> Revision: latest
Cluster: The cluster you wish to deploy into
Service name: htop
Service type: REPLICA
Number of tasks: 1
Click Next step to bring you to the Set Auto Scaling page, and Next step again
Review the service details to ensure accuracy, and if everything looks good click Create Service
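The service settings above map onto a single AWS CLI call. This is a minimal sketch, assuming the htop task definition is already registered; the function name is hypothetical and the cluster name is supplied by the caller.

```shell
# Sketch only: the function name is a placeholder.

# Create the htop service as a REPLICA service with one task.
# Naming only the family ("htop") selects its latest active revision.
# $1: cluster name (the cluster Gremlin was deployed into)
create_htop_service() {
  aws ecs create-service \
    --cluster "$1" \
    --service-name htop \
    --task-definition htop \
    --launch-type EC2 \
    --scheduling-strategy REPLICA \
    --desired-count 1
}
```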
In the AWS management console navigate to Clusters in the ECS service
Select the cluster you just deployed Gremlin into
On the Services tab, click the htop service we just created
Click on the Tasks tab
Click on the task ID for the running htop task
Expand the htop container by clicking on the arrow next to the container name htop
In the Network bindings section, click on the provided External Link for port 8888
Congratulations, you should now see the htop interface in your web browser. Leave this open, as we'll be referring back to it in the next step.
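The address behind the External Link can also be looked up from the command line by following the task to its container instance and then to the EC2 instance. The sketch below assumes the instance has a public IP address and that its security group allows inbound traffic on port 8888; the function name is hypothetical.

```shell
# Sketch only: the function name is a placeholder.

# Print the URL of the htop web interface for the first running htop task.
# The body runs in a subshell so the intermediate variables stay local.
# $1: cluster name
htop_url() (
  cluster=$1
  # First task ARN belonging to the htop service.
  task_arn=$(aws ecs list-tasks --cluster "$cluster" --service-name htop \
    --query 'taskArns[0]' --output text) || exit 1
  # Container instance the task was placed on.
  instance_arn=$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task_arn" \
    --query 'tasks[0].containerInstanceArn' --output text) || exit 1
  # EC2 instance ID behind that container instance.
  instance_id=$(aws ecs describe-container-instances --cluster "$cluster" \
    --container-instances "$instance_arn" \
    --query 'containerInstances[0].ec2InstanceId' --output text) || exit 1
  # Public IP address of the EC2 instance.
  public_ip=$(aws ec2 describe-instances --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text) || exit 1
  printf 'http://%s:8888/\n' "$public_ip"
)
```

The printed URL can be opened in a browser in the same way as the External Link in the console.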
Click New Attack
Select the Containers tab
Select the Docker key-value pair we added, app:htop
Under Choose a Gremlin, select the Resource category and the CPU attack
Click Unleash Gremlin to launch the attack
Verify in the htop browser window that you can see the increased CPU load on the docker container.
Now that you've run a basic attack in ECS, there is some advanced configuration that you should be aware of:
networkMode - This option determines which network space we would like to affect. In the task definition in this guide it is set to host, so the task shares the host's network namespace. Other options are awsvpc, bridge, or none. For more information, please consult the AWS guide on Network mode.
pidMode - This parameter allows you to configure the containers in a task to share a process ID namespace with either the host or the other containers in the task. By default, the parameter is not set. Setting it to host, as the task definition in this guide does, can be useful when performing process killer attacks. For more information, please consult the AWS guide on PID mode.
You now have Gremlin up and running in your ECS environment, and have validated its functionality against a running htop container. For security, you should remove the htop container from your running cluster, as it's an unsecured metric view into your running environment.
Feel free to expand this to other ECS environments and have fun running Chaos Experiments!
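Removing the htop container, as recommended above, could look like the following from the command line. This is a sketch assuming the service is named htop as in this guide; the function name is hypothetical.

```shell
# Sketch only: the function name is a placeholder.

# Scale the htop service down to zero tasks, then delete it.
# $1: cluster name
remove_htop_service() {
  aws ecs update-service --cluster "$1" --service htop --desired-count 0 &&
    aws ecs delete-service --cluster "$1" --service htop
}
```

The htop task definition and the ECR repository are left in place by this sketch and would need to be removed separately if they are no longer wanted.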
The following task definition assumes Docker is the container driver.
If using containerd, replace /run/docker/runtime-runc/moby with /run/containerd/runc/k8s.io, and replace /var/run/docker.sock with /run/containerd/containerd.sock
If using CRI-O, replace /run/docker/runtime-runc/moby with /run/runc, and replace /var/run/docker.sock with /run/crio/crio.sock
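The path substitutions above can be applied mechanically to a saved copy of the task definition. The sketch below assumes that every occurrence of the two Docker paths should change, which affects both the mountPoints and the volumes entries; the function name is hypothetical.

```shell
# Sketch only: the function name is a placeholder.

# Print a task definition with the Docker paths swapped for the paths of
# another container runtime. The body runs in a subshell so that the
# variables stay local and "exit" only ends the function.
# $1: containerd or crio, $2: task definition JSON file written for Docker
rewrite_runtime_paths() (
  case "$1" in
    containerd)
      runc=/run/containerd/runc/k8s.io
      sock=/run/containerd/containerd.sock
      ;;
    crio)
      runc=/run/runc
      sock=/run/crio/crio.sock
      ;;
    *)
      echo "unknown runtime: $1" >&2
      exit 1
      ;;
  esac
  # "|" is used as the sed delimiter because the paths contain "/".
  sed -e "s|/run/docker/runtime-runc/moby|$runc|g" \
      -e "s|/var/run/docker\.sock|$sock|g" "$2"
)
```

If only the host-side sourcePath values should change in your environment, the sed expressions would need to be narrowed accordingly.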
{
  "family": "gremlin",
  "containerDefinitions": [
    {
      "name": "gremlin",
      "image": "gremlin/gremlin:latest",
      "cpu": 0,
      "portMappings": [],
      "essential": true,
      "entryPoint": [
        "/entrypoint.sh"
      ],
      "command": [
        "daemon"
      ],
      "environment": [
        {
          "name": "GREMLIN_TEAM_SECRET",
          "value": "my-team-secret"
        },
        {
          "name": "GREMLIN_TEAM_ID",
          "value": "my-team-id"
        },
        {
          "name": "GREMLIN_CLIENT_TAGS",
          "value": "platform=ecs"
        }
      ],
      "mountPoints": [
        {
          "sourceVolume": "runtime-runc",
          "containerPath": "/run/docker/runtime-runc/moby",
          "readOnly": false
        },
        {
          "sourceVolume": "runtime-socket",
          "containerPath": "/var/run/docker.sock",
          "readOnly": false
        },
        {
          "sourceVolume": "cgroup-root",
          "containerPath": "/sys/fs/cgroup",
          "readOnly": false
        },
        {
          "sourceVolume": "gremlin-state",
          "containerPath": "/var/lib/gremlin",
          "readOnly": false
        },
        {
          "sourceVolume": "gremlin-logs",
          "containerPath": "/var/log/gremlin",
          "readOnly": false
        }
      ],
      "volumesFrom": [],
      "linuxParameters": {
        "capabilities": {
          "add": [
            "KILL",
            "NET_ADMIN",
            "SYS_BOOT",
            "SYS_TIME",
            "SYS_ADMIN",
            "SYS_PTRACE"
          ]
        }
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/ecs/gremlin",
          "awslogs-region": "us-west-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "executionRoleArn": "arn:aws:iam::my-aws-account:role/ecsTaskExecutionRole",
  "networkMode": "host",
  "volumes": [
    {
      "name": "runtime-runc",
      "host": {
        "sourcePath": "/run/docker/runtime-runc/moby"
      }
    },
    {
      "name": "runtime-socket",
      "host": {
        "sourcePath": "/var/run/docker.sock"
      }
    },
    {
      "name": "cgroup-root",
      "host": {
        "sourcePath": "/sys/fs/cgroup"
      }
    },
    {
      "name": "gremlin-state",
      "host": {
        "sourcePath": "/var/lib/gremlin"
      }
    },
    {
      "name": "gremlin-logs",
      "host": {
        "sourcePath": "/var/log/gremlin"
      }
    }
  ],
  "requiresCompatibilities": [
    "EC2"
  ],
  "cpu": "1024",
  "memory": "1024",
  "pidMode": "host",
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  },
  "tags": [
    {
      "key": "ecs:taskDefinition:createdFrom",
      "value": "ecs-console-v2"
    }
  ]
}