
This tutorial will walk you through installing Gremlin on Mesosphere using Marathon Container Orchestration, and testing the functionality with a CPU Chaos Engineering experiment.
Installing Gremlin involves creating a Gremlin application group in Marathon and, inside that group, creating an application definition for each role in your Marathon configuration. This guide assumes the default private and public roles, which are typically named agent and agent_public, or slave and slave_public.
This step will need to be repeated for each resource pool you wish to install the Gremlin agent on.
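As a rough illustration of "one application definition per role", the sketch below stamps out a definition for each role from a shared template. The template is trimmed to a few fields, and the helper name build_definitions, the ROLES table, and the cluster and owner arguments are assumptions made for this example rather than part of Gremlin or Marathon.

```python
import copy

# Trimmed, illustrative template: only the fields shared by every role.
BASE_APP = {
    "cmd": "/entrypoint.sh daemon",
    "cpus": 0.25,
    "mem": 64,
    "instances": 2,
    "constraints": [["hostname", "UNIQUE"]],
    "env": {},
}

# Assumed role table: application id and accepted resource roles per role.
ROLES = {
    "private": {"id": "/gremlin/gremlin-agent", "accepted": ["*"]},
    "public": {"id": "/gremlin/gremlin-agent-public", "accepted": ["slave_public"]},
}


def build_definitions(cluster, owner, roles=ROLES):
    """Return one application definition (a dict) per role."""
    apps = []
    for name, role in roles.items():
        app = copy.deepcopy(BASE_APP)  # never mutate the shared template
        app["id"] = role["id"]
        app["acceptedResourceRoles"] = list(role["accepted"])
        app["env"]["GREMLIN_CLIENT_TAGS"] = (
            f"mesoscluster={cluster},owner={owner},mesosrole={name}"
        )
        apps.append(app)
    return apps
```

Calling build_definitions("demo-cluster", "alice") returns two dictionaries that differ only in id, accepted roles, and client tags. If your cluster uses agent_public instead of slave_public, adjust the ROLES table accordingly.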
Click Create Application, located in the upper right-hand corner.
Click Create Application, located in the lower right-hand corner of the modal.

The Swissknife container uses shellinabox to expose a few tools to you for testing. This container should not be left running once you have finished this tutorial, but we have found it very helpful when troubleshooting our container environments.
For more information on the Swissknife container, see Docker Hub
Click Create Application, located in the upper right-hand corner.
Click Create Application, located in the lower right-hand corner of the modal.

Let's open the htop application to get real-time metrics for the running swissknife container. As we run an example attack from Gremlin, this will help us visualize what is happening to the container. The htop application can be found at http://<PUBLIC-NODE-IP>:8888/htop. Also available in this container are a root shell at http://<PUBLIC-NODE-IP>:8888/, a constant ping to Google at http://<PUBLIC-NODE-IP>:8888/gping, and a running iostat at http://<PUBLIC-NODE-IP>:8888/iostat.
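The endpoints listed above all follow the same pattern, so a small helper can build them from the public node's IP address. This is only a convenience sketch; the function name tool_urls and the TOOL_PATHS table are made up for this example, and the example address 203.0.113.10 is a documentation placeholder.

```python
from urllib.parse import urlunsplit

SWISSKNIFE_PORT = 8888

# Paths taken from the tutorial text; the dictionary keys are illustrative labels.
TOOL_PATHS = {
    "shell": "/",
    "htop": "/htop",
    "gping": "/gping",
    "iostat": "/iostat",
}


def tool_urls(public_node_ip, port=SWISSKNIFE_PORT):
    """Map each swissknife tool to its URL on the given public node."""
    netloc = f"{public_node_ip}:{port}"
    return {
        name: urlunsplit(("http", netloc, path, "", ""))
        for name, path in TOOL_PATHS.items()
    }
```

For instance, tool_urls("203.0.113.10")["htop"] yields the htop address for that node, which you can paste into your browser.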
The swissknife application listens on port 8888.
It is deployed to the slave_public role; you may need to change this if that is not the public role you use.
Open http://<PUBLIC-NODE-IP>:8888/htop in your browser.

Congratulations, you should now see the htop interface in your web browser. Leave this open, as we'll be referring back to it in the next step.
1. Click New Attack.
2. Select the Containers tab.
3. Select the swissknife container we created as your target by clicking the checkbox next to the container ID. You should be able to find it based on the Docker key-value pair we added, app:swissknife.
4. Under Choose a Gremlin, select the Resource category and the CPU attack.
5. Click Unleash Gremlin to launch the attack.
6. Verify in the htop browser window that you can see the increased CPU load on the Docker container.

You now have Gremlin up and running in your Mesosphere environment, and have validated its functionality against a running swissknife container. For security, you should remove the swissknife container from your running cluster, as it is an unsecured metric view into your running environment.
Feel free to expand this to other Mesosphere environments and have fun running Chaos Experiments!
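If you prefer to remove the swissknife container from a script rather than the Marathon UI, Marathon's REST API accepts a DELETE on /v2/apps/{appId}. The sketch below only builds the request; the function name build_delete_request and the marathon_url value are assumptions, and a DC/OS cluster may additionally require an authentication header and a different base path that are not shown here.

```python
import urllib.request


def build_delete_request(marathon_url, app_id="/swissknife"):
    """Build (but do not send) a DELETE request for a Marathon application."""
    url = marathon_url.rstrip("/") + "/v2/apps/" + app_id.strip("/")
    return urllib.request.Request(url, method="DELETE")
```

To actually send it, pass the returned object to urllib.request.urlopen, for example urllib.request.urlopen(build_delete_request("http://marathon.example:8080")), where the host name is a placeholder for your own Marathon endpoint.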
Gremlin application definitions map the mount points /var/run/docker.sock, /var/log/gremlin and /var/lib/gremlin to the same locations on the host nodes.
The GREMLIN_TEAM_ID and GREMLIN_TEAM_SECRET fields (or GREMLIN_TEAM_CERTIFICATE_OR_FILE and GREMLIN_TEAM_PRIVATE_KEY_OR_FILE, if you authenticate with a certificate) and the GREMLIN_CLIENT_TAGS field will need to be updated to fit your environment.

Application definition for the private role (/gremlin/gremlin-agent):

{
  "id": "/gremlin/gremlin-agent",
  "cmd": "/entrypoint.sh daemon",
  "cpus": 0.25,
  "mem": 64,
  "disk": 0,
  "instances": 2,
  "constraints": [["hostname", "UNIQUE"]],
  "acceptedResourceRoles": ["*"],
  "container": {
    "type": "DOCKER",
    "docker": {
      "forcePullImage": true,
      "image": "gremlin/gremlin",
      "parameters": [
        {
          "key": "cap-add",
          "value": "NET_ADMIN"
        },
        {
          "key": "cap-add",
          "value": "SYS_BOOT"
        },
        {
          "key": "cap-add",
          "value": "SYS_TIME"
        },
        {
          "key": "cap-add",
          "value": "KILL"
        }
      ],
      "privileged": true
    },
    "volumes": [
      {
        "containerPath": "/var/run/docker.sock",
        "hostPath": "/var/run/docker.sock",
        "mode": "RW"
      },
      {
        "containerPath": "/var/log/gremlin",
        "hostPath": "/var/log/gremlin",
        "mode": "RW"
      },
      {
        "containerPath": "/var/lib/gremlin",
        "hostPath": "/var/lib/gremlin",
        "mode": "RW"
      }
    ]
  },
  "env": {
    "GREMLIN_TEAM_ID": "<TEAM_ID_HASH>",
    "GREMLIN_TEAM_SECRET": "<TEAM_SECRET_HASH>",
    "GREMLIN_CLIENT_TAGS": "mesoscluster=<Your_Mesos_Cluster_Name>,owner=<Your_Name>,mesosrole=private"
  },
  "labels": {
    "name": "gremlin"
  },
  "portDefinitions": [
    {
      "port": 10000,
      "protocol": "tcp"
    }
  ]
}
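Before submitting a definition, it can help to check the two points made above: that each mount maps to the same path on the host, and that no placeholder values remain in the environment. The following is a minimal sketch under those assumptions; check_mounts and unfilled_env are hypothetical helper names, and the placeholder test simply looks for text in angle brackets.

```python
import re

REQUIRED_MOUNTS = ("/var/run/docker.sock", "/var/log/gremlin", "/var/lib/gremlin")
PLACEHOLDER = re.compile(r"<[^>]+>")  # e.g. <TEAM_ID_HASH>


def check_mounts(app):
    """Return a list of problems with the volume mappings (empty if none)."""
    volumes = app.get("container", {}).get("volumes", [])
    mapped = {v.get("containerPath"): v.get("hostPath") for v in volumes}
    problems = []
    for path in REQUIRED_MOUNTS:
        if path not in mapped:
            problems.append(f"missing mount: {path}")
        elif mapped[path] != path:
            problems.append(f"{path} maps to {mapped[path]}, expected {path}")
    return problems


def unfilled_env(app):
    """Return the names of env variables that still contain a <placeholder>."""
    env = app.get("env", {})
    return sorted(k for k, v in env.items() if PLACEHOLDER.search(str(v)))
```

Load the definition with json.load, then call both helpers; an empty list from each suggests the definition has been adapted to your environment.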
Application definition for the public role (/gremlin/gremlin-agent-public):

{
  "id": "/gremlin/gremlin-agent-public",
  "cmd": "/entrypoint.sh daemon",
  "cpus": 0.25,
  "mem": 64,
  "disk": 0,
  "instances": 2,
  "constraints": [["hostname", "UNIQUE"]],
  "acceptedResourceRoles": ["slave_public"],
  "container": {
    "type": "DOCKER",
    "docker": {
      "forcePullImage": true,
      "image": "gremlin/gremlin",
      "parameters": [
        {
          "key": "cap-add",
          "value": "NET_ADMIN"
        },
        {
          "key": "cap-add",
          "value": "SYS_BOOT"
        },
        {
          "key": "cap-add",
          "value": "SYS_TIME"
        },
        {
          "key": "cap-add",
          "value": "KILL"
        }
      ],
      "privileged": true
    },
    "volumes": [
      {
        "containerPath": "/var/run/docker.sock",
        "hostPath": "/var/run/docker.sock",
        "mode": "RW"
      },
      {
        "containerPath": "/var/log/gremlin",
        "hostPath": "/var/log/gremlin",
        "mode": "RW"
      },
      {
        "containerPath": "/var/lib/gremlin",
        "hostPath": "/var/lib/gremlin",
        "mode": "RW"
      }
    ]
  },
  "env": {
    "GREMLIN_TEAM_ID": "<TEAM_ID_HASH>",
    "GREMLIN_TEAM_SECRET": "<TEAM_SECRET_HASH>",
    "GREMLIN_CLIENT_TAGS": "mesoscluster=<Your_Mesos_Cluster_Name>,owner=<Your_Name>,mesosrole=public"
  },
  "labels": {
    "name": "gremlin"
  },
  "portDefinitions": [
    {
      "port": 10000,
      "protocol": "tcp"
    }
  ]
}
Application definition for the swissknife container (/swissknife):

{
  "id": "/swissknife",
  "cmd": null,
  "cpus": 0.25,
  "mem": 64,
  "disk": 0,
  "instances": 1,
  "constraints": [["hostname", "UNIQUE"]],
  "acceptedResourceRoles": ["slave_public"],
  "container": {
    "type": "DOCKER",
    "docker": {
      "forcePullImage": true,
      "image": "khultman/swissknife",
      "parameters": [],
      "privileged": false
    },
    "volumes": [],
    "portMappings": [
      {
        "containerPort": 8888,
        "hostPort": 8888,
        "labels": {},
        "name": "default",
        "protocol": "tcp",
        "servicePort": 8888
      }
    ]
  },
  "labels": {
    "app": "swissknife"
  },
  "networks": [
    {
      "mode": "container/bridge"
    }
  ],
  "portDefinitions": []
}
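As an alternative to pasting these definitions into the Marathon UI, they can be submitted to Marathon's REST API with a POST to /v2/apps. The sketch below builds such a request without sending it; build_create_request and the marathon_url value are assumptions for illustration, and a DC/OS cluster may require an authentication token and a different base path that are not covered here.

```python
import json
import urllib.request


def build_create_request(marathon_url, app_definition):
    """Build (but do not send) a POST request that creates a Marathon app."""
    body = json.dumps(app_definition).encode("utf-8")
    return urllib.request.Request(
        marathon_url.rstrip("/") + "/v2/apps",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A typical use would be to load one of the definitions above from a file with json.load, build the request, and pass it to urllib.request.urlopen. Keeping the build step separate from the send step makes it easy to inspect the URL and body first.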