Additional Configuration for Helm

Some environments require additional configuration. Review the following sections to find the best configuration for your environment and then verify your installation.

Declare container driver

Gremlin provides several drivers for integrating with the container runtime that powers Kubernetes; they are listed in the table below. Gremlin automatically chooses the most suitable driver based on each driver's requirements.

When using Helm, you can declare the intended container driver with the following:


--set gremlin.container.driver=$driver

| Driver | Requirements and file access | Notes |
| --- | --- | --- |
| containerd-linux | Connect: /run/containerd/containerd.sock; Write: host's cgroup root; Access to the host's PID namespace | Used with containerd container runtime |
| crio-linux | Connect: /run/crio/crio.sock; Write: host's cgroup root; Access to the host's PID namespace | Used with CRI-O container runtime |
| docker-linux | Connect: /var/run/docker.sock; Write: host's cgroup root; Access to the host's PID namespace; Minimum Docker version: 17.11.0 | Recommended for Docker runtime |
| containerd-runc | Connect: /run/containerd/containerd.sock; Write: /run/containerd/runc/k8s.io; Write: host's cgroup root; Access to the host's PID namespace | Used with containerd container runtime |
| crio-runc | Connect: /run/crio/crio.sock; Write: /run/runc; Write: host's cgroup root; Access to the host's PID namespace | Used with CRI-O container runtime |
| docker-runc | Connect: /var/run/docker.sock; Write: /run/docker/runtime-runc/moby; Write: host's cgroup root; Access to the host's PID namespace; Minimum Docker version: 17.11.0 | |
| docker | Connect: /var/run/docker.sock | No support for systemd cgroup driver |
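
To decide which driver to declare, it can help to check which container runtime each node reports. The following is a minimal sketch, assuming kubectl access to the cluster and a Helm release and namespace both named gremlin; the containerd-runc value is only an example, and other required settings such as authentication are omitted.

```shell
# List each node with the container runtime it reports (for example containerd://... or cri-o://...).
kubectl get nodes \
    -o custom-columns=NAME:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion

# Declare the driver explicitly; replace the example value with the driver that matches your runtime.
driver=containerd-runc
helm install gremlin gremlin/gremlin \
    --namespace gremlin \
    --set gremlin.container.driver="$driver"
```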

Enable Gremlin on the Kubernetes Master

Most Kubernetes deployments configure master nodes with the <span class="code-class-custom">node-role.kubernetes.io/master:NoSchedule</span> taint. You can run the following command to see if any of your nodes have this taint:


kubectl get no -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

kube-01   [map[effect:NoSchedule key:node-role.kubernetes.io/master]]
kube-02   <none>

To install Gremlin on a tainted Kubernetes master node, add the matching tolerations to your Helm arguments:

--set tolerations\[0\].effect=NoSchedule \
--set tolerations\[0\].key=node-role.kubernetes.io/master \
--set tolerations\[0\].operator=Exists
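
As a fuller sketch, the tolerations can be passed in a single install command. This example assumes a release and namespace both named gremlin and omits other required settings such as authentication. The second toleration is an assumption for newer clusters, where control plane nodes are commonly tainted with node-role.kubernetes.io/control-plane instead; drop it if your nodes do not carry that taint.

```shell
# Single quotes keep the shell from interpreting the square brackets,
# so no backslash escaping is needed.
helm install gremlin gremlin/gremlin \
    --namespace gremlin \
    --set 'tolerations[0].effect=NoSchedule' \
    --set 'tolerations[0].key=node-role.kubernetes.io/master' \
    --set 'tolerations[0].operator=Exists' \
    --set 'tolerations[1].effect=NoSchedule' \
    --set 'tolerations[1].key=node-role.kubernetes.io/control-plane' \
    --set 'tolerations[1].operator=Exists'
```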

Add AppArmor support

If your cluster has AppArmor enabled (for example, Azure Kubernetes Service), add the following flag to your Helm command to allow the Gremlin container to run without a security profile:

--set gremlin.apparmor=unconfined
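
If you are unsure whether AppArmor is enabled, one way to check is to read the kernel module parameter on a cluster node. This is a minimal sketch that must be run on the node itself (not on your workstation); the apparmor_enabled helper is a hypothetical name introduced for illustration.

```shell
# Report whether the AppArmor kernel module is enabled on this Linux host.
# The optional argument overrides the parameter file path (useful for testing).
apparmor_enabled() {
    param_file="${1:-/sys/module/apparmor/parameters/enabled}"
    [ -r "$param_file" ] && [ "$(cat "$param_file")" = "Y" ]
}

if apparmor_enabled; then
    echo "AppArmor is enabled; consider --set gremlin.apparmor=unconfined"
else
    echo "AppArmor is not enabled on this host"
fi
```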

Use a PodSecurityPolicy

Gremlin does not support running under the <span class="code-class-custom">restricted</span> PodSecurityPolicy (PSP) that is configured by default on clusters that enable such policies. You can install a <span class="code-class-custom">gremlin</span> PodSecurityPolicy that grants <span class="code-class-custom">chao</span> and <span class="code-class-custom">gremlin</span> exactly the permissions they need and nothing more. When installing Gremlin with Helm, pass <span class="code-class-custom">--set gremlin.podSecurity.podSecurityPolicy.create=true</span> to install Gremlin's custom pod security policies. See Gremlin's Helm Chart Repository for full documentation and usage.

Use a custom seccomp policy

All Gremlin behavior works under Docker's default seccomp policy. However, some environments apply more restrictive seccomp profiles by default, and those profiles can prevent Gremlin from working.

Gremlin provides a custom seccomp profile that is installed automatically when you install with Helm and pass <span class="code-class-custom">--set gremlin.podSecurity.seccomp.enabled=true</span>. See Gremlin's Helm Chart Repository for full documentation and usage.

Configure a proxy

Both Gremlin and Chao can be configured to use a proxy for outgoing HTTP traffic. The conventional <span class="code-class-custom">https_proxy</span> and <span class="code-class-custom">no_proxy</span> variables can be passed as environment variables for this purpose.

When installing Gremlin with Helm, the proxy configuration is automated. See examples.

When experiments run on Kubernetes resources, the proxy settings are carried along. Make sure your configuration allows the Gremlin sidecar to keep communicating with the Attack Console for the duration of the experiment while it is attached to the target resource.

Proxy certificate authorities

When a proxy supports HTTPS communication, or is otherwise configured with a TLS certificate, you may need to configure Gremlin to trust the proxy's certificate authority. To do so, pass the <span class="code-class-custom">SSL_CERT_FILE</span> environment variable, whose value is the file system path to a PEM-encoded file containing the certificate authority's certificate.

Configuring Gremlin


- name: https_proxy
  value: http://proxy.local:3128
# Pass SSL_CERT_FILE when the proxy requires trusting a TLS certificate
- name: SSL_CERT_FILE
  value: /etc/gremlin/ssl/proxy-ca.pem
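
Before pointing SSL_CERT_FILE at a file, it may be worth confirming that the file really is a PEM-encoded certificate. The following is a minimal sketch, assuming the openssl command-line tool is available; the is_pem_certificate helper is a hypothetical name, and the default path is simply the example path used above.

```shell
# Return success if the file parses as a PEM-encoded X.509 certificate.
is_pem_certificate() {
    openssl x509 -in "$1" -noout >/dev/null 2>&1
}

ca_file="${SSL_CERT_FILE:-/etc/gremlin/ssl/proxy-ca.pem}"
if is_pem_certificate "$ca_file"; then
    # Show who the certificate was issued to and when it expires.
    openssl x509 -in "$ca_file" -noout -subject -enddate
else
    echo "$ca_file is missing or is not a PEM-encoded certificate" >&2
fi
```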

Configuring the Gremlin Kubernetes Agent

Because the Gremlin Kubernetes Agent (Chao) communicates with the local Kubernetes API server in addition to the internet, it's important to bypass internet proxies for traffic bound to <span class="code-class-custom">apiserver</span>:


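- name: https_proxy
  value: http://proxy.local:3128
# Replace the placeholder with the address of your cluster's Kubernetes API server
- name: no_proxy
  value: <apiserver address>
# Pass SSL_CERT_FILE when the proxy requires trusting a TLS certificate
- name: SSL_CERT_FILE
  value: /etc/gremlin/ssl/proxy-ca.pem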
# Pass SSL_CERT_DIR when SSL_CERT_FILE contains only the proxy certificate. This will ensure Chao trusts api.gremlin.com
# The value of SSL_CERT_DIR varies depending on the operating system on which the cluster hosts run
# See /docs/infrastructure-layer/kubernetes/#ssl_cert_dir
- name: SSL_CERT_DIR
  value: /etc/ssl/


Supplying <span class="code-class-custom">SSL_CERT_DIR</span> ensures Chao is still configured with the certificate authorities it needs to trust <span class="code-class-custom">api.gremlin.com</span>. However, most Gremlin installations do not need it, because Chao trusts Gremlin servers by default. This variable is only required for Chao deployments when both of the following conditions are true:

  • Chao is configured with <span class="code-class-custom">https_proxy</span> and this proxy is configured to accept TLS connections
  • Chao is also configured with <span class="code-class-custom">SSL_CERT_FILE</span>, and the file it points to contains only the certificate authority for the https proxy

The value of <span class="code-class-custom">SSL_CERT_DIR</span> should point to the root of the certificate authority directory for the operating system on which Chao runs.

| SSL_CERT_DIR value | Operating system |
| --- | --- |
| /etc/pki/tls/ | Fedora / RHEL 6 / OpenELEC |
| /etc/ssl/ | OpenSUSE / Alpine Linux |
| /etc/pki/ca-trust/extracted/pem/ | CentOS / RHEL 7 |
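
If you do not know which of these directories applies, a rough heuristic is to check which one exists on a cluster host. This is a minimal sketch meant to be run on the host itself; first_existing_dir is a hypothetical helper, and the ordering (most specific path first) is an assumption, since more than one of these directories can exist on the same system.

```shell
# Print the first directory from the arguments that exists on this host.
first_existing_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate directories listed above, most specific first.
first_existing_dir /etc/pki/ca-trust/extracted/pem/ /etc/pki/tls/ /etc/ssl/ \
    || echo "no known certificate directory found" >&2
```

Treat the printed path as a starting point and confirm it against your operating system's documentation before setting SSL_CERT_DIR.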

Share namespaces

With the Gremlin Kubernetes Agent installed on your cluster, you can share individual namespaces with other Gremlin teams. Once the agent is installed, go to the Agents list in the Gremlin web app to view all of the clusters installed across your company.

By sharing individual namespaces with teams across your company, you can let users run experiments only on relevant services while also limiting access to the hosts or nodes themselves.

Managing Cluster access

As the <span class="code-class-custom">Team Manager</span> on a team where a Kubernetes cluster is installed or as a <span class="code-class-custom">Company Manager</span>, you can click the gear icon to manage access. On the cluster view, to share a <span class="code-class-custom">namespace</span> with a team use the search box to filter down the list of available <span class="code-class-custom">teams</span>. Then use the search box on the team row and click on the <span class="code-class-custom">namespace</span> you'd like to share. Use the options menu to share all of the namespaces.

To remove a team's access to a namespace, click the x on the blue namespace bubble. You can also use the options menu to remove all namespaces at once.

Requesting Namespace access

As a member of a different team of your company, you can view the list of clusters installed across your company. To request access to a namespace on one of these clusters not installed on your team, click the <span class="code-class-custom">Request Access</span> button. You can then check off the namespaces you'd like access to, or you can use the select all switch.

You can also request access to a namespace within a cluster when creating an experiment. Once you've selected a cluster, the drop down list of namespaces will have an option to request access.

Approving access requests

To approve or deny an access request, you must be a <span class="code-class-custom">Team Manager</span>. Navigate to the Agents list, locate your cluster, and click the gear icon to the right. On the Manage Cluster Access page, you'll see any pending requests from your company.

Adding annotations to service account

As an advanced feature, you can add custom annotations to the gremlin and chao service accounts in Kubernetes. For example, you can tag the gremlin or chao service accounts with an AWS EKS IAM role by running:


helm install gremlin gremlin/gremlin \
     --namespace gremlin \
     --set gremlin.serviceAccount.annotations."eks\.amazonaws\.com\/role-arn"="arn:aws:iam::123412341234:role/K8sServiceAccountRole" \
     --set chao.serviceAccount.annotations."eks\.amazonaws\.com\/role-arn"="arn:aws:iam::123412341234:role/ChaoK8sServiceAccountRole"
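
After installing, you can check that the annotations were applied. This sketch assumes the service accounts are named gremlin and chao and live in the gremlin namespace; adjust the names if your installation differs.

```shell
# Print the annotations on each service account; the role ARN should appear
# under the eks.amazonaws.com/role-arn key.
kubectl get serviceaccount gremlin --namespace gremlin \
    -o jsonpath='{.metadata.annotations}'
kubectl get serviceaccount chao --namespace gremlin \
    -o jsonpath='{.metadata.annotations}'
```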

Adding environment variables

As an advanced feature, you can add custom environment variables to the gremlin and chao services.


helm install gremlin gremlin/gremlin \
     --namespace gremlin \
     --set 'gremlin.extraEnv[0].name=TEST1' \
     --set 'gremlin.extraEnv[0].value=hello'

This sets the environment variable TEST1 to "hello".
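
When you need more than one variable, indexed --set flags get verbose, and a values file may be easier to maintain. The sketch below relies on Helm's standard mapping of indexed --set keys onto YAML lists; the file name gremlin-values.yaml and the second variable (TEST2) are hypothetical and only illustrate how additional entries are added.

```shell
# Write a values file equivalent to gremlin.extraEnv[0] and gremlin.extraEnv[1].
cat > gremlin-values.yaml <<'EOF'
gremlin:
  extraEnv:
    - name: TEST1
      value: hello
    - name: TEST2   # hypothetical second variable
      value: world
EOF

# Pass the file to Helm instead of repeating --set flags.
helm install gremlin gremlin/gremlin \
    --namespace gremlin \
    -f gremlin-values.yaml
```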

Adding labels to Gremlin pods

You can add custom labels to your Gremlin pods and assign values to them by using gremlin.podLabels. To set additional labels, simply add a new line:

--set gremlin.podLabels.label1=value1 \
--set gremlin.podLabels.label2=value2
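
Once the labels are applied, you can use them as selectors. This sketch assumes Gremlin is installed in the gremlin namespace and uses the example label names and values shown above.

```shell
# List only the pods that carry both example labels, and show all of their labels.
kubectl get pods --namespace gremlin \
    -l label1=value1,label2=value2 \
    --show-labels
```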

Fault-level privileges

Access to experiments is controlled in two ways. The EXPERIMENTS_READ, EXPERIMENTS_WRITE, and EXPERIMENTS_RUN privileges can be leveraged to control access to all experiments as a single unit.

For finer-grained access control, the individual FAULT privileges can be leveraged to control access to specific experiment types. See the descriptions for each experiment to determine the correct FAULT privilege.

Privilege assignment is always additive. If a user does not have FAULT_CPU, but has EXPERIMENTS_RUN, the user will be allowed to run CPU experiments.
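
The additive rule can be sketched as a simple "any matching privilege grants access" check. This is an illustrative model only, not Gremlin's implementation; the can_run_cpu_experiment helper is hypothetical, and it assumes that FAULT_CPU on its own is also sufficient to run CPU experiments.

```shell
# Illustrative model: succeed if the space-separated privilege list contains
# any privilege that grants permission to run a CPU experiment.
can_run_cpu_experiment() {
    for privilege in $1; do
        case "$privilege" in
            EXPERIMENTS_RUN|FAULT_CPU) return 0 ;;
        esac
    done
    return 1
}

# EXPERIMENTS_RUN alone is enough, even without FAULT_CPU.
can_run_cpu_experiment "EXPERIMENTS_READ EXPERIMENTS_RUN" && echo "allowed"
# Read-only access grants neither privilege.
can_run_cpu_experiment "EXPERIMENTS_READ" || echo "denied"
```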