Dashboard
Security

Overview

To find an overview of Gremlin’s security practices, check out gremlin.com/security.

Gremlin makes it easy to find weaknesses in your system before they cause problems for your customers. Gremlin is a simple, safe, and secure way to use Chaos Engineering to improve system resilience.

Gremlin experiments are generated on the Control Plane. Gremlin Agents make outbound TLS calls to poll for experiments. Gremlin provides secure command execution, security auditing, multi-factor authentication (MFA), and SAML SSO.

Linux

Gremlin is installed on Linux with a least privilege setup. When installed directly on the host, Gremlin does not require root privileges to any machines in your infrastructure. Gremlin operations are run via a <span class="code-class-custom">gremlin</span> user created with default Linux privileges.

Gremlin needs the following Linux capabilities to perform the corresponding experiments.

capability purpose
cap_sys_boot used by shutdown to shutdown (and optionally reboot) your hosts
cap_sys_time used by time travel to move your hosts forward and backward through time
cap_net_admin used by the network gremlins for all network experiments
cap_kill used by process killer to kill requested process(es)
cap_sys_resource grant access to update resource limits for process exhaustion against containers

When targeting containers, Gremlin spawns its own sidecars to impact those containers so that you don't need to restart the targets. This is necessary so that the attack impacts the container target (eg. its virtual network, resource limits, etc) specifically. In order to do this Gremlin may require additional capabilies when running without elevented/root privileges. These are the additional capabilities:

capabilitypurpose
cap_setfcapthis allows Gremlin to pass the user id into the target container's namespace
cap_sys_chrootthis allows Gremlin to enter the target container's filespace for IO and Disk based experiments
cap_audit_writenecessary for communicating with the Kernel's audit log for containers spawned by containerd and cri-o
cap_mknodused to setup devices attached to a given container being targeted by Gremlin
cap_sys_adminneeded to enter the target container's process namespace (see: setns(2))
cap_dac_read_searchgrants us the ability to execute directories (list contents) without having access granted by the file owner/mode to obtain sockets for certificate expiry experiments
cap_sys_ptraceused by process collection to grant access to absolute path to process binary for hosts and container services, see proc(5) and ptrace(2)

Windows

The Gremlin daemon is installed as a Windows service under the LocalSystem account. Experiments created from the user interface run as a child process of the deamon so they too run under the LocalSystem account.

Gremlin configuration and work files are placed in the <span class="code-class-custom">%ALLUSERSPROFILE%\Gremlin\Agent</span> directory. By default Windows places that location at <span class="code-class-custom">C:\ProgramData\Gremlin\Agent</span> The Gremlin folders and files inherit permissions from the parent <span class="code-class-custom">%ALLUSERSPROFILE%/C:\ProgramData</span> folder. Normally the permissions are read-write for administrators and read-only for all others. Those permissions prevent non-administrators from being able to run experiments from the command line.

Gremlin agent includes a kernel driver. The kernel driver is used for latency experiments. Like the Gremlin daemon, the Gremlin kernel driver loads with the operating system.

Network Access

Gremlin never intercepts the content or payload of any network traffic. Gremlin only looks at routing information in order to apply its impact to the intended network traffic.

No Ingress ports required

The primary communication between Gremlin installations and the Gremlin Control Plane is handled by the Gremlin daemon. However, when targeting a container or Kubernetes pod Gremlin spawns a sidecar that communicates directly with the Gremlin control plane for the duration of the experiment. For this reason, the daemon and experiment targets (including containers and Kubernetes pods) must have an outbound network path to the Gremlin service (<span class="code-class-custom">api.gremlin.com</span>).

Proxy support

The Gremlin Agent supports http/https proxies via the environment variables <span class="code-class-custom">http_proxy</span> and <span class="code-class-custom">https_proxy</span>. These are set to use a proxy server via HTTP and HTTPS traffic, respectively. Values used should be of the form <span class="code-class-custom">http[s]://[username:password@]address:port</span>, such as <span class="code-class-custom">export https_proxy=https://proxy.your_company.com:8080 or export https_proxy=https://your_username:your_password@proxy.your_company.com:8080</span>.

For Linux, the Gremlin daemon, which is typically run as a service, requires these environment variables to be set in <span class="code-class-custom">/etc/default/gremlind</span>:

BASH

echo "https_proxy=https://localhost:8888" | sudo tee -a /etc/default/gremlind
sudo systemctl restart gremlind

For Windows the environment variables can be set through Control Panel or using PowerShell commands.

Note that the Gremlin Service only functions via encrypted communication (HTTPS). Attempts to connect to it via unencrypted protocols (HTTP) are denied.

Secure command execution

The Gremlin Daemon periodically communicates with our service over a TLS-protected channel which is authenticated using your organization's credentials. Once authenticated, the daemon sends heartbeat messages to the service and receives instructions from the service as responses to the heartbeat messages. If an experiment has been scheduled, the daemon receives the instructions for executing that experiment. Each instruction action is pre-defined within the daemon. Arbitrary instructions cannot be executed.

The service API only supports TLSv1.2 connections.

Security auditing

The Gremlin Agent, Daemon, API, and web app undergo regular security auditing, including penetration testing, by the external security auditor Bishop Fox. All identified vulnerabilities are remediated promptly and confirmed via remediation testing by our auditors. We can provide a Letter of Assessment from our auditors outlining our most recent audit findings and remediation results upon request.

Two Factor Authentication (MFA)

Gremlin offers Two Factor Authentication. See MFA under User Authentication.

SAML SSO

Gremlin supports SAML SSO. See SAML under User Authentication.

No items found.
Next
Previous
This is some text inside of a div block.
Compatibility
Installing the Gremlin Agent
Authenticating the Gremlin Agent
Configuring the Gremlin Agent
Managing the Gremlin Agent
User Management
Integrations
Health Checks
Notifications
Command Line Interface
Updating Gremlin
Reliability Management (RM) Quick Start Guide
Services and Dependencies
Detected Risks
Reliability Tests
Reliability Score
Targets
Experiments
Scenarios
GameDays
Overview
Deploying Failure Flags on AWS Lambda
Deploying Failure Flags on AWS ECS
Deploying Failure Flags on Kubernetes
Classes, methods, & attributes
API Keys
Examples
Container security
General
Linux
Windows
Chao
Helm
Glossary
Additional Configuration for Helm
Amazon CloudWatch Health Check
AppDynamics Health Check
Blackhole Experiment
CPU Experiment
Certificate Expiry
Custom Health Check
Custom Load Generator
DNS Experiment
Datadog Health Check
Disk Experiment
Dynatrace Health Check
Grafana Cloud Health Check
Grafana Cloud K6
IO Experiment
Install Gremlin on Kubernetes manually
Install Gremlin on OpenShift 4
Installing Gremlin on AWS - Configuring your VPC
Installing Gremlin on Kubernetes with Helm
Installing Gremlin on Windows
Installing Gremlin on a virtual machine
Installing the Failure Flags SDK
Jira
Latency Experiment
Memory Experiment
Network Tags
New Relic Health Check
Overview
Overview
Overview
Overview
Overview
Packet Loss Attack
PagerDuty Health Check
Preview: Gremlin in Kubernetes Restricted Networks
Private Network Integration Agent
Process Collection
Process Killer Experiment
Prometheus Health Check
Configuring Role Based Access Control (RBAC)
Running Failure Flags experiments
Scheduling Scenarios
Shared Scenarios
Shutdown Experiment
Slack
Managing Teams
Time Travel Experiment
Troubleshooting Gremlin on OpenShift
User Authentication via SAML and Okta
Managing Users
Webhooks
Integration Agent for Linux
Test Suites
Restricting Testing Times
Reports
Process Exhaustion Experiment
Enabling DNS collection
Authenticating Users with Microsoft Entra ID (Azure Active Directory) via SAML
AWS Quick Start Guide