Managing the Gremlin Agent

The Gremlin Agent is an executable binary installed on a host operating system, container runtime, or Kubernetes cluster. It maintains a heartbeat connection to the Gremlin Control Plane so that Gremlin knows the host is active and able to receive instructions, such as starting a reliability test or injecting a fault. The Agent requires only an outbound network connection to the Gremlin Control Plane, so you can run it behind a firewall without opening inbound ports. All traffic is encrypted.

Agent lifecycle

When an agent is installed and authenticated, it appears as "Active" in the Agents list. It also identifies any targets for fault injection, such as hosts or containers.

You can only run experiments on "Active" Gremlin Agents. An Agent goes into an "Idle" state if the Gremlin Control Plane detects no activity from it for at least 5 minutes. You cannot run or schedule experiments on Idle Agents. If Gremlin does not hear from an Idle Agent for 24 hours, the Agent is removed from the list. However, if the Agent starts communicating with Gremlin again within that 24-hour idle window, it is reactivated and returned to the "Active" state.
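The state rules above can be sketched as a small shell function. This is only an illustration of the thresholds, not how the Control Plane is implemented: `agent_state` is a hypothetical helper, and it assumes the 24-hour removal window starts at the moment the Agent becomes idle.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Gremlin CLI.
# Thresholds come from the text: idle after 5 minutes of silence,
# removed after a further 24 hours (assumed to start when idle begins).
IDLE_AFTER=$((5 * 60))
REMOVE_AFTER=$((IDLE_AFTER + 24 * 60 * 60))

# $1 = seconds since the Control Plane last heard from the Agent
agent_state() {
  if [ "$1" -lt "$IDLE_AFTER" ]; then
    echo active
  elif [ "$1" -lt "$REMOVE_AFTER" ]; then
    echo idle
  else
    echo removed
  fi
}

agent_state 120     # prints "active"
agent_state 3600    # prints "idle"
agent_state 90000   # prints "removed"
```

An Agent in the `idle` range returns to `active` as soon as it reports in again, which in this sketch corresponds to the silence counter dropping back below 300 seconds.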


Gremlin writes its logs to the /var/log/gremlin directory. Agent logs are in the daemon.log file. Entries in this file can show when the Gremlin Agent was unable to communicate with the Control Plane.
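One way to scan daemon.log for likely communication problems is a keyword search. The sketch below is a starting point only: the keyword list is a guess and is not taken from Gremlin's actual log format, and `recent_problems` is a hypothetical helper name.

```shell
#!/bin/sh
# Path from the text above; set LOG_FILE to look at a different file.
LOG_FILE="${LOG_FILE:-/var/log/gremlin/daemon.log}"

# Assumed keywords. Adjust them to match what your Agent actually logs.
PATTERN='error|fail|timeout|refused'

# Print up to the last 20 lines of file $1 that match PATTERN (case-insensitive).
recent_problems() {
  grep -i -E "$PATTERN" "$1" | tail -n 20
}

# Only run against the real log if it exists and is readable.
if [ -r "$LOG_FILE" ]; then
  recent_problems "$LOG_FILE"
fi
```

Reading the log may require elevated privileges, depending on how the directory permissions are set on your host.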

Each fault injection performed by the Agent is logged under /var/log/gremlin/executions using its unique experiment execution ID. This is useful for troubleshooting experiments that do not complete.
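To locate the log for a particular execution, you can search the executions directory by ID. This is a minimal sketch that assumes the execution ID appears somewhere in the file or directory name. The text above does not describe the exact naming scheme, and `find_execution_logs` is a hypothetical helper.

```shell
#!/bin/sh
# Directory from the text above; set EXEC_DIR to search somewhere else.
EXEC_DIR="${EXEC_DIR:-/var/log/gremlin/executions}"

# List entries under directory $1 whose name contains execution ID $2.
# Assumption: the ID is part of the entry name.
find_execution_logs() {
  find "$1" -name "*$2*" 2>/dev/null
}

# Usage (replace the placeholder with a real execution ID):
#   find_execution_logs "$EXEC_DIR" "<execution-id>"
```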

Log size

To see how much disk space is being used by logs, run the du utility on the /var/log/gremlin directory:


du -sh /var/log/gremlin
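If you want to check log usage on a schedule, the same du call can be wrapped in a threshold check. The sketch below is illustrative: the 500 MB limit is an arbitrary example value, and `log_size_kb` and `check_log_size` are hypothetical helper names.

```shell
#!/bin/sh
LOG_DIR="${LOG_DIR:-/var/log/gremlin}"
LIMIT_KB="${LIMIT_KB:-512000}"   # example budget: 500 MB expressed in KB

# Print the total size of directory $1 in kilobytes.
log_size_kb() {
  du -sk "$1" | awk '{ print $1 }'
}

# Compare the size of directory $1 against limit $2 (in KB) and report.
check_log_size() {
  size=$(log_size_kb "$1")
  if [ "$size" -gt "$2" ]; then
    echo "WARN: $1 uses ${size} KB (limit $2 KB)"
  else
    echo "OK: $1 uses ${size} KB"
  fi
}

# Only check the real directory if it exists.
if [ -d "$LOG_DIR" ]; then
  check_log_size "$LOG_DIR" "$LIMIT_KB"
fi
```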

Bandwidth usage

Idle state

The Gremlin Agent uses very little bandwidth in its idle state. In testing over a 5-minute period, the Agent sent a total of 11.3 KB and received 24.8 KB, an average combined bandwidth of 0.12 KB/s.

Attack state

Bandwidth consumption increases slightly during experiments. While an experiment is running, the Agent stays in constant communication with the Control Plane as it checks whether the experiment should be aborted. The bandwidth used is not affected by the type of experiment being run. In testing over a 5-minute period, the Agent sent a total of 112.3 KB and received 114.0 KB, an average combined bandwidth of 0.75 KB/s.
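The averages quoted for both states are the combined sent and received totals divided by the 300-second test window. The short sketch below reproduces that arithmetic so you can apply it to your own measurements. `avg_kb_per_s` is a hypothetical helper name.

```shell
#!/bin/sh
# $1 = KB sent, $2 = KB received, $3 = measurement window in seconds
avg_kb_per_s() {
  awk -v s="$1" -v r="$2" -v t="$3" 'BEGIN { printf "%.2f\n", (s + r) / t }'
}

avg_kb_per_s 11.3 24.8 300     # idle state: prints 0.12
avg_kb_per_s 112.3 114.0 300   # during experiments: prints 0.75
```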

Process Collection

When Process Collection is enabled, the Gremlin Agent sends additional data, and the bandwidth consumed depends on how many processes are discovered. The data is gzip-compressed to minimize network usage. To measure the actual bandwidth consumed by Gremlin for your particular installation, we recommend using a tool such as iptraf or nethogs.
