Release Notes


July 18, 2024

Note: this version contains important fixes for container-targeting bugs introduced in the 2.50.0 and 2.51.0 releases.

  • Fix Fixed a bug released in 2.50.0 which resulted in the Gremlin agent reporting false rollback failures against container targets. The Gremlin agent could sometimes get stuck trying to roll back experiments that had already been cleaned up, leading to excessive log messages and a failure to receive new experiments.
  • Fix Fixed a bug released in 2.51.0 where network experiments against containers may attempt to target devices present on the host. Such experiments fail before they fully initialize and no impact to the host devices is made.
  • Fix Fixed a bug released in 2.51.0 where network experiments against containers fail to detect if conflicting network traffic shaping is present. Experiments fail to initialize as a result. No impact is made to conflicting traffic shaping rules.
  • Fix Fixed a bug released in 2.51.0 where memory experiments against containers use the host's total memory when calculating desired consumption, leading to experiments that may consume more than the desired amount when the --percent argument is passed.
  • Info Improved logging around actions the Gremlin agent takes to roll back active experiments.
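
The memory fix above comes down to which total the --percent value is applied to. The following is a minimal sketch of that arithmetic only, not Gremlin's implementation: the function name and the memory sizes are hypothetical, and it assumes the intended basis is the container's own memory limit.

```python
GIB = 1024 ** 3


def desired_bytes(percent: int, total_bytes: int) -> int:
    """Return the number of bytes that `percent` of `total_bytes` represents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_bytes * percent // 100


host_total = 64 * GIB       # illustrative host memory
container_limit = 2 * GIB   # illustrative container memory limit (assumption)

# Basis described in the bug: a percentage of the host's total memory.
from_host = desired_bytes(50, host_total)
# Assumed intended basis: a percentage of the container's memory limit.
from_container = desired_bytes(50, container_limit)
```

With these illustrative numbers, the host-based figure is far larger than the container could hold, which is why the choice of basis matters.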
2.51.0 (Removed)
July 11, 2024

Note: this version was removed from Docker Hub after identifying bugs that impacted network and memory container attacks. A patch version will be released as a replacement.

  • New Blackhole experiments now automatically add the route to the Gremlin service to the exclusion rules so that connection is maintained during the attack. This can be disabled if needed via --no-derived-exclusion-rules.
  • Info Updated dependencies
June 27, 2024
  • Fix Fixed bugs with the Gremlin Agent's session renewal process, cleaning up spurious Unauthorized errors and preventing rare instances where local session storage can get corrupted.
  • Fix The *-linux container Drivers now validate whether the Gremlin Agent was installed in the host's PID namespace (e.g. using the gremlin.hostPID=true helm chart argument).
  • Info Updated dependencies
June 18, 2024
  • New The Gremlin Agent no longer has the collect_processes option. Setting this value to true is now ignored. Dependency discovery features are now controlled only by collect_dns, which is true by default.
  • Fix Upon receiving a shutdown signal (e.g. SIGTERM), the Gremlin Agent will wait for any running attacks to finish halting before shutting down. This fixes an issue where attacks would end up Failed or LostCommunication instead of ClientAborted when the Gremlin Agent was terminated during such attacks.
  • Info Updated dependencies
June 6, 2024
  • Fix Enhanced logging
  • Info Updated dependencies
May 23, 2024
  • New The Gremlin Agent now reports the DNS servers that are used by the host on which it is installed. This data is used to select a random DNS server to impact for our new Redundancy: DNS reliability test, available from the Well-Architected Cloud Test Suite.
  • Fix Enabled targeting of container paths for the IO attack
  • Fix Improved the read I/O attack to bypass the page cache and read directly from disk
  • Fix Enhanced logging
May 1, 2024
  • Fix Added missing capabilities on host installations for running container attacks: SYS_ADMIN, SYS_RESOURCE, CAP_SYS_CHROOT
  • Fix Suppressed dependency discovery log events that were too noisy. Events would occur when processing DNS traffic from containers that have since exited.
  • Fix Improved logging in situations where gremlind cannot open or parse configuration and certificate files.
  • Info Updated dependencies
April 3, 2024
  • Fix Critical: Fixed a bug introduced in 2.44.0 where Blackhole attacks failed to clean up impact on ingress traffic during a Halt or ClientAborted event. All users on affected versions are advised to upgrade as soon as possible to avoid any impact left behind from Gremlin attacks.
  • Fix Print details about eligible container drivers that failed to load due to missing requirements.
April 1, 2024
  • Fix Resolve to IPv4 addresses over IPv6 addresses during cert expiry experiments.
  • Fix Print full error when unable to inspect a network device.
March 20, 2024
  • New Add support for targeting by zone and region based on the Kubernetes labels topology.kubernetes.io/zone and topology.kubernetes.io/region
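
As an illustration of what targeting by these labels means, here is a minimal sketch that filters a set of nodes by zone and region. The label keys come from the entry above; the node names, label values, and the select_by_topology helper are hypothetical and do not reflect how Gremlin performs the selection.

```python
ZONE_LABEL = "topology.kubernetes.io/zone"
REGION_LABEL = "topology.kubernetes.io/region"


def select_by_topology(nodes, zone=None, region=None):
    """Return the sorted names of nodes whose labels match the given zone and region."""
    selected = []
    for name, labels in nodes.items():
        if zone is not None and labels.get(ZONE_LABEL) != zone:
            continue
        if region is not None and labels.get(REGION_LABEL) != region:
            continue
        selected.append(name)
    return sorted(selected)


# Hypothetical node inventory, keyed by node name.
nodes = {
    "node-a": {ZONE_LABEL: "us-east-1a", REGION_LABEL: "us-east-1"},
    "node-b": {ZONE_LABEL: "us-east-1b", REGION_LABEL: "us-east-1"},
    "node-c": {ZONE_LABEL: "us-west-2a", REGION_LABEL: "us-west-2"},
}

same_region = select_by_topology(nodes, region="us-east-1")
single_zone = select_by_topology(nodes, zone="us-east-1b")
```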
March 8, 2024
  • New Gremlin Container Drivers now support the CRI API version v1
March 5, 2024
  • New Introducing experiment Process Exhaustion, a way to consume processes to identify limits within the target system.
March 1, 2024
  • Info Updated dependencies.
March 4, 2024
  • New New container drivers are available: docker-linux, containerd-linux, crio-linux, which spawn attacks with significantly reduced CPU and IO system usage. Attacks against container processes no longer require direct integration with runc. These drivers can be enabled by removing volumeMounts from the Gremlin daemonset for /run/docker/runtime-runc/moby, /run/containerd/runc/k8s.io, and /run/runc respectively.
  • Fix Rolling back network attacks no longer considers missing network devices as a critical error. This accounts for failure modes where the network device is torn down externally.
  • Fix Better detection around pre-existing ingress rules which conflict with Gremlin blackhole attacks. This can happen with network integrations such as Cilium and Kata, or any networking integration which applies some level of traffic shaping on ingress network traffic. Gremlin now skips impact when conflicts are detected and prints a warning to the attack log.
February 28, 2024
  • New During a rollback, the gremlind process sends a SIGTERM to the associated attack process before proceeding to clean up any remaining impact.
  • Info Removed --attacker and --target arguments from gremlin rollback-container. The target container can still be supplied as the first argument (e.g. gremlin rollback-container $TARGET_ID).
  • Info Improved logging in daemon.log when attacks are rolled back.
  • Info Raised the TCP connect timeout for API requests that transition attacks between stages from 1 second to 5 seconds.
February 27, 2024
  • Info Enabled DNS collection by default, disabled process collection by default.
February 21, 2024
  • Info Updated dependencies.
February 15, 2024
  • Fix Addressed an issue where rollback would fail when no teardown was required.
February 14, 2024
  • Fix Fixed a regression introduced in 2.38.0 that prevented automatic rollback of attacks when the Gremlin agent loses connection with its control plane.
  • Info Removed dependency on the system pgrep utility during Process Killer attacks. Gremlin now identifies processes directly.
  • Info A warning is now emitted when /proc/sysrq-trigger is mounted in installations of the gremlin/gremlin container image and a shutdown attack is run. Install the Gremlin agent container into the host's PID namespace instead to initiate a host-level shutdown.
  • Info Updated dependencies.
February 7, 2024
  • New Added a new DNS-based dependency collection feature. Learn more about this feature here.
  • New Added CAP_NET_RAW capability for systemd installs
  • Fix Print full error on rollback failures.
  • Info Updated dependencies.
January 23, 2024
  • New Better error messages for the no container driver errors that can occur during container attacks if the underlying container runtime becomes unreachable. Error messages now include the failures received from each container runtime for which a connection was attempted.
  • Fix Fixed a bug where Gremlin would sometimes choose the wrong container driver when multiple container runtimes are present, resulting in failed attacks that indicate the targeted container no longer exists.
  • Fix Removed the file decompression steps that were introduced in 2.39.0 due to the memory overhead this optimization introduced. A future release will optimize container attack provisioning to a more significant degree.
  • Fix Fixed an incomplete error message when the gremlind process receives API errors from AWS IMDSv2 endpoints.
  • Info Updated dependencies.
December 8, 2023
  • New File system resources for Gremlin container attacks are decompressed on startup of the gremlind agent, which reduces gremlind's CPU usage at attack time.
December 7, 2023
  • New Provided Gremlin has access to a valid AWS credentials chain, it now interprets AWS ARN values in GREMLIN_TEAM_ID, GREMLIN_TEAM_SECRET, GREMLIN_TEAM_CERTIFICATE_OR_FILE, GREMLIN_TEAM_PRIVATE_KEY_OR_FILE. Gremlin supports ARN values from AWS Secrets Manager or AWS Systems Manager Parameter Store. Gremlin can optionally be supplied with GREMLIN_IAM_ROLE to specify a role to assume for the strict purpose of fetching secret values.
  • Fix More context is added to various error messages
  • Fix Fixed a regression introduced in 2.37.0 where attacks with invalid arguments would end up LostCommunication instead of Failed
  • Info Updated dependencies
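
To make the ARN support above concrete, the sketch below shows one way a configuration value could be classified as a plain literal, an AWS Secrets Manager ARN, or an AWS Systems Manager Parameter Store ARN, based on the standard ARN layout (arn:partition:service:region:account-id:resource). How Gremlin actually detects and resolves ARNs is not stated in these notes; the function name, return labels, account ID, and secret names are all hypothetical.

```python
def classify_secret_reference(value: str) -> str:
    """Classify a configuration value by the kind of AWS ARN it looks like, if any."""
    # An ARN has six colon-separated fields; the resource field may itself contain colons.
    parts = value.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        return "literal"
    service, resource = parts[2], parts[5]
    if service == "secretsmanager" and resource.startswith("secret:"):
        return "secrets-manager"
    if service == "ssm" and resource.startswith("parameter/"):
        return "parameter-store"
    return "unsupported-arn"


# Illustrative values only; the account ID and names are made up.
examples = {
    "plain-team-id": classify_secret_reference("11111111-2222-3333-4444-555555555555"),
    "secret-arn": classify_secret_reference(
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:example-team-secret-AbCdEf"
    ),
    "parameter-arn": classify_secret_reference(
        "arn:aws:ssm:us-east-1:123456789012:parameter/example/team-id"
    ),
}
```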
November 28, 2023
  • Fix Fixed a bug where Gremlin would prevent sending arbitrary signals to PID 1. Now, only SIGKILL is prevented, which is unsupported against PID 1 on Linux.
November 27, 2023
November 14, 2023
  • New Attacks can sometimes fail to notify the Gremlin Control Plane when their connection is impacted by the attack itself. The Gremlin agent now tolerates these failures more often and attempts to resend failed notifications. This fixes attacks that end up in the HaltFailed stage that would otherwise finish in the Successful stage.
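
The resend behavior described above follows the general retry pattern. A minimal sketch of that pattern is shown here, assuming exponential backoff and a fixed attempt count; neither detail is stated in these notes, and send_with_retries and its parameters are hypothetical names rather than part of the Gremlin agent.

```python
import time


def send_with_retries(send, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `send` until it succeeds, retrying on ConnectionError with backoff.

    `sleep` is injectable so the delay can be replaced in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionError as err:
            last_error = err
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Only connection-level errors are retried in this sketch; any other exception propagates immediately, which keeps genuine failures visible.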
November 9, 2023
  • Fix Fixed an issue where Certificate Expiry attacks against containers would fail when Gremlin was configured with SSL_CERT_FILE
November 7, 2023
  • Fix Fixed an issue where important errors from container attacks were not properly forwarded to the Gremlin control plane, leaving execution outputs from failed attacks without helpful troubleshooting information.
  • New Improved the output of the Gremlin Agent validation routine that happens on startup. When validation fails, details about the failure are written to daemon.log
  • Info Updated dependencies
October 20, 2023
  • Fix Fixed an issue where attacks were incorrectly labeled HaltFailed when Gremlin fails to notify api.gremlin.com during teardown of the network impact.
  • Fix Fixed a class of issues where Gremlin would not retry requests that failed with transient network errors. This sometimes led to failing container attacks that should otherwise succeed.
  • New For users running Gremlin in the Docker container runtime, rollbacks against container targets no longer require provisioning a second container instance, which results in faster rollbacks.
  • New Gremlin provides more context to errors stemming from failed http requests to api.gremlin.com.
  • New For users running Gremlin on AWS, more error information is printed to the log file when AWS metadata cannot be retrieved.
  • Info Updated dependencies
October 16, 2023
  • Fix Fixed an issue where the Gremlin agent would ignore changes to the identifier field in config.yaml if a valid session has already been generated and is not yet expired. On startup, the Gremlin agent will now correctly regenerate a session using the intended identifier value if it detects that its existing session belongs to a different value for identifier.
October 12, 2023
  • Fix Fixed an issue with Certificate Expiry experiments against container targets, where the attack process would not have sufficient Linux capabilities (missing DAC_READ_SEARCH). This fix requires helm chart release 0.11.0 (see #86); however, all other attacks will continue to work correctly without this chart update.
  • Fix Updated Certificate Expiry experiments to discover IPv4-mapped IPv6 addresses (e.g., ::FFFF: when a CIDR is specified.
  • Fix Fixed a regression introduced in 2.22.1 where the Process Killer experiment would incorrectly interpret the interval argument as milliseconds, instead of seconds as intended.
  • Info Updated dependencies
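
For readers unfamiliar with IPv4-mapped IPv6 addresses, the sketch below shows how such an address can be matched against an IPv4 CIDR using Python's standard ipaddress module. It only illustrates the address format mentioned above and is not Gremlin's implementation; the in_cidr helper is hypothetical and the addresses are documentation-range values chosen for illustration.

```python
import ipaddress


def in_cidr(address: str, cidr: str) -> bool:
    """Return True if `address` falls inside `cidr`, unwrapping IPv4-mapped IPv6 first."""
    network = ipaddress.ip_network(cidr)
    ip = ipaddress.ip_address(address)
    # IPv6Address.ipv4_mapped is the embedded IPv4 address for ::ffff:a.b.c.d, else None.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return ip.version == network.version and ip in network


mapped_match = in_cidr("::ffff:192.0.2.10", "192.0.2.0/24")
plain_match = in_cidr("192.0.2.10", "192.0.2.0/24")
ipv6_miss = in_cidr("2001:db8::1", "192.0.2.0/24")
```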
September 18, 2023
  • New Running Certificate Expiry experiments against CIDR values (e.g., will make several attempts to find an active IP address in use by the target system for evaluating certificate expiration characteristics within the duration specified by the argument --length.
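
The search described above can be pictured as walking the host addresses of a CIDR block until one responds or the time budget runs out. The sketch below illustrates that idea under stated assumptions: the probe callback, the find_active_ip name, and the sequential search order are hypothetical, and the notes do not say how Gremlin orders or performs its attempts.

```python
import ipaddress
import time


def find_active_ip(cidr, probe, length_seconds, clock=time.monotonic):
    """Return the first host in `cidr` for which `probe` is true, or None on timeout."""
    deadline = clock() + length_seconds
    for host in ipaddress.ip_network(cidr).hosts():
        # Stop searching once the allotted duration has elapsed.
        if clock() >= deadline:
            break
        if probe(str(host)):
            return str(host)
    return None


# Stand-in probe for illustration: pretend only one address is in use.
found = find_active_ip("192.0.2.0/29", lambda ip: ip == "192.0.2.3", length_seconds=10)
missing = find_active_ip("192.0.2.0/29", lambda ip: False, length_seconds=10)
```

In a real check the probe would attempt a connection to the address; it is passed in as a callback here so the search logic stays independent of networking.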
September 8, 2023
  • New When installed directly on the host and launched with systemd, the Gremlin agent now runs with ambient capabilities (capabilities(7)). File capabilities are no longer set on /usr/bin/gremlin or /usr/sbin/gremlind.
  • New When installed directly on the host, the suid bit is no longer set for installed binaries /usr/bin/gremlin and /usr/sbin/gremlind. Additionally, these binaries are no longer owned by the gremlin linux user, but owned by root instead.
  • Info To install Gremlin with file capabilities and gremlin Linux user ownership in accordance with previous Gremlin versions, set the appropriate GREMLIN_INSTALL_ configuration variables at install time: GREMLIN_INSTALL_USER=gremlin GREMLIN_INSTALL_GROUP=gremlin GREMLIN_INSTALL_BIN_MODE=6111 GREMLIN_INSTALL_BIN_CAPABILITIES=1 sudo -E yum install gremlin gremlind. See Customize Gremlin's Linux User and Group
August 18, 2023
  • New Previously, gremlind would emit snapshots of process and socket data to Gremlin's control plane over 2 minute intervals. This release significantly reduces network overhead for this data as gremlind now batches up process data over 15 minute intervals, deduplicating all network and process data detected over this interval.
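
The saving described above comes from sending each distinct observation once per interval instead of repeating it in every snapshot. A minimal sketch of interval batching with deduplication follows; the DedupBatcher class, the record shape, and the timestamps are hypothetical and are not taken from gremlind.

```python
class DedupBatcher:
    """Collects records and releases each distinct one once per interval."""

    def __init__(self, interval_seconds=15 * 60):
        self.interval_seconds = interval_seconds
        self._seen = set()
        self._window_start = None

    def observe(self, record, now):
        """Record an observation; repeats within the same window are collapsed."""
        if self._window_start is None:
            self._window_start = now
        self._seen.add(record)

    def flush_if_due(self, now):
        """Return the deduplicated batch once the interval has elapsed, else None."""
        if self._window_start is None or now - self._window_start < self.interval_seconds:
            return None
        batch = sorted(self._seen)
        self._seen = set()
        self._window_start = None
        return batch


batcher = DedupBatcher()
batcher.observe(("nginx", "10.0.0.5:443"), now=0)
batcher.observe(("nginx", "10.0.0.5:443"), now=120)   # duplicate, collapsed
batcher.observe(("redis", "10.0.0.9:6379"), now=240)
too_early = batcher.flush_if_due(now=600)
batch = batcher.flush_if_due(now=900)
```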
August 3, 2023
  • New Gremlin now uploads discovered process data at a slower rate, reducing network overhead.
July 12, 2023
  • Fix Fixed a regression released in 2.31.0 where the gremlin agent would set the Host header to an incorrect value for outgoing requests to the Gremlin control plane. This can lead to authentication failures for some intermediate web proxies that use this host header for authorizing requests.
July 7, 2023
  • Fix Errors related to spawning subprocesses now have more detailed information useful for troubleshooting.
  • Fix IO Errors related to Gremlin container attacks now have more detailed information useful for troubleshooting.
  • Fix Gremlin provisions fewer file resources for its attack sidecar processes, reducing the time it takes to launch container attacks.
June 29, 2023
  • Fix For hostnames supplied to network attacks, Gremlin delegates DNS queries to the operating system. When this query fails, Gremlin now attempts to resolve the name completely within the running process in an attempt to overcome operating system failures. This allows Gremlin network attacks to continue in the face of failed DNS processing.
  • Fix Fixed a comment in Gremlin's config.yaml which incorrectly stated that collect_processes was disabled by default.
June 27, 2023
  • Fix Fixed an out-of-memory error caused by a 3rd party library during process collection.
June 9, 2023
  • Fix Fixed a regression introduced in 2.29.0 where containers for each attack execution incorrectly bind-mounted the file system of every other attack container running on the host. Given enough attack executions running at the same time, a new attack execution container receives a no space left on device error when attempting such mounts, despite space available. Gremlin no longer makes such mounts.
  • Fix When running the gremlin/gremlin container image, attack containers no longer run in the hostPath mount /var/lib/gremlin. This would produce permission denied errors on systems where this file system is mounted with the noexec flag, such as GKE COS.
June 6, 2023
  • New The Certificate Expiry attack's ipaddress argument now correctly processes CIDR values (e.g. When passed, Gremlin will attempt to find an active IP Address in use by the target system and use it for evaluating certificate expiration characteristics.
  • New The gremlin/gremlin Dockerhub image now contains the strace utility as a convenience for operators that cannot install this utility from the internet.
May 23, 2023
  • New The Blackhole attack skips impact on ingress traffic when it detects third-party ingress traffic manipulation rules, such as those installed by a CNI like cilium. This allows egress impact to be applied without failing the attack with errors like Exclusivity flag on, cannot modify.
  • Info Updated dependencies.
May 18, 2023
  • Fix Gremlin's calls to getaddrinfo now fall back to TCP when a nameserver replies with a truncated answer. For more info, see musl libc 1.2.4.
  • Info Updated dependencies.
May 10, 2023
  • Fix Gremlin now tears down the TCP connection pool with api.gremlin.com after successive timeout failures.
  • Fix Gremlin includes the name of the targeted network interface in execution log events related to applying network impact.
  • Info Updated dependencies.
May 1, 2023
  • Fix Fixed an issue where Gremlin would not report back to the control plane the detailed error that occurred during a failed attack. Users encountering this bug may see http: 415: 415 in their execution log.
  • Fix Fixed an issue where gremlin check api would incorrectly report connection failures, including an error message of http 403.
  • Fix Fixed several instances where errors were suppressed from http interactions made by the Gremlin agent. All failed http interactions now show the method and path of the attempted call, along with descriptive error messages.
April 27, 2023
  • Fix Fixed an issue where Gremlin would run requested attack executions in a way that was detached from the original attack request. This leads to the original attack request ending in a LostCommunication stage, while the detached attacks continue to run.
April 25, 2023
  • Fix Corrected the ExecStartPre option in the gremlind.service file which resulted in nuisance errors.
  • Info Updated dependencies.
March 29, 2023
  • Fix Fixed a bug introduced in 2.31.0 where gremlin init would fail unless the environment variable GREMLIN_TRANSPORT=direct was set.
  • Info Added support for tag values to be any simple YAML datatype (boolean, integer, float, string). Previously only strings were supported.
  • Info Updated dependencies.
March 24, 2023
  • New Gremlin can now target container and Kubernetes targets, even when those targets lack network access to api.gremlin.com.
  • New All network traffic from Gremlin attack processes is now routed through /var/lib/gremlin/gremlin.sock. To disable this behavior, provide the following environment variable to the Gremlin agent: GREMLIN_TRANSPORT=direct
March 23, 2023
  • Fix Fixed an issue that prevented Gremlin from ingesting Azure Tags.
  • Fix Fixed an issue that made Gremlin validation unreliable.
  • Info Updated dependencies.
March 15, 2023
  • Fix Addressed an issue where Gremlin agents enabled with GREMLIN_TEAM_SECRET would fail to start when also configured with GREMLIN_TRANSPORT=domain-socket
  • New Gremlin's version command now prints more build information.
March 8, 2023
March 6, 2023
  • Fix Addressed performance issues that were seen with gremlind when collect_processes=true which would lead to high CPU usage and agents becoming IDLE. Symptoms occurred on systems running many processes and active network connections (over 1K of each).
  • New Various metrics around data collection have been added to the output of gremlin check daemon for benchmarking purposes.
  • New A warning is now supplied in execution logs when the device argument specifies a device that does not exist.
February 16, 2023
  • Fix Fixed a regression in 2.30.0 in which network attacks running in a container without targeting a specific network interface failed to have any impact.
  • New Improved the strategy for selecting the target network interfaces.
February 10, 2023
  • New Multiple network interface attacks are now supported. Details are available in Network device selection.
  • New IP address and network interface data is collected to improve distributed network attacks.
  • Info Updated dependencies.
February 8, 2023
  • New Gremlin Container attacks no longer create a new Linux mount namespace for the attack. Instead, gremlin attack processes now run in the namespace of the gremlind agent. For Kubernetes environments running AppArmor, this release requires a helm chart update.
  • Info Updated dependencies.
January 5, 2023
  • Fix Fix a bug in collect_certs when the target dropped the network connection before completing the TLS setup.
  • Info Updated help URLs.
  • Info Updated dependencies.
December 8, 2022
  • New Add support for containerd builds that do not provide versioning metadata.
  • Info Updated dependencies.
November 22, 2022
  • Fix Fix a bug that prevented collect_certs from working when run against a container.
  • Info Updated dependencies.
November 21, 2022
  • New Add a short argument (-n) for the not_less_than option.
  • Info Updated dependencies.
November 18, 2022
  • Fix Fixed an issue affecting Docker CRI on cgroupv2; Gremlin previously failed to roll back network attacks if the target container was killed during the attack.
November 17, 2022
  • New Gremlin now supports OpenShift 4.9+ and CRI-O 1.22+
  • Fix Fixed an issue affecting containerd and CRI-O on cgroupv2; Gremlin previously failed to roll back network attacks if the target container was killed during the attack.
  • Fix Fixed an issue where Gremlin was not resolving internal hostnames in some instances.
November 16, 2022
  • New Introduce Certificate Expiry test for Reliability Management.
  • Info Updated dependencies.
October 28, 2022
  • New Agent interactions with AWS APIs now use IMDSv2.
  • Fix Fixed a bug where Gremlin would not properly launch attacks that resolve to a large number of IP addresses / blocks.
October 27, 2022
  • Info All Gremlin container drivers now work with cgroup2-enabled kernels.
  • Info Updated dependencies.
October 6, 2022
  • Info Updated dependencies.
September 16, 2022
  • Info Process Collection is now automatically enabled. Process Collection gathers information about the processes running on Linux machines where the Gremlin Agent is installed to detect system dependencies. To disable Process Collection, see Disable Process Collection.
September 13, 2022
  • Info Updated dependencies.
August 31, 2022
  • Fix Fixed a bug where Gremlin's dependency discovery features would not work when IPv6 was disabled.
  • Fix Fixed a bug where Gremlin would not properly include swap in free memory calculations, leading to incorrect attack results.
August 26, 2022
  • Info Updated dependencies.
August 16, 2022
  • Fix Fixed a bug where Gremlin hides informative warnings about its failure to capture dependency discovery data. Now, gremlind logs WARN messages when it fails to find socket data for any given process. Logs are written only once upon first occurrence.