
Viewing experiment execution details

Supported platforms:

N/A

It is sometimes useful to see the overall timeline of a Gremlin experiment as it applied to each of its targets. A timeline helps answer questions like:

Did any targets exit before the expected length of the experiment? If so, how many?
How soon into an experiment were impacts halted?
Did any targets not pick up the experiment as expected?

This document shows how to view this timeline programmatically using the Executions API.

What is an execution?

An Execution is an individual instance of an experiment associated with exactly one target that was included in the experiment. For example, if you select a single host as the target of an experiment, then the experiment will have one execution. However, if you select a Kubernetes Deployment with four containers running in it, then that experiment will have four executions. Every experiment has as many executions as there are unique targets of the experiment.

These Executions are visible in Gremlin's UI when you look at an Experiment's details. Each row of an experiment's details page corresponds to an execution.

Two executions shown for a single experiment. Each execution targets a unique host.
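
As a rough illustration of the one-execution-per-target rule, the following sketch compares the number of executions in a response with the number of distinct host and container pairs. It assumes a response from the Executions API has already been saved into a shell variable. The sample data is trimmed to two fields taken from the example output later in this document, and treating client_id plus guest_id as the identity of a target is an assumption, not documented behavior.

```shell
# Sample stand-in for a saved Executions API response (only the fields used here).
executions='[
  {"client_id": "ip-10-0-111-96.us-west-2.compute.internal",
   "guest_id": "063b0b53cbc079617e86339ec108cca8784e6c965525948b603719974b4a71f3"},
  {"client_id": "ip-10-0-87-118.us-west-2.compute.internal",
   "guest_id": "230f658bb539b50c19ac58a8935065f7143749f69ecd3ab6e5c3a4f5b6853685"}
]'

# Count executions and distinct (host, container) pairs.
# Assumption: client_id plus guest_id identifies a target.
execution_summary=$(printf '%s' "$executions" \
  | jq -c '{executions: length, unique_targets: (map([.client_id, .guest_id]) | unique | length)}')

echo "$execution_summary"
```

If the two numbers differ, either the identity assumption does not hold for that target type or the response contains something unexpected that is worth a closer look.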


Viewing executions via the Gremlin API

You can fetch the executions for a given experiment with Gremlin's Executions API. For example:

SHELL


EXPERIMENT_ID="9760bd93-b066-4386-bb08-e7544e32a024"
TEAM_ID="c46c15ac-2277-4c29-9c32-73010eb72177"
AUTHORIZATION="Bearer (token)"
# Double quotes are required here so the shell expands the variables in the URL
curl "https://api.gremlin.com/v1/executions?taskId=${EXPERIMENT_ID}&teamId=${TEAM_ID}" -H "Authorization: $AUTHORIZATION"

# Note: the API response is truncated for brevity
[ {
  "org_id" : "11111111-1111-1111-1111-111111111111",
  "guid" : "11111111-1111-1111-1111-111111111111",
  "target_type" : "Container",
  "service_states" : [ {
    "stage" : "Pending",
    "timestamp" : "2025-09-04T18:13:49.593Z"
  }, {
    "stage" : "Distributed",
    "timestamp" : "2025-09-04T18:13:58.040Z"
  }, {
    "stage" : "Initializing",
    "timestamp" : "2025-09-04T18:13:58.171Z"
  }, {
    "stage" : "Running",
    "timestamp" : "2025-09-04T18:13:58.213Z"
  }, {
    "stage" : "TearingDown",
    "timestamp" : "2025-09-04T18:14:58.276Z"
  }, {
    "stage" : "Successful",
    "timestamp" : "2025-09-04T18:14:58.334Z"
  } ],
  "client_id" : "ip-10-0-111-96.us-west-2.compute.internal",
  "client_status" : "HEALTHY",
  "client_version" : "2.60.2",
  "output" : "2025-09-04 18:13:58 UTC - Setting up cpu gremlin with guid '11111111-1111-1111-1111-111111111111' for 60 seconds on 1% of 1 core\n2025-09-04 18:13:58 UTC - Setup successfully completed\n2025-09-04 18:13:58 UTC - Running cpu gremlin with guid '11111111-1111-1111-1111-111111111111' for 60 seconds on 1% of 1 core\n2025-09-04 18:14:58 UTC - Attack on cpu_1 completed successfully\n2025-09-04 18:14:58 UTC - Begin to revert impact\n2025-09-04 18:14:58 UTC - Impact reverted\n",
  "guest_id" : "063b0b53cbc079617e86339ec108cca8784e6c965525948b603719974b4a71f3",
  "create_source" : "WebApp",
  "attack_container_id" : "gremlin-063b0b53cbc079617e86339ec108cca8784e6c965525948b603719974b4a71f3-72829663",
  "stage" : "Successful",
  "stage_lifecycle" : "Complete",
  "owning_team_id" : "11111111-1111-1111-1111-111111111111",
  "runas_user" : "root",
  "task_id" : "11111111-1111-1111-1111-111111111111",
  "kind" : "WebApp",
  "args" : [ "cpu", "-c", "1", "-p", "1", "--length", "60" ],
  "start_time" : "2025-09-04T18:13:58.171Z",
  "end_time" : "2025-09-04T18:14:58.334Z",
  "created_at" : "2025-09-04T18:13:49.593Z",
  "updated_at" : "2025-09-04T18:14:58.351Z"
}, ]


This execution object contains a lot of useful information. Two fields are particularly important for assessing how an experiment applied to a target: output (the live log of a running Gremlin experiment) and error (any critical error that occurred during the experiment). For a fine-grained timeline of a target's experience during an experiment, the service_states field is also valuable.
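
One way to approach the question "how soon into an experiment were impacts halted?" is to measure the time between the Running and TearingDown entries in service_states. The sketch below does this with jq against a sample response trimmed from the example above. The helper names (running_seconds, to_epoch, stage_time) are made up for illustration. Fractional seconds are dropped before parsing because fromdateiso8601 rejects them in many jq versions, so the result is only accurate to about a second.

```shell
# Sample stand-in for a saved Executions API response, trimmed from the example above.
executions='[
  {
    "client_id": "ip-10-0-111-96.us-west-2.compute.internal",
    "service_states": [
      {"stage": "Pending",     "timestamp": "2025-09-04T18:13:49.593Z"},
      {"stage": "Running",     "timestamp": "2025-09-04T18:13:58.213Z"},
      {"stage": "TearingDown", "timestamp": "2025-09-04T18:14:58.276Z"},
      {"stage": "Successful",  "timestamp": "2025-09-04T18:14:58.334Z"}
    ]
  }
]'

# Hypothetical helper: reads an executions array on stdin and prints
# "<host> TAB <whole seconds between Running and TearingDown>" per execution.
running_seconds() {
  jq -r '
    # Strip fractional seconds, then convert to seconds since the epoch.
    def to_epoch: sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601;
    # Timestamp of the first service_states entry with the given stage, or null.
    def stage_time($name): [.service_states[] | select(.stage == $name) | .timestamp][0];
    .[]
    | {host: .client_id, started: (stage_time("Running")), stopped: (stage_time("TearingDown"))}
    # Skip executions that never recorded both stages.
    | select(.started != null and .stopped != null)
    | [.host, ((.stopped | to_epoch) - (.started | to_epoch))]
    | @tsv'
}

running_report=$(printf '%s' "$executions" | running_seconds)
printf '%s\n' "$running_report"
```

For the sample execution this reports 60 seconds, which matches the --length 60 argument in args. A noticeably shorter value for a target would suggest its impact was halted early.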

The following example uses fields from each of an experiment's executions to build a timeline of the events that occurred across the experiment's targets. These events are defined on the Experiments docs page.

SHELL


# Print the various states of an experiment's (${EXPERIMENT_ID}) executions, sorted reverse-chronologically
# - client_id: represents the host on which an execution ran
# - guest_id: represents the container on which an execution ran (if applicable)
# - service_states: represents a timeline of states an execution encountered
#   - See https://www.gremlin.com/docs/fault-injection-experiments#experiment-stages
# -s hides curl's progress meter; double quotes let the shell expand the variables
curl -s "https://api.gremlin.com/v1/executions?taskId=${EXPERIMENT_ID}&teamId=${TEAM_ID}" \
  -H "Authorization: $AUTHORIZATION" \
  | jq -r \
    '[["TIMESTAMP", "HOST", "CONTAINER", "STAGE"], 
     (.[] | { host: .client_id, container: .guest_id } as $t | .service_states[] | [ .timestamp, $t.host, $t.container, .stage ])][] | @tsv' \
  | column -t \
  | sort -r

TIMESTAMP                 HOST                                       CONTAINER                                                         STAGE
2025-09-04T18:14:58.334Z  ip-10-0-111-96.us-west-2.compute.internal  063b0b53cbc079617e86339ec108cca8784e6c965525948b603719974b4a71f3  Successful
2025-09-04T18:14:58.276Z  ip-10-0-111-96.us-west-2.compute.internal  063b0b53cbc079617e86339ec108cca8784e6c965525948b603719974b4a71f3  TearingDown
2025-09-04T18:14:51.550Z  ip-10-0-87-118.us-west-2.compute.internal  230f658bb539b50c19ac58a8935065f7143749f69ecd3ab6e5c3a4f5b6853685  Successful
2025-09-04T18:14:51.508Z  ip-10-0-87-118.us-west-2.compute.internal  230f658bb539b50c19ac58a8935065f7143749f69ecd3ab6e5c3a4f5b6853685  TearingDown
2025-09-04T18:13:58.213Z  ip-10-0-111-96.us-west-2.compute.internal  063b0b53cbc079617e86339ec108cca8784e6c965525948b603719974b4a71f3  Running
2025-09-04T18:13:58.171Z  ip-10-0-111-96.us-west-2.compute.internal  063b0b53cbc079617e86339ec108cca8784e6c965525948b603719974b4a71f3  Initializing
2025-09-04T18:13:58.040Z  ip-10-0-111-96.us-west-2.compute.internal  063b0b53cbc079617e86339ec108cca8784e6c965525948b603719974b4a71f3  Distributed
2025-09-04T18:13:51.434Z  ip-10-0-87-118.us-west-2.compute.internal  230f658bb539b50c19ac58a8935065f7143749f69ecd3ab6e5c3a4f5b6853685  Running
2025-09-04T18:13:51.387Z  ip-10-0-87-118.us-west-2.compute.internal  230f658bb539b50c19ac58a8935065f7143749f69ecd3ab6e5c3a4f5b6853685  Initializing
2025-09-04T18:13:51.249Z  ip-10-0-87-118.us-west-2.compute.internal  230f658bb539b50c19ac58a8935065f7143749f69ecd3ab6e5c3a4f5b6853685  Distributed
2025-09-04T18:13:49.594Z  ip-10-0-87-118.us-west-2.compute.internal  230f658bb539b50c19ac58a8935065f7143749f69ecd3ab6e5c3a4f5b6853685  Pending
2025-09-04T18:13:49.593Z  ip-10-0-111-96.us-west-2.compute.internal  063b0b53cbc079617e86339ec108cca8784e6c965525948b603719974b4a71f3  Pending
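
The same response can also be summarized instead of listed. The sketch below counts executions by their final stage and prints the hosts whose executions never recorded a Running state, which is one possible way to spot targets that did not pick up the experiment. The sample data is reduced to the fields the queries need, and the third entry (host-c.example.internal, stuck in Pending) is hypothetical, added only so that both queries have something to report.

```shell
# Sample stand-in for a saved Executions API response.
# The third execution is hypothetical and is not taken from real output.
executions='[
  {"client_id": "ip-10-0-111-96.us-west-2.compute.internal", "stage": "Successful",
   "service_states": [{"stage": "Pending"}, {"stage": "Running"}, {"stage": "Successful"}]},
  {"client_id": "ip-10-0-87-118.us-west-2.compute.internal", "stage": "Successful",
   "service_states": [{"stage": "Pending"}, {"stage": "Running"}, {"stage": "Successful"}]},
  {"client_id": "host-c.example.internal", "stage": "Pending",
   "service_states": [{"stage": "Pending"}]}
]'

# Number of executions per final stage, one "<stage> TAB <count>" line each.
stage_counts=$(printf '%s' "$executions" \
  | jq -r 'group_by(.stage)[] | [.[0].stage, length] | @tsv')

# Hosts whose execution has no "Running" entry in service_states.
never_ran=$(printf '%s' "$executions" \
  | jq -r '.[] | select(any(.service_states[]; .stage == "Running") | not) | .client_id')

printf 'Final stages:\n%s\n' "$stage_counts"
printf 'Never reached Running:\n%s\n' "$never_ran"
```

Against a real experiment, replace the sample variable with the output of the curl command shown earlier.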
