Running Failure Flags experiments

This document will walk you through running your first experiment using Failure Flags.

Example: the HTTPHandler application

Throughout this document, our examples use a simple application called "HTTPHandler". This application takes incoming web requests and returns its own execution time. We'll provide examples for each of the available SDKs.

We've added a Failure Flag named <span class="code-class-custom">http-ingress</span> with two labels: one that tracks the request method, and one that tracks the URL path:

Node.js example

JS

const gremlin = require('@gremlin/failure-flags')

module.exports.handler = async (event) => {
  const start = Date.now()

  // If an experiment is defined for this Failure Flag and it also targets
  // the HTTP method and/or path, this call applies the effects that the
  // experiment describes.
  await gremlin.invokeFailureFlag({
    name: 'http-ingress',
    labels: {
      method: event.requestContext.http.method,
      path: event.requestContext.http.path,
    },
  })

  return {
    statusCode: 200,
    body: JSON.stringify(
      {
        processingTime: Date.now() - start,
        timestamp: event.requestContext.time,
      },
      null,
      2
    ),
  }
}

Go example

GO

package main

import (
    "fmt"
    "time"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"

    gremlin "github.com/gremlin/failure-flags-go"
)

func handler(request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
    start := time.Now()

    // Add a Failure Flag
    gremlin.Invoke(gremlin.FailureFlag{
        Name: `http-ingress`, // The name of the Failure Flag
        Labels: map[string]string{ // Additional metadata we can use for targeting
            `method`: request.HTTPMethod,
            `path`:   request.Path,
        },
    })

    return events.APIGatewayProxyResponse{
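        // Report the elapsed time in milliseconds so that the body is valid JSON
        Body:       fmt.Sprintf(`{"processingTime": %d, "timestamp": "%v"}`, time.Since(start).Milliseconds(), start.Format(time.RFC3339)),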
        StatusCode: 200,
    }, nil
}

func main() {
    lambda.Start(handler)
}

Creating a new Failure Flags experiment

To create a new experiment:

  1. Open the Gremlin web app and select Failure Flags in the left-hand nav menu.
  2. Click Create an Experiment.
  3. Enter an experiment name. This can be anything you wish.
  4. Enter the Failure Flag selector name. This should match the value of the <span class="code-class-custom">name</span> attribute you gave when creating the Failure Flag in your code. For example, the name of the Failure Flag example here is <span class="code-class-custom">http-ingress</span>.
  5. Enter the Failure Flag selector attributes to specify the types of traffic that the experiment will apply to.
  6. Enter the Application selector name. This should match the name of the application that you want to run the experiment on. You can see your list of active applications in the Gremlin web app.
  7. Enter the Application selector attributes to specify the application instances that the experiment will run on. See Selectors for more details.
  8. Enter the experiment Effects in the Effects box. See Effects for more details.
  9. Choose the percentage of applicable failure flags and applications to impact using the Impact Probability boxes. For example, if you choose 1%, then only 1% of the total failure flag instances matched by your selectors will be impacted by the experiment. This does not apply to code executions - the Failure Flag selector name determines that.
  10. Specify how long the experiment will run for using Experiment Duration.
  11. Click Save to save the experiment, or Save & Run to save and immediately execute the experiment.

Selectors

Selectors are JSON objects consisting of key-value pairs, where each value is a list of values to match. These objects tell Gremlin which applications and Failure Flags to target for an experiment.

As an example, our HTTPHandler contains the following Node.js code:

JS

const gremlin = require('@gremlin/failure-flags');

module.exports.handler = async (event) => {
  await gremlin.invokeFailureFlag({
    name: 'http-ingress',
    labels: {
      method: event.requestContext.http.method,
      path: event.requestContext.http.path }});
...
};

This means that the Failure Flag name is <span class="code-class-custom">http-ingress</span>, and the application name is <span class="code-class-custom">HTTPHandler</span>.

Application Attributes

Application attributes let you identify specific instances of an application to run an experiment on. For example, imagine our <span class="code-class-custom">HTTPHandler</span> application runs in AWS Lambda in several different regions. We can use the following application attribute to only impact instances in <span class="code-class-custom">us-west-1</span>:

JSON

{ "region": ["us-west-1"] }

Flag Attributes

Flag attributes are selectors for targeting specific executions of the application's code. Our example <span class="code-class-custom">HTTPHandler</span> application has a <span class="code-class-custom">method</span> label containing the HTTP request method. If we only want to impact <span class="code-class-custom">POST</span> requests, we'd add the following flag attribute:

JSON

{ "method": ["POST"] }
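To make the matching idea concrete, here is a minimal sketch of how a selector such as the one above could be compared against the labels passed to a Failure Flag. The function name matchesSelector is hypothetical, and the matching rule (every key in the selector must be present in the labels, and the label's value must appear in that key's list) is an assumption for illustration, not Gremlin's actual implementation.

```javascript
// Hypothetical helper: not part of the Failure Flags SDK.
// Assumed rule: every selector key must exist in `labels`, and the label's
// value must be one of the values listed for that key.
function matchesSelector(selector, labels) {
  return Object.entries(selector).every(
    ([key, allowed]) => Array.isArray(allowed) && allowed.includes(labels[key])
  );
}

const flagSelector = { method: ['POST'] };

console.log(matchesSelector(flagSelector, { method: 'POST', path: '/orders' })); // true
console.log(matchesSelector(flagSelector, { method: 'GET', path: '/orders' }));  // false
```

Under this assumption, an empty selector matches every invocation, and the same check would work for application attributes such as a region list.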

Experiments and Effects

The Effect parameter is where you define the details of the experiment and the impact it will have on your application. The Effect parameter is a simple JSON map that gets passed to the Failure Flags SDK when an application is targeted by a running experiment.

The SDK currently supports two types of effects: latency and error.

Latency

Latency introduces a constant delay into each invocation of the experiment. Specify <span class="code-class-custom">latency</span> for the key, and the number of milliseconds you want to delay as the value. For example, this effect introduces a 2000 millisecond delay:

JSON

{ "latency": 2000 }

Minimum latency with jitter

Alternatively, you can add a variable amount of latency. For example, this effect introduces between 2000 and 2200 milliseconds of latency: a minimum of 2000 milliseconds, plus a pseudo-random, uniformly distributed amount of jitter of up to 200 milliseconds:

JSON

{
  "latency": {
    "ms": 2000,
    "jitter": 200
  }
}
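As a rough illustration of the two latency forms, the sketch below resolves either a plain number or an object with ms and jitter into a single delay. The function name resolveLatencyMs and the injectable random parameter are hypothetical, and the assumption that jitter is added on top of ms as a uniform value is taken from the description above rather than from the SDK source.

```javascript
// Hypothetical helper: not part of the Failure Flags SDK.
// Accepts either a number ({ "latency": 2000 }) or an object
// ({ "latency": { "ms": 2000, "jitter": 200 } }) and returns a delay in ms.
function resolveLatencyMs(latency, random = Math.random) {
  if (typeof latency === 'number') {
    return latency;
  }
  const { ms = 0, jitter = 0 } = latency;
  // Assumed: jitter adds a uniformly distributed amount in [0, jitter).
  return ms + random() * jitter;
}

console.log(resolveLatencyMs(2000)); // 2000

const delay = resolveLatencyMs({ ms: 2000, jitter: 200 });
console.log(delay >= 2000 && delay < 2200); // true
```

Inside an async handler, a delay like this could then be applied with a promise that wraps setTimeout.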

Errors

The Error effect throws an error with the provided message. This is useful for triggering specific error-handling methods or simulating errors you might find in production. For example, this effect triggers an error with the message "Failure Flag error triggered":

JSON

{ "exception": "Failure Flag error triggered" }

If your application uses custom error types or other error condition metadata, you can add this metadata to the error effect:

JSON

{
  "exception": {
    "message": "Failure Flag error triggered",
    "name": "CustomErrorType",
    "someAdditionalProperty": "add important metadata here"
  }
}
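The sketch below shows one way an exception effect could be turned into a JavaScript error, covering both the string form and the object form. The function name errorFromEffect is hypothetical, and copying the extra properties onto the error object is an assumption made for illustration; the SDK may construct its errors differently.

```javascript
// Hypothetical helper: not part of the Failure Flags SDK.
function errorFromEffect(exception) {
  if (typeof exception === 'string') {
    return new Error(exception);
  }
  const { message, ...extra } = exception;
  // Assumed: remaining properties (name, custom metadata) are copied onto the error.
  return Object.assign(new Error(message), extra);
}

const err = errorFromEffect({
  message: 'Failure Flag error triggered',
  name: 'CustomErrorType',
  someAdditionalProperty: 'add important metadata here',
});

console.log(err instanceof Error); // true
console.log(err.name);             // CustomErrorType
console.log(err.message);          // Failure Flag error triggered
```

An error handler that branches on err.name or on custom metadata can be exercised this way without changing application code.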

Combining Latency and Error effects

You can combine the latency and error effects to cause a delay before throwing an exception. This is useful for recreating conditions like network connection failures, degraded connections, or timeouts.

For example, this effect will cause the Failure Flag to pause for 2 full seconds before throwing an exception with a custom message:

JSON

{
  "latency": 2000,
  "exception": "Failure Flag delayed error triggered"
}
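To show what a caller would observe under the combined effect, here is a small self-contained simulation. Both delayThenThrow and callDependency are hypothetical names, and the ordering (wait first, then throw) is assumed from the description above; this does not call the Failure Flags SDK.

```javascript
// Hypothetical helper: not part of the Failure Flags SDK.
// Assumed order: wait for the latency first, then throw the exception.
async function delayThenThrow(effect) {
  if (typeof effect.latency === 'number') {
    await new Promise((resolve) => setTimeout(resolve, effect.latency));
  }
  if (typeof effect.exception === 'string') {
    throw new Error(effect.exception);
  }
}

// Simulate a caller whose error handling we want to exercise.
async function callDependency(effect) {
  const started = Date.now();
  try {
    await delayThenThrow(effect);
    return 'ok';
  } catch (error) {
    return `failed after ${Date.now() - started} ms: ${error.message}`;
  }
}

callDependency({
  latency: 2000,
  exception: 'Failure Flag delayed error triggered',
}).then(console.log);
```

The caller sees a slow failure rather than an immediate one, which is the pattern a timeout or retry policy needs to handle.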

Changing application data

Note
This feature is currently only available in the Node.js SDK.

Failure Flags are also capable of modifying application data. This is an advanced effect that requires additional setup using the Failure Flags SDK.

In your application's call to <span class="code-class-custom">invokeFailureFlag</span>, add a new <span class="code-class-custom">dataPrototype</span> property and assign it a variable like a network request or response. You could also pass in an object literal.

JS

const failureflags = require('@gremlin/failure-flags');

let myData = {name: 'HTTPResponse'}; // this is just example data, it could be anything

myData = await failureflags.invokeFailureFlag({
  name: 'flagname',       // the name of your Failure Flag
  labels: {},             // additional attributes about this invocation
  dataPrototype: myData,  // "myData" is some variable like a request or response. You could also pass in an object literal.
});

Once the <span class="code-class-custom">dataPrototype</span> property is set, you can add a <span class="code-class-custom">data</span> object to the effect statement. Any properties in the <span class="code-class-custom">data</span> object will be copied into a new object created from the prototype you provided.

JSON

{
  "data": {
    "statusCode": 404,
    "statusMessage": "Not Found"
  }
}

While this experiment is active, <span class="code-class-custom">myData</span> will be changed to the following:

JSON

{
  "name": "HTTPResponse",
  "statusCode": 404,
  "statusMessage": "Not Found"
}

Note
If the experiment is not running, then myData will remain unaltered.
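The following sketch illustrates the "new object created from the prototype" idea using plain JavaScript. The function name applyDataEffect is hypothetical, and the use of Object.create is an assumption about what "created from the prototype" means here; it is not taken from the SDK source.

```javascript
// Hypothetical helper: not part of the Failure Flags SDK.
// Assumed: the new object is created with Object.create(prototype), so the
// prototype's properties are inherited rather than copied.
function applyDataEffect(dataPrototype, data) {
  const result = Object.create(dataPrototype);
  return Object.assign(result, data);
}

const original = { name: 'HTTPResponse' };
const changed = applyDataEffect(original, {
  statusCode: 404,
  statusMessage: 'Not Found',
});

console.log(changed.name);          // HTTPResponse (inherited from the prototype)
console.log(changed.statusCode);    // 404
console.log(Object.keys(original)); // [ 'name' ] (the original object is untouched)
```

One consequence of this assumption is that inherited properties such as name are readable on the result but are not its own properties, so JSON.stringify(changed) would include only statusCode and statusMessage.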

Customizing an experiment's impact

You can customize the impact of the experiment by adding a <span class="code-class-custom">behavior</span> function. For example, the following snippet writes data about the experiment to the console instead of applying the experiment to your code:

Node.js example

JAVASCRIPT

await gremlin.invokeFailureFlag({
  name: 'http-ingress',
  labels: {
    method: event.requestContext.http.method,
    path: event.requestContext.http.path,
  },

  // Log the experiment instead of applying its effects
  behavior: async (experiment) => {
    console.log('handling the experiment', experiment)
  },
})

Go example

GO

gremlin.Invoke(gremlin.FailureFlag{
  Name: `http-ingress`,
  Labels: map[string]string{
    `method`: request.HTTPMethod,
    `path`: request.Path,
  },

  // the following line provides an implementation of the gremlin.Behavior type
  Behavior: func(ff gremlin.FailureFlag, exps []gremlin.Experiment) (impacted bool, err error) {
    // write the experiments to standard out
    fmt.Fprintf(os.Stdout, `processing experiments: %v`, exps)
    // continue processing using the default behavior chain
    return gremlin.DelayedPanicOrError(ff, exps)
  },
})

If you want even more manual control, the SDK can detect whether an experiment is currently active. For example, during an experiment, you might want to avoid making certain API calls, or roll back a transaction. In most cases the Exception effect can help, but you can also create branches in your code. For example:

Node.js example

JAVASCRIPT

if (await failureflags.invokeFailureFlag({ name: 'myFailureFlag' })) {
  // If there is a running experiment then run this branch
} else {
  // If there is no experiment, or it had no impact, then run this branch
}

Go example

GO

// The error is discarded here because Go rejects variables that are declared but never used
if active, impacted, _ := FailureFlag{Name: `myFailureFlag`}.Invoke(); active && impacted {
  // If there is a running experiment then run this branch
} else {
  // If there is no experiment, or it had no impact, then run this branch
}

Language-specific features

This section is for features unique to specific SDKs.

Go

Panic

The Go SDK offers a unique fault called <span class="code-class-custom">panic</span>. This causes Failure Flags to panic with the provided message. This is useful for validating that your application handles Go panics correctly, or for assessing the impact on other parts of the system when your code panics:

JSON

{ "panic": "this message will be used in an error provided to panic" }

More information and examples are available on the project's GitHub repo.
