
Application Level Fault Injection (ALFI)

The beta version of Gremlin’s application-level fault injection solution (ALFI) has been closed. ALFI is now deprecated and will be replaced with a better alternative once one is available.

Overview

Why application-level fault injection is useful

Operators think in requests

Most of the metrics, dashboards, and alerts we consume are expressed in terms of requests: RPS, error rate, and latency all implicitly use a request as the unit of work. Requests are not a concept available at the infrastructure level, where all we see are streams of packets with IP addresses and ports. By moving up to the application level, we can use all of the request-level metadata when constructing an attack.

Because requests can carry identifiers such as customer ID, device ID, and country, you can use those facets to construct an attack. With that ability, it is much easier to give your attack a small, well-defined blast radius. That, in turn, allows much faster feedback loops and lets you discover latent problems more quickly.
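A minimal sketch of request-level targeting follows, in plain Java. It illustrates the idea only and is not ALFI's API: AttackScope and the attribute names customerId and country are hypothetical. A scope is a set of required attribute values, and a request falls inside the blast radius only when it carries every one of them.

```java
import java.util.Map;

/**
 * Hypothetical illustration of request-level targeting (not the ALFI API).
 * An attack scope lists required attribute values; a request matches only
 * if it carries every one of them with an equal value.
 */
final class AttackScope {
    private final Map<String, String> required;

    AttackScope(Map<String, String> required) {
        this.required = Map.copyOf(required);
    }

    /** Returns true when every required attribute is present with the same value. */
    boolean matches(Map<String, String> requestAttributes) {
        for (Map.Entry<String, String> constraint : required.entrySet()) {
            String actual = requestAttributes.get(constraint.getKey());
            if (!constraint.getValue().equals(actual)) {
                return false; // missing or different value: outside the blast radius
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Target a single customer: only your own requests are affected.
        AttackScope scope = new AttackScope(Map.of("customerId", "cust-42"));

        Map<String, String> mine = Map.of("customerId", "cust-42", "country", "US");
        Map<String, String> other = Map.of("customerId", "cust-7", "country", "US");

        System.out.println(scope.matches(mine));   // true
        System.out.println(scope.matches(other));  // false
    }
}
```

Each added constraint can only shrink the set of matching requests, which is what keeps the blast radius small and well defined.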

Fault injection without system access

Injecting infrastructure failures requires running a process and accessing other system-level resources. In serverless environments such as AWS Lambda, Google Cloud Functions, and Azure Functions, that access is not available, so the fault-injection mechanism must be included in the application itself. ALFI runs in the JVM as a library, so once you have integrated it into your application, you can use it in any environment.

Examples
  • Simulate an outage in production by creating an attack on your customer ID only. Then you can look for signs of problems when logged in as yourself, while no other users are even aware an attack is occurring.
  • Simulate a problem with a specific endpoint. Partial failure is common in distributed systems: some endpoints may be unavailable while others work perfectly. To simulate such a scenario, create an attack that targets only those endpoints and observe how your system reacts.
  • Always-on failure testing. If you limit an attack to a set of devices you control, you can run tests against those devices regularly and evaluate how well the user experience holds up when the system is degraded.

Installation

Artifact repository

Gradle
GROOVY

repositories {
    maven {
        url 'https://maven.gremlin.com/'
    }
}

Maven
XML
<repositories>
    <repository>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
        <id>gremlin</id>
        <name>The Gremlin Repository</name>
        <url>https://maven.gremlin.com/</url>
    </repository>
</repositories>

Note
You must add the repository above to your Maven or Gradle build file. Otherwise, you will see an error similar to: Could not find artifact com.gremlin:[client]:pom:[version] in central (https://repo.maven.apache.org/maven2)

alfi-core

Gradle
GROOVY

implementation group: 'com.gremlin', name: 'alfi-core', version: '0.5+'

Maven
XML
<dependency>
    <groupId>com.gremlin</groupId>
    <artifactId>alfi-core</artifactId>
    <version>LATEST</version>
</dependency>

alfi-aws

Gradle
GROOVY
// If your application is hosted on AWS EC2 or Lambda, use this to integrate with AWS
// (like Parameter Store Configuration support)
implementation group: 'com.gremlin', name: 'alfi-aws', version: '0.5+'

Maven
XML
<!-- If your application is hosted on AWS EC2 or Lambda, use this to integrate with AWS
    (like Parameter Store Configuration support) -->
<dependency>
    <groupId>com.gremlin</groupId>
    <artifactId>alfi-aws</artifactId>
    <version>LATEST</version>
</dependency>

alfi-apache-http-client

Gradle
GROOVY

// Apache HTTP Client Injection Points
implementation group: 'com.gremlin', name: 'alfi-apache-http-client', version: '0.5+'

Maven
XML
<dependency>
    <groupId>com.gremlin</groupId>
    <artifactId>alfi-apache-http-client</artifactId>
    <version>LATEST</version>
</dependency>

alfi-http-servlet-filter

Gradle
GROOVY
implementation group: 'com.gremlin', name: 'alfi-http-servlet-filter', version: '0.5+'

Maven
XML
<dependency>
    <groupId>com.gremlin</groupId>
    <artifactId>alfi-http-servlet-filter</artifactId>
    <version>LATEST</version>
</dependency>

alfi-aws-dynamodb-client

Gradle
GROOVY

// DynamoDB Injection Points
implementation group: 'com.gremlin', name: 'alfi-aws-dynamodb-client', version: '0.5+'

Maven
XML
<dependency>
    <groupId>com.gremlin</groupId>
    <artifactId>alfi-aws-dynamodb-client</artifactId>
    <version>LATEST</version>
</dependency>

Authentication & configuration

Authenticate your application with Gremlin

To authenticate with Gremlin, provide the following configuration values to your application.

  • <span class="code-class-custom">GREMLIN_ALFI_IDENTIFIER</span>: A unique identifier for the application instance, used to distinguish each running instance of the application from the others.
  • <span class="code-class-custom">GREMLIN_TEAM_ID</span>: The ID of the team this application belongs to. Only users on that team can attack it.
  • <span class="code-class-custom">GREMLIN_TEAM_CERTIFICATE_OR_FILE</span>: The certificate used to authenticate with Gremlin, given either as a raw value or as a file URI. See the examples below for the permitted formats.
  • <span class="code-class-custom">GREMLIN_TEAM_PRIVATE_KEY_OR_FILE</span>: The private key used to authenticate with Gremlin, in the same formats as the certificate.

You can set these as environment variables or in a <span class="code-class-custom">gremlin.properties</span> file on the classpath. You can download each team's certificates from the Settings page.

Examples

As a raw value

BASH

GREMLIN_TEAM_CERTIFICATE_OR_FILE=-----BEGIN CERTIFICATE-----...

Or pointing to a file

BASH

GREMLIN_TEAM_CERTIFICATE_OR_FILE=file:///usr/gremlin/certificate.pem
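The _OR_FILE suffix indicates that one key accepts either the PEM text itself or a file:// URI pointing at it. The sketch below shows one plausible way to resolve such a value. It is an assumption for illustration, not ALFI's actual code, and the OrFileValue class is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical resolver for *_OR_FILE settings (illustrative, not ALFI's code). */
final class OrFileValue {
    private static final String FILE_PREFIX = "file://";

    /**
     * Returns the file contents when the value is a file:// URI,
     * otherwise returns the value unchanged (treated as raw PEM text).
     */
    static String resolve(String value) throws IOException {
        if (value.startsWith(FILE_PREFIX)) {
            // e.g. file:///usr/gremlin/certificate.pem -> /usr/gremlin/certificate.pem
            Path path = Path.of(URI.create(value));
            return Files.readString(path, StandardCharsets.UTF_8);
        }
        return value;
    }
}
```

Keeping both forms under one key lets you inline the certificate where files are awkward (for example, a Lambda environment variable) and point at a mounted file elsewhere.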

Optional configuration

The following keys may be set to tune how ALFI operates.

  • <span class="code-class-custom">GREMLIN_ALFI_ENABLED</span>: If set to anything other than <span class="code-class-custom">true</span>, all ALFI functionality is turned off. This gives you a simple off switch, so you can deploy ALFI safely. While ALFI is off, it injects no failures, makes no calls to the API, and logs nothing after configuration time.
  • <span class="code-class-custom">GREMLIN_REFRESH_INTERVAL_MS</span>: How often, in milliseconds, the library contacts the Gremlin API. Minimum 1000 (1 second), maximum 300000 (5 minutes), default 10000 (10 seconds). This interval determines how quickly your application reacts to attacks being created or halted, and how much network traffic the library generates.
  • <span class="code-class-custom">http_proxy</span>: A proxy for traffic from the ALFI library to the Gremlin control plane. The proxy URL may include basic-auth credentials.

Examples
  • <span class="code-class-custom">GREMLIN_ALFI_ENABLED=true</span>
  • <span class="code-class-custom">GREMLIN_ALFI_IDENTIFIER=recommendation-service-i-0ab123456</span>
  • <span class="code-class-custom">GREMLIN_REFRESH_INTERVAL_MS=20000</span>
  • <span class="code-class-custom">http_proxy=http://proxy.server:3128</span>
  • <span class="code-class-custom">http_proxy=http://username:password@proxy.server:3128</span>
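The optional keys follow simple rules, sketched below against a plain Map that stands in for environment variables or gremlin.properties. The class and method names are hypothetical. Two behaviors are assumptions: the enabled check is case-sensitive, and an out-of-range refresh interval is rejected (the documentation gives the bounds but not what happens outside them).

```java
import java.net.URI;
import java.util.Map;

/** Hypothetical reader for ALFI's optional settings (illustrative only). */
final class OptionalAlfiSettings {
    static final long MIN_REFRESH_MS = 1_000;      // 1 second
    static final long MAX_REFRESH_MS = 300_000;    // 5 minutes
    static final long DEFAULT_REFRESH_MS = 10_000; // 10 seconds

    /** Enabled only for the exact value "true" (case handling is an assumption). */
    static boolean isEnabled(Map<String, String> config) {
        return "true".equals(config.get("GREMLIN_ALFI_ENABLED"));
    }

    /** Uses the default when unset; rejects values outside [1000, 300000]. */
    static long refreshIntervalMs(Map<String, String> config) {
        String raw = config.get("GREMLIN_REFRESH_INTERVAL_MS");
        if (raw == null) {
            return DEFAULT_REFRESH_MS;
        }
        long value = Long.parseLong(raw.trim()); // throws NumberFormatException if not a number
        if (value < MIN_REFRESH_MS || value > MAX_REFRESH_MS) {
            throw new IllegalArgumentException(
                    "GREMLIN_REFRESH_INTERVAL_MS must be between 1000 and 300000, got " + value);
        }
        return value;
    }

    /** Describes an http_proxy URL such as http://username:password@proxy.server:3128. */
    static String describeProxy(Map<String, String> config) {
        String raw = config.get("http_proxy");
        if (raw == null) {
            return "no proxy";
        }
        URI uri = URI.create(raw);
        // Report only the user name; never log the password part of the user info.
        String user = uri.getUserInfo() == null ? "none" : uri.getUserInfo().split(":", 2)[0];
        return uri.getHost() + ":" + uri.getPort() + " (user: " + user + ")";
    }
}
```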

Alternate configuration mechanism

As described above, ALFI by default resolves configuration from properties defined in <span class="code-class-custom">gremlin.properties</span> or from environment variables where your application runs. If neither fits your needs, you can provide an alternate mechanism by subclassing GremlinConfigurationResolver and passing your subclass to GremlinServiceFactory at construction time.
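One way to picture an alternate mechanism is as an ordered chain of configuration sources, where the first source that defines a key wins. The sketch below models that idea in plain Java. ConfigSource and ChainedConfig are hypothetical types, not the GremlinConfigurationResolver API (consult its javadocs for the actual methods to override), and the lookup order is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical source of configuration values (not an ALFI type). */
interface ConfigSource {
    Optional<String> get(String key);
}

/** Returns the first value found, trying each source in order. */
final class ChainedConfig implements ConfigSource {
    private final List<ConfigSource> sources;

    ChainedConfig(List<ConfigSource> sources) {
        this.sources = List.copyOf(sources);
    }

    @Override
    public Optional<String> get(String key) {
        for (ConfigSource source : sources) {
            Optional<String> value = source.get(key);
            if (value.isPresent()) {
                return value; // earlier sources take precedence
            }
        }
        return Optional.empty();
    }

    /** Source backed by a Properties object, e.g. one loaded from gremlin.properties. */
    static ConfigSource fromProperties(Properties props) {
        return key -> Optional.ofNullable(props.getProperty(key));
    }

    /** Source backed by a map, e.g. System.getenv() or values from a secrets store. */
    static ConfigSource fromMap(Map<String, String> map) {
        return key -> Optional.ofNullable(map.get(key));
    }
}
```

For example, new ChainedConfig(List.of(ChainedConfig.fromMap(secrets), ChainedConfig.fromMap(System.getenv()))) would let a hypothetical secrets map, such as values fetched from a secrets manager, take precedence over environment variables.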

Setup

Step by step

In a hurry? Skip to Complete examples.

  1. Construct an ApplicationCoordinates instance.
  2. Construct a TrafficCoordinates instance.
  3. Construct a GremlinService singleton.
  4. If you are using a custom TrafficCoordinates instance, inject the fault with <span class="code-class-custom">com.gremlin.GremlinService#applyImpact(trafficCoordinates)</span>. Add this call wherever in your application you want the fault to be injected.
  5. Create a new attack in the Gremlin web app.
  6. Select an <span class="code-class-custom">Application Query</span> and set its fields. These are the fields you defined when setting up the <span class="code-class-custom">ApplicationCoordinates</span>.
  7. Select a Traffic Query and set its fields. These are the fields you defined when setting up the <span class="code-class-custom">TrafficCoordinates</span>.
  8. Choose the impact: set the amount of latency, in milliseconds, to inject, and optionally throw a <span class="code-class-custom">RuntimeException</span> within your application.
  9. Run the attack: set how long the attack lasts, in seconds.
  10. Test your application and observe the impact of the attack.

Complete examples

ALFI AWS
JAVA

package com.alfilambda;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.LambdaLogger;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.gremlin.*;
import com.gremlin.aws.AwsApplicationCoordinatesResolver;

import java.time.Duration;
import java.time.Instant;
import java.util.Map;

public class AlfiDemoHandler implements RequestHandler<Map<String, Object>, String> {

    // Created once per Lambda execution environment and reused across warm invocations.
    private final GremlinService gremlinService;

    public AlfiDemoHandler() {
        final GremlinServiceFactory factory = new GremlinServiceFactory(new GremlinCoordinatesProvider() {
            @Override
            public ApplicationCoordinates initializeApplicationCoordinates() {
                // Infer the function's region and name from the Lambda environment (requires alfi-aws).
                return AwsApplicationCoordinatesResolver.inferFromEnvironment()
                        .orElseThrow(IllegalStateException::new);
            }
        });
        gremlinService = factory.getGremlinService();
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context context) {
        Instant start = Instant.now();
        // Describe this unit of work so an attack's Traffic Query can target it.
        TrafficCoordinates trafficCoordinates = new TrafficCoordinates.Builder()
                .withType(this.getClass().getSimpleName())
                .withField("method", "handleRequest")
                .build();
        // Injection point: applies latency and/or an exception if a matching attack is active.
        gremlinService.applyImpact(trafficCoordinates);
        LambdaLogger logger = context.getLogger();
        long timeElapsed = Duration.between(start, Instant.now()).toMillis();
        logger.log(String.format("Lambda took %d millis", timeElapsed));
        return "200 OK";
    }
}

  1. Create a new attack in the Gremlin web app.
  2. For the attack and Lambda setup, see https://github.com/gremlin/alfi-lambda/blob/master/README.md

ALFI DynamoDB
JAVA

package com.example.alfidynamodb.config;

import com.amazonaws.ClientConfiguration;
import com.amazonaws.handlers.RequestHandler2;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.gremlin.*;
import com.gremlin.aws.GremlinDynamoRequestInterceptor;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
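
@Configuration
public class AlfiConfig {

    // Application Type to select in the attack's Application Query.
    private static final String APPLICATION_QUERY_NAME = "ALFIDemoApplication";
    // Timeouts in milliseconds, passed to both the AWS client and the Gremlin interceptor below.
    private static final int CLIENT_EXECUTION_TIMEOUT = 1500;
    private static final int CLIENT_REQUEST_TIMEOUT = 500;

    @Value("${aws.region}")
    private String region;
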
    public GremlinCoordinatesProvider gremlinCoordinatesProvider() {
        return new GremlinCoordinatesProvider() {
            @Override
            public ApplicationCoordinates initializeApplicationCoordinates() {
                return new ApplicationCoordinates.Builder()
                        .withType(APPLICATION_QUERY_NAME)
                        .build();
            }
        };
    }

    public GremlinServiceFactory gremlinServiceFactory() {
        return new GremlinServiceFactory(gremlinCoordinatesProvider());
    }

    public GremlinService gremlinService() {
        return gremlinServiceFactory().getGremlinService();
    }

    @Bean
    public AmazonDynamoDB amazonDynamoDB() {
        final RequestHandler2 gremlinDynamoInterceptor = new GremlinDynamoRequestInterceptor(gremlinService(), CLIENT_EXECUTION_TIMEOUT, CLIENT_REQUEST_TIMEOUT);
        return AmazonDynamoDBClientBuilder.standard()
                .withRegion(region)
                .withClientConfiguration(new ClientConfiguration()
                        .withClientExecutionTimeout(CLIENT_EXECUTION_TIMEOUT)
                        .withConnectionTimeout(CLIENT_REQUEST_TIMEOUT)
                        .withMaxErrorRetry(2)
                )
                .withRequestHandlers(gremlinDynamoInterceptor).build();
    }

}

JAVA

package com.example.alfidynamodb.persistence;

import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
import com.amazonaws.services.kms.model.NotFoundException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import java.util.HashMap;
import java.util.Map;

@Component
public class GetItemRequester {

    private static final Logger LOG = LoggerFactory.getLogger(GetItemRequester.class);

    @Value("${dynamo.db.table}")
    private String table;

    // The client built in AlfiConfig, with the Gremlin interceptor attached.
    private final AmazonDynamoDB amazonDynamoDB;

    @Autowired
    public GetItemRequester(AmazonDynamoDB amazonDynamoDB) {
        this.amazonDynamoDB = amazonDynamoDB;
    }

    public Map<String, AttributeValue> getItem(String id) {
        long startTime = System.currentTimeMillis();
        try {
            LOG.info(String.format("Querying DynamoDB for item with ID %s...", id));
            Map<String, AttributeValue> returnedItem = amazonDynamoDB.getItem(createRequestWithId(id)).getItem();
            if (returnedItem != null) {
                return returnedItem;
            }
            throw new NotFoundException(String.format("Item with id %s not found!", id));
        } catch (AmazonServiceException e) {
            LOG.error(e.getMessage());
            throw e;
        } finally {
            // Logged on success and failure, so injected latency is visible here.
            long duration = System.currentTimeMillis() - startTime;
            LOG.info(String.format("Call to DynamoDB took %d milliseconds.", duration));
        }
    }

    private GetItemRequest createRequestWithId(String id) {
        Map<String, AttributeValue> keyToGet = new HashMap<>();
        keyToGet.put("id", new AttributeValue(id));
        return new GetItemRequest().withKey(keyToGet).withTableName(table);
    }
}

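  1. Create a new attack in the Gremlin web app.
  2. Fill out the Application Query and Traffic Query fields to match this example. The Application Type is <span class="code-class-custom">ALFIDemoApplication</span>, as set in <span class="code-class-custom">AlfiConfig</span>.
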
ALFI HTTP Servlet Filter
JAVA

package com.example.rec;

import com.gremlin.ApplicationCoordinates;
import com.gremlin.GremlinCoordinatesProvider;
import com.gremlin.GremlinService;
import com.gremlin.GremlinServiceFactory;
import com.gremlin.http.servlet.GremlinServletFilter;
import org.springframework.boot.web.servlet.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WebConfig {

    @Bean
    public FilterRegistrationBean recommendationsFilterRegistrationBean() {
        FilterRegistrationBean registrationBean = new FilterRegistrationBean();
        registrationBean.setName("recs");

        final GremlinCoordinatesProvider alfiCoordinatesProvider = new GremlinCoordinatesProvider() {
            @Override
            public ApplicationCoordinates initializeApplicationCoordinates() {
                return new ApplicationCoordinates.Builder()
                        .withType("local")
                        .withField("service", "recommendations")
                        .build();
            }
        };
        final GremlinServiceFactory alfiFactory = new GremlinServiceFactory(alfiCoordinatesProvider);
        final GremlinService alfi = alfiFactory.getGremlinService();

        GremlinServletFilter alfiFilter = new GremlinServletFilter(alfi);
        registrationBean.setFilter(alfiFilter);
        registrationBean.setOrder(1);
        return registrationBean;
    }

}

You don't need to define TrafficCoordinates when using GremlinServletFilter; the library creates them for you. This lets you target any HTTP verb and any route your application serves. For example, you could narrow an attack's blast radius to only GET requests to https://somehost/recommendations.

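  1. Create a new attack in the Gremlin web app.
  2. Fill out the Application Query and Traffic Query fields to match this example: Application Type <span class="code-class-custom">local</span> with the custom field <span class="code-class-custom">service</span> = <span class="code-class-custom">recommendations</span>, and a Traffic Query for the HTTP verb and route you want to target.

ALFI Apache HTTP Client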
JAVA

package com.example.alfiapachehttpclient.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import com.gremlin.*;

@Configuration
public class ALFIConfig {

    private static final String APPLICATION_QUERY_NAME = "ALFIApacheHttpClientDemo";

    public GremlinCoordinatesProvider gremlinCoordinatesProvider() {
        return new GremlinCoordinatesProvider() {
            @Override
            public ApplicationCoordinates initializeApplicationCoordinates() {
                return new ApplicationCoordinates.Builder()
                        .withType(APPLICATION_QUERY_NAME)
                        .build();
            }
        };
    }

    public GremlinServiceFactory gremlinServiceFactory() {
        return new GremlinServiceFactory(gremlinCoordinatesProvider());
    }

    @Bean
    public GremlinService gremlinService() {
        return gremlinServiceFactory().getGremlinService();
    }

}

JAVA

package com.example.alfiapachehttpclient.config;

import com.gremlin.GremlinService;
import com.gremlin.http.client.GremlinApacheHttpRequestInterceptor;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ApacheClientConfig {

    private final GremlinService gremlinService;
    private static final int CONNECTION_TIMEOUT = 1000;
    private static final int SOCKET_TIMEOUT = 3000;

    @Autowired
    public ApacheClientConfig(GremlinService gremlinService) {
        this.gremlinService = gremlinService;
    }

    @Bean
    public CloseableHttpClient closeableHttpClient() {
        // Connection and socket timeouts, in milliseconds.
        RequestConfig requestConfig = RequestConfig.custom()
                .setConnectTimeout(CONNECTION_TIMEOUT)
                .setSocketTimeout(SOCKET_TIMEOUT)
                .build();

        // Added first, so the Gremlin interceptor runs before any other request interceptor.
        final GremlinApacheHttpRequestInterceptor gremlinInterceptor =
                new GremlinApacheHttpRequestInterceptor(gremlinService, "alfi-client-demo");
        final HttpClientBuilder clientBuilder = HttpClientBuilder
                .create()
                .addInterceptorFirst(gremlinInterceptor)
                .setDefaultRequestConfig(requestConfig);

        return clientBuilder.build();
    }
}

JAVA

package com.example.alfiapachehttpclient.controller;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.util.EntityUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;

@RestController
public class MainController {

    private static final Logger LOG = LoggerFactory.getLogger(MainController.class);
    private static final String URI = "https://www.gremlin.com/";

    private final CloseableHttpClient closeableHttpClient;

    @Autowired
    public MainController(CloseableHttpClient closeableHttpClient) {
        this.closeableHttpClient = closeableHttpClient;
    }

    @GetMapping("/")
    public ResponseEntity<String> hello() {
        HttpGet httpGet = new HttpGet(URI);
        long startTime = System.currentTimeMillis();
        LOG.info(String.format("Executing GET request to %s...", URI));
        // try-with-resources closes the response even if reading the body fails.
        try (CloseableHttpResponse response = closeableHttpClient.execute(httpGet)) {
            HttpEntity httpEntity = response.getEntity();
            String responseContent = EntityUtils.toString(httpEntity);
            LOG.info(responseContent);
            return new ResponseEntity<>(responseContent, HttpStatus.OK);
        } catch (IOException e) {
            LOG.error("GET request failed", e);
            return new ResponseEntity<>(HttpStatus.BAD_GATEWAY);
        } finally {
            // The timing covers execute(), so any injected latency is included.
            long duration = System.currentTimeMillis() - startTime;
            LOG.info(String.format("GET Request took %d milliseconds", duration));
        }
    }
}

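  1. Create a new attack in the Gremlin web app.
  2. Fill out the Application Query and Traffic Query fields to match this example. The Application Type is <span class="code-class-custom">ALFIApacheHttpClientDemo</span>, as set in <span class="code-class-custom">ALFIConfig</span>.
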
ALFI Core
JAVA

package com.gremlin.todo.config;

import com.gremlin.*;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ALFIConfig {

    public GremlinCoordinatesProvider gremlinCoordinatesProvider() {
        return new GremlinCoordinatesProvider() {
            @Override
            public ApplicationCoordinates initializeApplicationCoordinates() {
                return new ApplicationCoordinates.Builder()
                        .withType("MyApplication")
                        .withField("service", "to-do")
                        .build();
            }
        };
    }

    public GremlinServiceFactory gremlinServiceFactory() {
        return new GremlinServiceFactory(gremlinCoordinatesProvider());
    }

    @Bean
    public GremlinService gremlinService() {
        return gremlinServiceFactory().getGremlinService();
    }

}

JAVA

package com.gremlin.todo.controller;

import com.gremlin.GremlinService;
import com.gremlin.TrafficCoordinates;
import com.gremlin.todo.model.ToDo;
import com.gremlin.todo.service.ToDoService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.PostConstruct;
import java.util.Collection;

@RestController
public class MyController {
    private final GremlinService gremlinService;
    private final ToDoService toDoService;
    private TrafficCoordinates getAllToDosCoordinates;

    @Autowired
    public MyController(GremlinService gremlinService, ToDoService toDoService) {
        this.gremlinService = gremlinService;
        this.toDoService = toDoService;
    }

    @PostConstruct
    public void initTrafficCoordinates() {
        // Built once; matches Traffic Type "MyController" with the custom field method = getAllToDos.
        getAllToDosCoordinates = new TrafficCoordinates.Builder()
                .withType("MyController")
                .withField("method", "getAllToDos")
                .build();
    }

    @GetMapping("/all")
    public Collection<ToDo> getAllToDos() {
        // Injection point: applies latency and/or an exception if a matching attack is active.
        gremlinService.applyImpact(getAllToDosCoordinates);
        return toDoService.findAll();
    }
}

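  1. Create a new attack in the Gremlin web app.
  2. Fill out the Application Query and Traffic Query fields to match the coordinates above: Application Type <span class="code-class-custom">MyApplication</span> with the custom field <span class="code-class-custom">service</span> = <span class="code-class-custom">to-do</span>, and Traffic Type <span class="code-class-custom">MyController</span> with the custom field <span class="code-class-custom">method</span> = <span class="code-class-custom">getAllToDos</span>.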

Attacks

Integrate the library

To use ALFI, you must first integrate the Gremlin libraries into your application and redeploy. Please see the JVM Installation Guide for more details. Once you have successfully integrated the library, you should see logging like this:

INFO com.gremlin.GremlinServiceFactory - Gremlin enabled for Team abcdefgh-1234-9876-3333-nopqrstuvwxy

Create attacks via the Web UI

You can now create attacks from the Web UI, which also shows the history of ALFI attacks run by your team.

Once you click <span class="code-class-custom">New ALFI Attack</span>, you will see a form with <span class="code-class-custom">Application Type</span>, <span class="code-class-custom">Traffic Type</span>, and <span class="code-class-custom">Impact</span> sections.

Application Type

This section provides a way to determine which applications are eligible for the ALFI attack.

Upon application startup, the ALFI code running in each application creates an instance of <span class="code-class-custom">ApplicationCoordinates</span> and passes that to the Gremlin API. Each <span class="code-class-custom">ApplicationCoordinates</span> instance is eligible to pick up an ALFI attack. Please see Application Coordinates Setup for details on how to populate <span class="code-class-custom">ApplicationCoordinates</span>.

The ALFI library comes with two Application Types out of the box: AWS Lambda and AWS EC2. Custom Application Types can also be created from your application, which can then be used in the Web UI with the <span class="code-class-custom">Add Custom Field</span> button. Keep in mind that the most effective chaos experiments start small, so keep your custom Application Types as specific as possible.

Traffic Type

This section provides a way to select individual requests within your application and only impact that set.

Any attribute which you have supplied in a <span class="code-class-custom">TrafficCoordinates</span> is eligible to use in constructing the attack. Please see Traffic Coordinates Setup and Attaching Request Context data to all TrafficCoordinates for details on how to control the data being placed into a <span class="code-class-custom">TrafficCoordinates</span> instance.

The ALFI library includes integrations for the Apache HTTP client, the DynamoDB client, and HTTP servlet filters, but you are free to create any Traffic Type you like and use its custom fields as attributes of the attack.

For Traffic Type, you can also supply a <span class="code-class-custom">Percentage of Traffic</span> value. Because requests are selected probabilistically to approximate this percentage, the share of traffic actually impacted may differ slightly from the value you specify.
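The gap between the configured and the observed percentage follows from independent per-request sampling. The sketch below is a simulation, not ALFI code: each request is impacted with probability percent / 100, using a seeded java.util.Random so runs are repeatable. Over many requests the observed share converges on the target, but any finite run drifts a little.

```java
import java.util.Random;

/** Simulates per-request probabilistic targeting (illustrative, not ALFI's code). */
final class TrafficSampling {
    /** Decides independently, for one request, whether it falls in the attacked share. */
    static boolean shouldImpact(Random random, double percentOfTraffic) {
        // nextDouble() is in [0, 1), so 0% never matches and 100% always matches.
        return random.nextDouble() * 100.0 < percentOfTraffic;
    }

    /** Returns the observed percentage of impacted requests over one simulated run. */
    static double observedPercent(long seed, double percentOfTraffic, int requests) {
        Random random = new Random(seed);
        int impacted = 0;
        for (int i = 0; i < requests; i++) {
            if (shouldImpact(random, percentOfTraffic)) {
                impacted++;
            }
        }
        return 100.0 * impacted / requests;
    }

    public static void main(String[] args) {
        // With a 10% target, small runs wander further from the target than large ones.
        for (int requests : new int[] {100, 10_000, 1_000_000}) {
            System.out.printf("%,d requests -> %.2f%% impacted%n",
                    requests, observedPercent(42L, 10.0, requests));
        }
    }
}
```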

Impact

This section provides a way to declare what impact you would like to inject.

You can choose an amount of latency to inject, as well as a yes/no switch for whether the call should fail. Combining the two simulates a slow call that eventually fails. The impact is applied to all traffic that matches the Traffic Type described above, in applications that match the Application Type described above.
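Conceptually, applying an impact in the request path comes down to an optional delay followed by an optional failure. The sketch below shows that combination in plain Java; ImpactSketch and its parameters are hypothetical and do not describe how ALFI implements applyImpact. The latency is applied before the exception, which is what produces a slow call that eventually fails.

```java
import java.time.Duration;

/** Hypothetical model of an ALFI-style impact: latency, then optional failure. */
final class ImpactSketch {
    private final Duration latency;
    private final boolean fail;

    ImpactSketch(Duration latency, boolean fail) {
        this.latency = latency;
        this.fail = fail;
    }

    /** Call at the injection point: delays first, then throws if failure is enabled. */
    void apply() {
        if (!latency.isZero()) {
            try {
                Thread.sleep(latency.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
            }
        }
        if (fail) {
            throw new RuntimeException("Injected failure (simulated)");
        }
    }

    public static void main(String[] args) {
        ImpactSketch slowThenFail = new ImpactSketch(Duration.ofMillis(200), true);
        long start = System.nanoTime();
        try {
            slowThenFail.apply();
        } catch (RuntimeException e) {
            long elapsedMs = Duration.ofNanos(System.nanoTime() - start).toMillis();
            System.out.println(e.getMessage() + " after about " + elapsedMs + " ms");
        }
    }
}
```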

In this section, you must also declare the duration of the attack. For that duration, the attack is active and matching ALFI-enabled applications are impacted. As soon as the duration elapses, applications drop the attack and are no longer impacted.

Observe attack results

Once you click the <span class="code-class-custom">Unleash Gremlin</span> button, the attack becomes active and applications start picking it up. The Web UI then shows all of the attributes used to scope the attack, along with its impact and duration. The attack progresses through the following lifecycle stages:

Stage | Description
Pending | Created, but no application has picked up the attack yet
Distributed | At least one application has picked up the attack, but none has been impacted
Impacted | At least one application has picked up the attack and been impacted
Successful | Impact was applied and the duration elapsed
ApplicationNotFound | No application ever picked up the attack, and the duration elapsed
TrafficNotFound | No application ever applied the impact, and the duration elapsed
Halted | The attack was halted (by UI or API) before the duration elapsed
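The stage table can be read as a small decision procedure. The sketch below derives a stage from a few observable facts; AttackStage, AttackLifecycle, and their inputs are hypothetical, and the precedence between rows (a halt before the end overrides everything else) is an assumption rather than documented Gremlin behavior.

```java
import java.time.Duration;
import java.time.Instant;

/** Stages from the table above (names mirror the UI labels). */
enum AttackStage {
    PENDING, DISTRIBUTED, IMPACTED, SUCCESSFUL, APPLICATION_NOT_FOUND, TRAFFIC_NOT_FOUND, HALTED
}

/** Hypothetical derivation of an attack's stage (not Gremlin's implementation). */
final class AttackLifecycle {
    /** The attack is active from its start until start + duration. */
    static boolean durationElapsed(Instant start, Duration duration, Instant now) {
        return !now.isBefore(start.plus(duration));
    }

    static AttackStage stageOf(boolean haltedEarly, boolean elapsed,
                               int appsPickedUp, int appsImpacted) {
        if (haltedEarly) {
            return AttackStage.HALTED;                   // stopped via UI or API before the end
        }
        if (elapsed) {
            if (appsImpacted > 0) {
                return AttackStage.SUCCESSFUL;           // impact applied, duration over
            }
            return appsPickedUp == 0
                    ? AttackStage.APPLICATION_NOT_FOUND  // no application ever picked it up
                    : AttackStage.TRAFFIC_NOT_FOUND;     // picked up, but impact never applied
        }
        if (appsImpacted > 0) {
            return AttackStage.IMPACTED;
        }
        return appsPickedUp > 0 ? AttackStage.DISTRIBUTED : AttackStage.PENDING;
    }
}
```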

Libraries

Java Client library

  • alfi-core: Core library required for all ALFI functionality
  • alfi-aws: Optional AWS integration, providing coordinate discovery for <span class="code-class-custom">AwsLambda</span> and <span class="code-class-custom">AwsEc2</span>
  • alfi-apache-http-client: ALFI injection points for Apache HTTP Client
  • alfi-http-servlet-filter: ALFI injection point for incoming HTTP requests, via a servlet filter
  • alfi-aws-dynamodb-client: ALFI injection points for DynamoDB

In ALFI, each application has a set of identifying attributes. This set of attributes is named <span class="code-class-custom">ApplicationCoordinates</span> and is used to determine when an application matches an attack.

ApplicationCoordinates

AWS Lambda Function

  • Dependency: alfi-aws
  • <span class="code-class-custom">.inferFromEnvironment()</span> will extract the region and name of your Lambda function from your environment and use them as the <span class="code-class-custom">Region</span> and <span class="code-class-custom">Name</span> fields, respectively, in the Gremlin UI.
JAVA

ApplicationCoordinates coords = AwsApplicationCoordinatesResolver.inferFromEnvironment()
                        .orElseThrow(IllegalStateException::new);

[Screenshot: AWS Lambda Function]

AWS EC2 Application

  • Dependency: alfi-aws
  • <span class="code-class-custom">.inferFromEnvironment()</span> will extract the region, availability zone and instance ID from your environment and use them as the <span class="code-class-custom">Region</span>, <span class="code-class-custom">Availability Zone</span> and <span class="code-class-custom">Instance ID</span> fields, respectively, in the Gremlin UI.
JAVA

ApplicationCoordinates coords = AwsApplicationCoordinatesResolver.inferFromEnvironment()
                        .orElseThrow(IllegalStateException::new);

[Screenshot: AWS EC2 Application]

Custom Application Type

Suppose you have an application called TheShop that contains a UserService and a PaymentService. To uniquely identify each of these services in the Gremlin control plane, you would construct two <span class="code-class-custom">ApplicationCoordinates</span> instances, each with the same value passed to <span class="code-class-custom">withType(...)</span> and a unique value passed to <span class="code-class-custom">withField(...)</span>.

JAVA

ApplicationCoordinates coords = new ApplicationCoordinates.Builder()
                        .withType("TheShop")
                        .withField("service", "UserService")
                        .build();

JAVA

ApplicationCoordinates coords = new ApplicationCoordinates.Builder()
                        .withType("TheShop")
                        .withField("service", "PaymentService")
                        .build();

Take note of the <span class="code-class-custom">withType(...)</span> and <span class="code-class-custom">withField(...)</span> methods. The value passed to <span class="code-class-custom">withType(...)</span> must be entered in the <span class="code-class-custom">Name</span> field of the Gremlin UI, and the value passed to <span class="code-class-custom">withField(...)</span> must be entered in the <span class="code-class-custom">Custom Value</span> field of the Gremlin UI (see images below).

To target both services, configure the UI like this:

[Screenshot: Custom Application Type]
To target one of the services, configure the UI like this:

[Screenshot: Custom Application Type Single Service]
Don't forget to click on the + icon
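The two UI configurations above suggest a simple matching rule: an attack that sets only the type matches every application with that type, and each custom field narrows the match further. The sketch below assumes those semantics; `SimpleCoordinates` is a hypothetical class, not `com.gremlin.ApplicationCoordinates`, and Gremlin's actual matching may differ.

```java
import java.util.Map;

// Hypothetical coordinate set: a type plus named fields (assumed matching semantics).
final class SimpleCoordinates {
    private final String type;
    private final Map<String, String> fields;

    SimpleCoordinates(String type, Map<String, String> fields) {
        this.type = type;
        this.fields = Map.copyOf(fields); // defensive, immutable copy
    }

    // True when these application coordinates satisfy the attack's target: same type,
    // and every field the target specifies is present here with an equal value.
    boolean matches(SimpleCoordinates target) {
        if (!type.equals(target.type)) {
            return false;
        }
        for (Map.Entry<String, String> field : target.fields.entrySet()) {
            if (!field.getValue().equals(fields.get(field.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

With the TheShop example, a target of type `TheShop` with no fields matches both UserService and PaymentService, while adding `service=UserService` matches only UserService.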

TrafficCoordinates

<span class="code-class-custom">com.gremlin.TrafficCoordinates</span> instances are used to control the blast radius of an ALFI experiment. The blast radius for ALFI could be all or a subset of HTTP verbs, all or a subset of your application's HTTP request paths, or even a specific block of code within your application.

Outbound HTTP Traffic

The <span class="code-class-custom">com.gremlin.TrafficCoordinates</span> instance for Outbound HTTP Traffic is generated automatically by the <span class="code-class-custom">com.gremlin.http.client.GremlinApacheHttpRequestInterceptor</span>, which comes with the alfi-apache-http-client library. This interceptor lets you impact any HTTP verb or request route within your application. To use the <span class="code-class-custom">com.gremlin.http.client.GremlinApacheHttpRequestInterceptor</span>, add an instance of it to the <span class="code-class-custom">org.apache.http.impl.client.HttpClientBuilder</span> when you create your <span class="code-class-custom">org.apache.http.client.HttpClient</span> client.

JAVA

// The second argument is the client name; it must match the Client Name field in the Gremlin UI.
final GremlinApacheHttpRequestInterceptor gremlinInterceptor = new GremlinApacheHttpRequestInterceptor(gremlinService, "alfi-client-demo");
final HttpClientBuilder clientBuilder = HttpClientBuilder.create().addInterceptorFirst(gremlinInterceptor);

[Screenshot: Outbound HTTP Traffic]
The configuration in the screenshot above targets 50% of all HTTP GET traffic to the application. The second argument to com.gremlin.http.client.GremlinApacheHttpRequestInterceptor is a string and must match the value defined in the Client Name (required) input field of the Gremlin UI.

Inbound HTTP Traffic

<span class="code-class-custom">com.gremlin.TrafficCoordinates</span> instances are automatically created for you if alfi-http-servlet-filter is on the classpath.

[Screenshot: Inbound HTTP Traffic]
The configuration in the screenshot above targets 50% of all HTTP POST requests to the /payments route.
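To show where an inbound check sits in the request path, the sketch below uses the JDK's built-in `com.sun.net.httpserver.Filter` as a stand-in for a `javax.servlet` filter. It is not alfi-http-servlet-filter: `InboundFaultFilter` is hypothetical, answering with a 503 is an assumed way to model a failure, and the per-request decision is delegated to a `BooleanSupplier` (for example, a 50% sampler).

```java
import com.sun.net.httpserver.Filter;
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.util.function.BooleanSupplier;

// Hypothetical inbound fault filter for the JDK's built-in HTTP server.
// Not alfi-http-servlet-filter; it only shows where such a check sits.
final class InboundFaultFilter extends Filter {
    private final String method;           // e.g. "POST"
    private final String path;             // e.g. "/payments"
    private final BooleanSupplier sampler; // per-request decision, e.g. a 50% coin flip

    InboundFaultFilter(String method, String path, BooleanSupplier sampler) {
        this.method = method;
        this.path = path;
        this.sampler = sampler;
    }

    // Matches the exact route or anything beneath it, such as /payments/123.
    private boolean targets(HttpExchange exchange) {
        String requestPath = exchange.getRequestURI().getPath();
        return exchange.getRequestMethod().equalsIgnoreCase(method)
                && (requestPath.equals(path) || requestPath.startsWith(path + "/"));
    }

    @Override
    public void doFilter(HttpExchange exchange, Filter.Chain chain) throws IOException {
        if (targets(exchange) && sampler.getAsBoolean()) {
            // Injected failure: answer 503 with no body and skip the real handler.
            exchange.sendResponseHeaders(503, -1);
            exchange.close();
            return;
        }
        chain.doFilter(exchange); // not targeted: continue to the handler
    }

    @Override
    public String description() {
        return "Injects 503 responses into targeted inbound requests";
    }
}
```

Adding it with `server.createContext("/", handler).getFilters().add(new InboundFaultFilter("POST", "/payments", sampler))` fails sampled POST requests to /payments before they reach the handler, while GET requests pass through untouched.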

Dynamo DB Traffic

The <span class="code-class-custom">com.gremlin.TrafficCoordinates</span> instance for Dynamo DB Traffic is generated automatically by the <span class="code-class-custom">com.gremlin.aws.GremlinDynamoRequestInterceptor</span>, which comes with the alfi-aws library. This interceptor lets you impact any DynamoDB operation (<span class="code-class-custom">Get Item</span>, <span class="code-class-custom">Delete Item</span>, etc.). To use the <span class="code-class-custom">com.gremlin.aws.GremlinDynamoRequestInterceptor</span>, add an instance of it to the <span class="code-class-custom">com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder</span> when you create your <span class="code-class-custom">com.amazonaws.services.dynamodbv2.AmazonDynamoDB</span> client.

JAVA

final RequestHandler2 gremlinDynamoInterceptor = new GremlinDynamoRequestInterceptor(gremlinService(), CLIENT_EXECUTION_TIMEOUT, CLIENT_REQUEST_TIMEOUT);
final AmazonDynamoDB dbClient = AmazonDynamoDBClientBuilder
    .standard()
    .withRegion(region)
    .withClientConfiguration(new ClientConfiguration()
        .withClientExecutionTimeout(CLIENT_EXECUTION_TIMEOUT)
        .withConnectionTimeout(CLIENT_REQUEST_TIMEOUT)
        .withMaxErrorRetry(2)
    ).withRequestHandlers(gremlinDynamoInterceptor)
    .build();

[Screenshot: Dynamo DB Traffic]
The configuration in the screenshot above targets 50% of all Get Item traffic to the application.
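Client retries interact with the Percentage of Traffic setting. The DynamoDB client above allows up to two retries (`withMaxErrorRetry(2)`). Assuming each attempt is sampled independently and an injected failure is retried like any other retryable error (neither is confirmed on this page), the caller only sees a failure when every attempt fails, so the caller-visible failure rate is p^(retries + 1). The `RetryMath` class below is a hypothetical helper for that arithmetic.

```java
// Hypothetical arithmetic helper: caller-visible failure rate under independent retries.
final class RetryMath {
    // p: probability that a single attempt is failed by the injected fault, in [0, 1]
    // maxErrorRetry: retries allowed after the first attempt
    static double callerVisibleFailureRate(double p, int maxErrorRetry) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        if (maxErrorRetry < 0) {
            throw new IllegalArgumentException("maxErrorRetry must be >= 0");
        }
        // The call fails only if the first attempt and every retry all fail.
        return Math.pow(p, maxErrorRetry + 1);
    }
}
```

For the 50% Get Item configuration with two retries, that is 0.5^3 = 0.125, so under these assumptions roughly one call in eight would fail from the caller's point of view, even though half of all attempts are impacted.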

Custom Traffic Type

JAVA

final TrafficCoordinates trafficCoordinates = new TrafficCoordinates.Builder()
                .withType("PaymentController")
                .withField("method", "submitPayment")
                .build();

public HttpEntity submitPayment(Payment paymentRequest) {
    this.gremlinService.applyImpact(trafficCoordinates); // Fault injected!
    return paymentService.makePayment(paymentRequest);
}

[Screenshot: Custom Traffic Type]
The configuration in the screenshot above targets 50% of all calls to the PaymentController#submitPayment(Payment paymentRequest) method.

Extend TrafficCoordinates

Companies often set up their infrastructure to maintain a per-request data structure and use this information to provide logging, monitoring, and observability data points. A common pattern is to set up a <span class="code-class-custom">RequestContext</span> and have authentication filters store information such as <span class="code-class-custom">customerId</span> or <span class="code-class-custom">deviceId</span> in the <span class="code-class-custom">RequestContext</span> object, where it can be read at any later point in the request. These attributes are often excellent values on which to scope attacks. If your system works this way, you can set up a mapping that populates these values on every <span class="code-class-custom">TrafficCoordinates</span> instance. This code lives in a concrete subclass of <span class="code-class-custom">GremlinCoordinatesProvider</span>, which you've already seen in Initialize Application Coordinates.

JAVA

import com.gremlin.GremlinCoordinatesProvider;
import com.gremlin.TrafficCoordinates;

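public class MyCoordinatesProvider extends GremlinCoordinatesProvider {

    // initializeApplicationCoordinates() is also overridden in this class (omitted here);
    // see the GremlinService example for one implementation.
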
    @Override
    public TrafficCoordinates extendEachTrafficCoordinates(TrafficCoordinates incomingCoordinates) {
        incomingCoordinates.putField("customerId", MyRequestContext.getCustomerId());
        incomingCoordinates.putField("deviceId", MyRequestContext.getDeviceId());
        incomingCoordinates.putField("country", MyRequestContext.getCountry());
        return incomingCoordinates;
    }
}

With this provider wired into the construction of your <span class="code-class-custom">GremlinService</span> instance, every <span class="code-class-custom">TrafficCoordinates</span> instance now carries those three attributes, and they can be matched against any type of traffic you'd like to attack.
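The provider above reads from a `MyRequestContext` class that this page does not define. A common way to build one is a `ThreadLocal`, sketched below under the assumption of a thread-per-request server in which the authentication filter and the instrumented code run on the same thread. The class and method names simply mirror those used above and are hypothetical.

```java
// Hypothetical ThreadLocal-backed request context, assuming a thread-per-request server.
final class MyRequestContext {
    // Immutable snapshot of the per-request attributes.
    private static final class Values {
        final String customerId;
        final String deviceId;
        final String country;

        Values(String customerId, String deviceId, String country) {
            this.customerId = customerId;
            this.deviceId = deviceId;
            this.country = country;
        }
    }

    private static final ThreadLocal<Values> CURRENT = new ThreadLocal<>();

    private MyRequestContext() {
    }

    // Called by an authentication filter once the caller has been identified.
    static void set(String customerId, String deviceId, String country) {
        CURRENT.set(new Values(customerId, deviceId, country));
    }

    // Called when the request completes so pooled threads do not carry stale values.
    static void clear() {
        CURRENT.remove();
    }

    // Each getter returns null when no context has been set on the current thread.
    static String getCustomerId() {
        Values v = CURRENT.get();
        return v == null ? null : v.customerId;
    }

    static String getDeviceId() {
        Values v = CURRENT.get();
        return v == null ? null : v.deviceId;
    }

    static String getCountry() {
        Values v = CURRENT.get();
        return v == null ? null : v.country;
    }
}
```

An authentication filter would call `MyRequestContext.set(...)` after identifying the caller and `MyRequestContext.clear()` in a `finally` block. With asynchronous or reactive frameworks a plain `ThreadLocal` is not sufficient, because a request may hop between threads.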

GremlinService

To create a <span class="code-class-custom">com.gremlin.GremlinService</span>, you need a <span class="code-class-custom">com.gremlin.GremlinCoordinatesProvider</span>, which needs a com.gremlin.ApplicationCoordinates.

To construct a GremlinService using the alfi-aws library:

JAVA

final GremlinServiceFactory factory = new GremlinServiceFactory(new GremlinCoordinatesProvider() {
            @Override
            public ApplicationCoordinates initializeApplicationCoordinates() {
                ApplicationCoordinates coords = AwsApplicationCoordinatesResolver.inferFromEnvironment()
                        .orElseThrow(IllegalStateException::new);
                return coords;
            }
        });
final GremlinService gremlinService = factory.getGremlinService();

Design
com.gremlin.GremlinService should be a singleton: create it once and share that instance across your application.
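One way to honor the singleton requirement is to build the `GremlinService` lazily, exactly once, and share it. The `Lazy` holder below is a hypothetical, JDK-only sketch using double-checked locking on a `volatile` field; in real code the supplier would wrap the `GremlinServiceFactory` call shown above.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical lazily initialized single instance, safe under concurrent first access.
final class Lazy<T> {
    private final Supplier<T> supplier;
    private volatile T value; // volatile makes the published instance visible to all threads

    Lazy(Supplier<T> supplier) {
        this.supplier = Objects.requireNonNull(supplier);
    }

    // Double-checked locking: the supplier runs at most once, even if many
    // threads call get() at the same time.
    T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    result = Objects.requireNonNull(supplier.get(), "supplier returned null");
                    value = result;
                }
            }
        }
        return result;
    }
}
```

For example, a hypothetical `static final Lazy<GremlinService> GREMLIN = new Lazy<>(factory::getGremlinService);` gives every caller the same instance. If eager creation at startup is acceptable, a plain `static final` field is simpler.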

Injecting fault

Once you have a reference to the <span class="code-class-custom">com.gremlin.GremlinService</span> singleton and have defined your custom <span class="code-class-custom">com.gremlin.TrafficCoordinates</span>, you can inject a fault like this:

JAVA

gremlinService.applyImpact(trafficCoordinates);

Changelog

0.7.4
July 7, 2020
  • Fix: If the gremlin.properties file was on the classpath, Gremlin was not properly using it when resolving configuration.
0.7.3
December 23, 2019
  • Fix: Change the payload of the authorization header sent to the Gremlin API to resolve HTTP 401s caused by a server-side change that does extra certificate validation
  • New: Added support for HTTP proxy. Set the http_proxy environment variable, and ALFI traffic to the Gremlin API will use the specified proxy URL.
0.7.2
April 24, 2019
  • Fix: Allow certificate parsing to work properly on Windows
  • Info: Updated dependencies
0.7.1
April 11, 2019
  • Fix: Much friendlier error messages when installation/setup is unsuccessful
0.7.0
February 9, 2019
  • New: Addition of Inbound HTTP injection points, both for javax.servlet Filters and JAX-RS Filters
0.6.1
February 21, 2019
  • Info: Updated dependencies
0.6.0
February 12, 2019
  • Fix: Allow chaining of property sources, so that a failure to lookup in Parameter Store still allows a lookup from environment variables
0.5.3
January 22, 2019
  • Info: Release process changes only
0.5.2
January 10, 2019
  • Info: Change artifact location to maven.gremlin.com
0.5.1
October 23, 2018
  • Info: The GREMLIN_ALFI_IDENTIFIER is required (previously was optional) when authenticating your application with Gremlin
0.5.0
October 11, 2018
  • New: Install with Maven now available
  • New: Client library modules available individually
  • New: AWS Parameter Store can be used for configuration