Application Level Fault Injection (ALFI)

Overview

Why application-level fault injection is useful

Operators think in requests

Most of the metrics, dashboards, and alerts that we consume are expressed in terms of requests. RPS, error rate, and latency all implicitly use a request as the unit of work. Requests are not a concept available at the infrastructure level, where all we see are streams of packets with IP addresses and ports. By moving up to the application level, we can use all of the request-level metadata in constructing an attack.

Since requests can include identifiers such as customer ID, device ID, and country, those facets may be used in constructing an attack. With that ability, it is much easier to give your attack a small, well-defined blast radius. That, in turn, allows for much faster feedback loops and lets you discover latent problems more quickly.
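
To make the idea of a request-scoped blast radius concrete, here is a minimal sketch in plain Java. It is not ALFI's matching code: the class BlastRadiusSketch, the method matches, and the field names are hypothetical, and the rule used (every field named by the attack must be present on the request with an equal value) is an assumption made for illustration.

```java
import java.util.Map;

/** Illustrative only: decides whether a request falls inside an attack's blast radius. */
final class BlastRadiusSketch {

    /**
     * A request matches when every field the attack targets is present on the
     * request with the same value. Request fields the attack does not mention
     * are ignored.
     */
    static boolean matches(Map<String, String> attackFields, Map<String, String> requestFields) {
        return attackFields.entrySet().stream()
                .allMatch(e -> e.getValue().equals(requestFields.get(e.getKey())));
    }

    public static void main(String[] args) {
        // The attack targets a single customer ID (hypothetical value).
        Map<String, String> attack = Map.of("customerId", "cust-42");

        Map<String, String> myRequest = Map.of("customerId", "cust-42", "country", "US");
        Map<String, String> otherRequest = Map.of("customerId", "cust-7", "country", "US");

        System.out.println("my request impacted:    " + matches(attack, myRequest));
        System.out.println("other request impacted: " + matches(attack, otherRequest));
    }
}
```

Narrowing the attack to one identifier means only requests carrying that identifier are candidates for impact, which is what keeps the experiment small.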

Fault injection without system access

Injecting infrastructure failures requires running a process and accessing other system-level resources. In serverless environments such as AWS Lambda, Google Cloud Functions, and Azure Functions, this access is impossible. In these cases, it is necessary to include the fault-injection mechanism within the application itself. ALFI runs in the JVM as a library, so once you have integrated it into your application, you may use it in any environment.

Examples
  • Simulate an outage in production by creating an attack on your customer ID only. Then you can look for signs of problems when logged in as yourself, while no other users are even aware an attack is occurring.
  • Simulate a problem with a specific endpoint. Partial failure in distributed systems is quite common - some endpoints may be unavailable while others are working perfectly. In order to simulate such a scenario, you can create an attack targeted to some endpoints only and then determine how your system reacts.
  • Always-on failure testing. If you limit an attack to a set of devices you control, then you can run tests against those devices on a regular basis and evaluate how the user experience works when the system is degraded.

Installation

Artifact repository

Gradle
groovy
repositories {
    maven {
        url 'https://maven.gremlin.com/'
    }
}
Maven
xml
<repositories>
  <repository>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
    <id>gremlin</id>
    <name>The Gremlin Repository</name>
    <url>https://maven.gremlin.com/</url>
  </repository>
</repositories>

alfi-core

Gradle
groovy
implementation group: 'com.gremlin', name: 'alfi-core', version: '0.5+'
Maven
xml
<dependency>
  <groupId>com.gremlin</groupId>
  <artifactId>alfi-core</artifactId>
  <version>LATEST</version>
</dependency>

alfi-aws

Gradle
groovy
// If your application is hosted on AWS EC2 or Lambda, use this to integrate with AWS
// (like Parameter Store Configuration support)
implementation group: 'com.gremlin', name: 'alfi-aws', version: '0.5+'
Maven
xml
<!-- If your application is hosted on AWS EC2 or Lambda, use this to integrate with AWS
     (like Parameter Store Configuration support) -->
<dependency>
  <groupId>com.gremlin</groupId>
  <artifactId>alfi-aws</artifactId>
  <version>LATEST</version>
</dependency>

alfi-apache-http-client

Gradle
groovy
// Apache HTTP Client Injection Points
implementation group: 'com.gremlin', name: 'alfi-apache-http-client', version: '0.5+'
Maven
xml
<dependency>
  <groupId>com.gremlin</groupId>
  <artifactId>alfi-apache-http-client</artifactId>
  <version>LATEST</version>
</dependency>

alfi-http-servlet-filter

Gradle
groovy
implementation group: 'com.gremlin', name: 'alfi-http-servlet-filter', version: '0.5+'
Maven
xml
<dependency>
  <groupId>com.gremlin</groupId>
  <artifactId>alfi-http-servlet-filter</artifactId>
  <version>LATEST</version>
</dependency>

alfi-aws-dynamodb-client

Gradle
groovy
// DynamoDB Injection Points
implementation group: 'com.gremlin', name: 'alfi-aws-dynamodb-client', version: '0.5+'
Maven
xml
<dependency>
  <groupId>com.gremlin</groupId>
  <artifactId>alfi-aws-dynamodb-client</artifactId>
  <version>LATEST</version>
</dependency>

Authentication & configuration

Authenticate your application with Gremlin

In order to authenticate to Gremlin, you must provide the following configuration values to your application.

  • GREMLIN_ALFI_IDENTIFIER : A unique identifier for the application. This will be used to distinguish all of the application instances from one another.
  • GREMLIN_TEAM_ID : The Team ID that this application belongs to. Only users in that team may conduct attacks on it.
  • GREMLIN_TEAM_CERTIFICATE_OR_FILE : Certificate for authenticating to Gremlin. See below for syntax on permissible values.
  • GREMLIN_TEAM_PRIVATE_KEY_OR_FILE : Private key for authenticating to Gremlin. See below for syntax on permissible values.

You may set these as environment variables or in a gremlin.properties file on the classpath. Certificates can be downloaded for each team from the Settings Page.

Examples

As a raw value

bash
GREMLIN_TEAM_CERTIFICATE_OR_FILE=-----BEGIN CERTIFICATE-----...

Or pointing to a file

bash
GREMLIN_TEAM_CERTIFICATE_OR_FILE=file:///usr/gremlin/certificate.pem
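
The two accepted forms of an _OR_FILE value can be told apart by their prefix. The sketch below shows one way such a value might be resolved in plain Java. It is an illustration, not ALFI's own resolution code: the class CredentialValueSketch and the method resolve are hypothetical, and treating anything that does not start with file:// as a raw value is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: resolves a value given either inline or as a file:// reference. */
final class CredentialValueSketch {

    private static final String FILE_PREFIX = "file://";

    /** Returns the file contents for a file:// value, otherwise the value itself. */
    static String resolve(String configured) {
        if (configured.startsWith(FILE_PREFIX)) {
            Path path = Paths.get(URI.create(configured));
            try {
                return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException("Could not read " + path, e);
            }
        }
        return configured;
    }

    public static void main(String[] args) throws IOException {
        // Write a placeholder certificate to a temporary file, then reference it by URI.
        Path pem = Files.createTempFile("certificate", ".pem");
        try {
            Files.write(pem, "-----BEGIN CERTIFICATE-----...".getBytes(StandardCharsets.UTF_8));
            String fromFile = resolve(pem.toUri().toString());
            String inline = resolve("-----BEGIN CERTIFICATE-----...");
            System.out.println("same value either way: " + fromFile.equals(inline));
        } finally {
            Files.deleteIfExists(pem);
        }
    }
}
```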

Optional configuration

The following keys may be set to tune how ALFI operates.

  • GREMLIN_ALFI_ENABLED : If set to anything other than true, all functionality is turned off. This is designed to give you the ability to safely deploy ALFI, knowing you've got a simple off-switch. When the functionality is off, no failures are ever injected by ALFI, no calls are made to the API, and no logging past configuration-time occurs.
  • GREMLIN_REFRESH_INTERVAL_MS : You may optionally provide this value to set the frequency with which the library will contact the Gremlin API. Minimum of 1000 (1 second), maximum of 300000 (5 minutes). Default of 10000 (10 seconds). This determines how quickly your application reacts to attacks being halted or created and the amount of network traffic generated by the library.
  • http_proxy : You may specify a proxy for traffic from the ALFI library back to the Gremlin control plane. This may optionally include basic auth.
Examples
  • GREMLIN_ALFI_ENABLED=true
  • GREMLIN_ALFI_IDENTIFIER=recommendation-service-i-0ab123456
  • GREMLIN_REFRESH_INTERVAL_MS=20000
  • http_proxy=http://proxy.server:3128
  • http_proxy=http://username:password@proxy.server:3128
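
As a rough illustration of how these optional keys behave, the following plain-Java sketch interprets the enabled flag and the refresh interval using the limits stated above. The class AlfiTuningSketch and its methods are hypothetical. Two behaviors are assumptions rather than documented facts: the comparison with true is case-sensitive here, and out-of-range intervals are clamped, whereas the library may instead reject them.

```java
/** Illustrative only: interprets the optional tuning keys described above. */
final class AlfiTuningSketch {

    static final long MIN_REFRESH_MS = 1_000L;      // 1 second
    static final long MAX_REFRESH_MS = 300_000L;    // 5 minutes
    static final long DEFAULT_REFRESH_MS = 10_000L; // 10 seconds

    /** Anything other than the literal "true" (including an unset value) means off. */
    static boolean isEnabled(String raw) {
        return "true".equals(raw);
    }

    /** Falls back to the default when unset; clamps to the documented range (assumption). */
    static long refreshIntervalMs(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_REFRESH_MS;
        }
        long requested = Long.parseLong(raw.trim());
        return Math.max(MIN_REFRESH_MS, Math.min(MAX_REFRESH_MS, requested));
    }

    public static void main(String[] args) {
        System.out.println("enabled:  " + isEnabled(System.getenv("GREMLIN_ALFI_ENABLED")));
        System.out.println("interval: " + refreshIntervalMs(System.getenv("GREMLIN_REFRESH_INTERVAL_MS")));
    }
}
```

A shorter interval makes the application react sooner to attacks being created or halted, at the cost of more calls to the Gremlin API.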

Alternate configuration mechanism

As described above, the default configuration resolution mechanism is to use either properties defined in gremlin.properties, or in environment variables where your application runs. If those don't fit your needs, then you can provide an alternate mechanism by subclassing GremlinConfigurationResolver (javadocs) and supplying it to GremlinServiceFactory (javadocs) at construction-time.

Setup

Step by step

In a hurry? Skip to Complete examples.

  • Construct an ApplicationCoordinates instance.
  • Construct a TrafficCoordinates instance.
  • Optionally (if using a custom TrafficCoordinates instance) construct a GremlinService singleton.
  • Optionally (if using a custom TrafficCoordinates instance) inject the fault using com.gremlin.GremlinService#applyImpact(trafficCoordinates). Add this line of code wherever in your application you want the fault to be injected.
  • Create a new Attack in the Gremlin Web UI.
  • Select an Application Query.
  • Set the necessary fields for the selected Application Query.
  • Select a Traffic Query.
  • Choose a Gremlin attack - Set the amount of latency in ms to apply and optionally throw a RuntimeException within your application.
  • Run the attack - Set the duration in seconds for how long the attack will last.
  • Test your application to observe the impact of the attack.

Complete examples

ALFI AWS
java
package com.alfilambda;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.LambdaLogger;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.gremlin.*;
import com.gremlin.aws.AwsApplicationCoordinatesResolver;

import java.time.Duration;
import java.time.Instant;
import java.util.Map;

public class AlfiDemoHandler implements RequestHandler<Map<String,String>, String> {

    private final GremlinService gremlinService;

    public AlfiDemoHandler() {
        // Build the GremlinService once per Lambda container, not once per invocation.
        final GremlinServiceFactory factory = new GremlinServiceFactory(new GremlinCoordinatesProvider() {
            @Override
            public ApplicationCoordinates initializeApplicationCoordinates() {
                ApplicationCoordinates coords = AwsApplicationCoordinatesResolver.inferFromEnvironment()
                        .orElseThrow(IllegalStateException::new);
                return coords;
            }
        });
        gremlinService = factory.getGremlinService();
    }

    @Override
    public String handleRequest(Map<String, String> input, Context context) {
        Instant start = Instant.now();
        TrafficCoordinates trafficCoordinates = new TrafficCoordinates.Builder()
                .withType(this.getClass().getSimpleName())
                .withField("method", "handleRequest")
                .build();
        // Any active attack matching these coordinates is applied here.
        gremlinService.applyImpact(trafficCoordinates);
        LambdaLogger logger = context.getLogger();
        Instant finish = Instant.now();
        long timeElapsed = Duration.between(start, finish).toMillis(); // in millis
        logger.log(String.format("Lambda took %s millis", timeElapsed));
        return "200 OK";
    }
}
ALFI DynamoDB
java
package com.example.alfidynamodb.config;

import com.amazonaws.ClientConfiguration;
import com.amazonaws.handlers.RequestHandler2;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.gremlin.*;
import com.gremlin.aws.GremlinDynamoRequestInterceptor;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AlfiConfig {

    private static final String APPLICATION_QUERY_NAME = "ALFIDemoApplication";
    private static final int CLIENT_EXECUTION_TIMEOUT = 1500;
    private static final int CLIENT_REQUEST_TIMEOUT = 500;

    @Value("${aws.region}")
    private String region;

    public GremlinCoordinatesProvider gremlinCoordinatesProvider() {
        return new GremlinCoordinatesProvider() {
            @Override
            public ApplicationCoordinates initializeApplicationCoordinates() {
                return new ApplicationCoordinates.Builder()
                        .withType(APPLICATION_QUERY_NAME)
                        .build();
            }
        };
    }

    public GremlinServiceFactory gremlinServiceFactory() {
        return new GremlinServiceFactory(gremlinCoordinatesProvider());
    }

    public GremlinService gremlinService() {
        return gremlinServiceFactory().getGremlinService();
    }

    @Bean
    public AmazonDynamoDB amazonDynamoDB() {
        final RequestHandler2 gremlinDynamoInterceptor = new GremlinDynamoRequestInterceptor(gremlinService(), CLIENT_EXECUTION_TIMEOUT, CLIENT_REQUEST_TIMEOUT);
        return AmazonDynamoDBClientBuilder.standard()
                .withRegion(region)
                .withClientConfiguration(new ClientConfiguration()
                        .withClientExecutionTimeout(CLIENT_EXECUTION_TIMEOUT)
                        .withConnectionTimeout(CLIENT_REQUEST_TIMEOUT)
                        .withMaxErrorRetry(2)
                )
                .withRequestHandlers(gremlinDynamoInterceptor).build();
    }

}
java
package com.example.alfidynamodb.persistence;

import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
import com.amazonaws.services.kms.model.NotFoundException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import java.util.HashMap;
import java.util.Map;

@Component
public class GetItemRequester {

    private final Logger LOG = LoggerFactory.getLogger(getClass().getName());

    @Value("${dynamo.db.table}")
    private String table;

    private final AmazonDynamoDB amazonDynamoDB;

    public GetItemRequester(@Autowired AmazonDynamoDB amazonDynamoDB) {
        this.amazonDynamoDB = amazonDynamoDB;
    }

    public Map<String, AttributeValue> getItem(String id) {
        long startTime = System.currentTimeMillis();
        try {
            LOG.info(String.format("Querying DynamoDB for item with ID %s...", id));
            Map<String, AttributeValue> returnedItem = amazonDynamoDB.getItem(createRequestWithId(id)).getItem();
            if (returnedItem != null) {
                return returnedItem;
            } else {
                throw new NotFoundException(String.format("Item with id %s not found!", id));
            }
        } catch (AmazonServiceException e) {
            LOG.error(e.getMessage());
            throw e;
        } finally {
            // Logged on every path, so injected latency shows up in the duration.
            long endTime = System.currentTimeMillis();
            long duration = (endTime - startTime);
            LOG.info(String.format("Call to DynamoDB took %s milliseconds.", duration));
        }
    }

    private GetItemRequest createRequestWithId(String id) {
        HashMap<String, AttributeValue> keyToGet = new HashMap<>();
        keyToGet.put("id", new AttributeValue(id));
        return new GetItemRequest().withKey(keyToGet).withTableName(table);
    }
}
ALFI HTTP Servlet Filter
java
package com.example.rec;

import com.gremlin.ApplicationCoordinates;
import com.gremlin.GremlinCoordinatesProvider;
import com.gremlin.GremlinService;
import com.gremlin.GremlinServiceFactory;
import com.gremlin.http.servlet.GremlinServletFilter;
import org.springframework.boot.web.servlet.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WebConfig {

    @Bean
    public FilterRegistrationBean recommendationsFilterRegistrationBean() {
        FilterRegistrationBean registrationBean = new FilterRegistrationBean();
        registrationBean.setName("recs");

        final GremlinCoordinatesProvider alfiCoordinatesProvider = new GremlinCoordinatesProvider() {
            @Override
            public ApplicationCoordinates initializeApplicationCoordinates() {
                return new ApplicationCoordinates.Builder()
                        .withType("local")
                        .withField("service", "recommendations")
                        .build();
            }
        };
        final GremlinServiceFactory alfiFactory = new GremlinServiceFactory(alfiCoordinatesProvider);
        final GremlinService alfi = alfiFactory.getGremlinService();

        GremlinServletFilter alfiFilter = new GremlinServletFilter(alfi);
        registrationBean.setFilter(alfiFilter);
        registrationBean.setOrder(1);
        return registrationBean;
    }

}
ALFI Apache Http Client
java
package com.example.alfiapachehttpclient.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import com.gremlin.*;

@Configuration
public class ALFIConfig {

    private static final String APPLICATION_QUERY_NAME = "ALFIApacheHttpClientDemo";

    public GremlinCoordinatesProvider gremlinCoordinatesProvider() {
        return new GremlinCoordinatesProvider() {
            @Override
            public ApplicationCoordinates initializeApplicationCoordinates() {
                return new ApplicationCoordinates.Builder()
                        .withType(APPLICATION_QUERY_NAME)
                        .build();
            }
        };
    }

    public GremlinServiceFactory gremlinServiceFactory() {
        return new GremlinServiceFactory(gremlinCoordinatesProvider());
    }

    @Bean
    public GremlinService gremlinService() {
        return gremlinServiceFactory().getGremlinService();
    }

}
java
package com.example.alfiapachehttpclient.config;

import com.gremlin.GremlinService;
import com.gremlin.http.client.GremlinApacheHttpRequestInterceptor;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ApacheClientConfig {

    private final GremlinService gremlinService;
    private static final int CONNECTION_TIMEOUT = 1000;
    private static final int SOCKET_TIMEOUT = 3000;

    @Autowired
    public ApacheClientConfig(GremlinService gremlinService) {
        this.gremlinService = gremlinService;
    }

    @Bean
    public CloseableHttpClient closableHttpClient() {
        RequestConfig requestConfig = RequestConfig.custom()
                .setConnectTimeout(CONNECTION_TIMEOUT)
                .setSocketTimeout(SOCKET_TIMEOUT)
                .build();

        final GremlinApacheHttpRequestInterceptor gremlinInterceptor =
                new GremlinApacheHttpRequestInterceptor(gremlinService, "alfi-client-demo");
        final HttpClientBuilder clientBuilder = HttpClientBuilder
                .create()
                .addInterceptorFirst(gremlinInterceptor)
                .setDefaultRequestConfig(requestConfig);

        return clientBuilder.build();
    }

}
java
package com.example.alfiapachehttpclient.controller;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.util.EntityUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.ResponseBody;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;

@RestController
public class MainController {

    private final Logger LOG = LoggerFactory.getLogger(getClass().getName());

    private final CloseableHttpClient closeableHttpClient;

    @Autowired
    public MainController(CloseableHttpClient closeableHttpClient) {
        this.closeableHttpClient = closeableHttpClient;
    }

    @GetMapping("/")
    public @ResponseBody
    ResponseEntity<String> hello() {
        final String URI = "https://www.gremlin.com/";
        HttpGet httpGet = new HttpGet(URI);
        String responseContent = null;
        long startTime = System.currentTimeMillis();
        LOG.info(String.format("Executing GET request to %s...", URI));
        // The response is a local variable (the controller is shared across requests),
        // and try-with-resources closes it without a null check if execute() throws.
        try (CloseableHttpResponse closeableHttpResponse = closeableHttpClient.execute(httpGet)) {
            HttpEntity httpEntity = closeableHttpResponse.getEntity();
            responseContent = EntityUtils.toString(httpEntity);
            EntityUtils.consume(httpEntity);
            LOG.info(responseContent);
        } catch (IOException e) {
            LOG.error("GET request failed", e);
        } finally {
            long endTime = System.currentTimeMillis();
            long duration = (endTime - startTime);
            LOG.info(String.format("GET Request took %d milliseconds", duration));
        }
        return new ResponseEntity<>(responseContent, HttpStatus.OK);
    }
}
ALFI Core
java
package com.gremlin.todo.config;

import com.gremlin.*;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ALFIConfig {

    public GremlinCoordinatesProvider gremlinCoordinatesProvider() {
        return new GremlinCoordinatesProvider() {
            @Override
            public ApplicationCoordinates initializeApplicationCoordinates() {
                return new ApplicationCoordinates.Builder()
                        .withType("MyApplication")
                        .withField("service", "to-do")
                        .build();
            }
        };
    }

    public GremlinServiceFactory gremlinServiceFactory() {
        return new GremlinServiceFactory(gremlinCoordinatesProvider());
    }

    @Bean
    public GremlinService gremlinService() {
        return gremlinServiceFactory().getGremlinService();
    }

}
java
package com.gremlin.todo.controller;

import com.gremlin.GremlinService;
import com.gremlin.TrafficCoordinates;
import com.gremlin.todo.model.ToDo;
import com.gremlin.todo.service.ToDoService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

import javax.annotation.PostConstruct;
import java.util.Collection;

@RestController
public class MyController {
    private final GremlinService gremlinService;
    private final ToDoService toDoService;
    private TrafficCoordinates getAllToDosCoordinates;

    @Autowired
    public MyController(GremlinService gremlinService, ToDoService toDoService) {
        this.gremlinService = gremlinService;
        this.toDoService = toDoService;
    }

    // Build the coordinates once, after dependency injection has completed.
    @PostConstruct
    public void initTrafficCoordinates() {
        getAllToDosCoordinates = new TrafficCoordinates
                .Builder()
                .withType("MyController")
                .withField("method", "getAllToDos")
                .build();
    }

    @GetMapping("/all")
    public Collection<ToDo> getAllToDos() {
        // Any active attack matching these coordinates is applied before the lookup.
        gremlinService.applyImpact(this.getAllToDosCoordinates);
        return toDoService.findAll();
    }

}

Attacks

Integrate the library

To use ALFI, you must first integrate the Gremlin libraries into your application and redeploy. Please see the JVM Installation Guide for more details. Once you have successfully integrated the library, you should see logging like this:

INFO com.gremlin.GremlinServiceFactory - Gremlin enabled for Team abcdefgh-1234-9876-3333-nopqrstuvwxy

Create attacks via the Web UI

Now you can start creating attacks from the Web UI. Here you will see a history of ALFI attacks run by your team.

Once you click New ALFI Attack, you will receive a form with Application Type, Traffic Type, and Impact sections.

Application Type

This section provides a way to determine which applications are eligible for the ALFI attack.

Upon application startup, the ALFI code running in each application creates an instance of ApplicationCoordinates and passes that to the Gremlin API. Each ApplicationCoordinates instance is eligible to pick up an ALFI attack. Please see Application Coordinates Setup for details on how to populate ApplicationCoordinates.

The ALFI library comes with two Application Types out of the box: AWS Lambda and AWS EC2. Custom Application Types can also be created from your application, which can then be used in the Web UI with the Add Custom Field button. Keep in mind that the most effective chaos experiments start small, so keep your custom Application Types as specific as possible.

Traffic Type

This section provides a way to select individual requests within your application and only impact that set.

Any attribute which you have supplied in a TrafficCoordinates is eligible to use in constructing the attack. Please see Traffic Coordinates Setup and Attaching Request Context data to all TrafficCoordinates for details on how to control the data being placed into a TrafficCoordinates instance.

The ALFI library includes integrations for the Apache HTTP client and the DynamoDB client (with more to come!). However, you are free to create any sort of Traffic Type you would like and use those custom fields as attributes of the attack.

For Traffic Type, you may also supply a Percentage of Traffic value. As probability is used to target this percentage, the actual impact may not exactly reflect the value specified.
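
The gap between the configured percentage and the observed impact follows from per-request random sampling. The sketch below illustrates that effect in plain Java. It is not ALFI's sampling code: the class TrafficPercentageSketch and its methods are hypothetical, and drawing one independent random number per request is an assumption.

```java
import java.util.Random;

/** Illustrative only: per-request probabilistic sampling of a traffic percentage. */
final class TrafficPercentageSketch {

    /** Decides, independently for each request, whether it should be impacted. */
    static boolean shouldImpact(double percentage, Random random) {
        // nextDouble() is in [0.0, 1.0), so 0 never impacts and 100 always does.
        return random.nextDouble() * 100.0 < percentage;
    }

    /** Counts how many of the given number of requests end up impacted. */
    static int countImpacted(int requests, double percentage, Random random) {
        int impacted = 0;
        for (int i = 0; i < requests; i++) {
            if (shouldImpact(percentage, random)) {
                impacted++;
            }
        }
        return impacted;
    }

    public static void main(String[] args) {
        Random random = new Random();
        // With few requests the observed share can sit well away from 25%;
        // with many requests it settles close to it.
        for (int requests : new int[] {20, 1_000, 100_000}) {
            int impacted = countImpacted(requests, 25.0, random);
            System.out.printf("%d requests -> %d impacted (%.1f%%)%n",
                    requests, impacted, 100.0 * impacted / requests);
        }
    }
}
```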

Impact

This section provides a way to declare what impact you would like to inject.

You may choose an amount of latency to inject as well as a yes/no switch on whether you want this call to fail. These can also be combined to simulate a slow call which eventually fails. This impact gets applied to all traffic which matches the Traffic Type you've described above on the Application Type you've described above.

In this section, you also are required to declare the duration of the attack. For this duration, the attack is active and ALFI-enabled applications are impacted. As soon as the duration elapses, the applications no longer know about the attack and are no longer impacted.
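
Conceptually, an impact that combines latency and failure behaves like a delay followed by an exception. The plain-Java sketch below shows that shape. It is not the library's implementation: the class ImpactSketch and its applyImpact signature are hypothetical, and the order (delay first, then fail) is an assumption based on the description of a slow call which eventually fails.

```java
/** Illustrative only: the shape of a latency-plus-failure impact. */
final class ImpactSketch {

    /**
     * Blocks the calling thread for latencyMs milliseconds, then optionally
     * fails the call by throwing a RuntimeException.
     */
    static void applyImpact(long latencyMs, boolean fail) {
        if (latencyMs > 0) {
            try {
                Thread.sleep(latencyMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for callers instead of swallowing it.
                Thread.currentThread().interrupt();
            }
        }
        if (fail) {
            throw new RuntimeException("Injected failure (sketch)");
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        try {
            applyImpact(200, true); // slow call which eventually fails
        } catch (RuntimeException e) {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("failed after about " + elapsedMs + " ms: " + e.getMessage());
        }
    }
}
```

Because the delay happens on the calling thread, client timeouts and retries configured around the call are exercised the same way they would be by a genuinely slow dependency.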

Observe attack results

Once you press the Unleash Gremlin button, the attack becomes active and applications will start picking it up. Here you can see all of the attributes used in scoping the attack, as well as what the impact is and the duration of the attack. The attack then starts progressing through different phases of its lifecycle, as described here:

Stage | Description
Pending | Created but no applications have picked up the attack
Distributed | At least one application has picked up the attack, but none have been impacted
Impacted | At least one application has picked up the attack and been impacted
Successful | Impact was applied and duration elapsed
ApplicationNotFound | No application ever picked up the attack and duration elapsed
TrafficNotFound | No application ever applied impact and duration elapsed
Halted | Attack was halted (by UI or API) prior to the duration elapsing
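
The stages above can be read as a function of four observations: whether the attack was halted, whether its duration has elapsed, whether any application picked it up, and whether any impact was applied. The plain-Java sketch below encodes that reading. It is an interpretation of the table, not Gremlin's implementation: the class AttackStageSketch and its names are hypothetical, and two points are assumptions, namely that Halted takes precedence over every other stage and that TrafficNotFound means the attack was picked up but never applied.

```java
/** Illustrative only: derives an attack stage from the observations in the table above. */
final class AttackStageSketch {

    enum Stage {
        PENDING, DISTRIBUTED, IMPACTED, SUCCESSFUL,
        APPLICATION_NOT_FOUND, TRAFFIC_NOT_FOUND, HALTED
    }

    static Stage stageOf(boolean halted, boolean durationElapsed, boolean pickedUp, boolean impacted) {
        if (halted) {
            return Stage.HALTED; // assumption: halting wins over everything else
        }
        if (durationElapsed) {
            if (!pickedUp) {
                return Stage.APPLICATION_NOT_FOUND;
            }
            return impacted ? Stage.SUCCESSFUL : Stage.TRAFFIC_NOT_FOUND;
        }
        if (!pickedUp) {
            return Stage.PENDING;
        }
        return impacted ? Stage.IMPACTED : Stage.DISTRIBUTED;
    }

    public static void main(String[] args) {
        // A typical progression for an attack that finds matching traffic.
        System.out.println(stageOf(false, false, false, false));
        System.out.println(stageOf(false, false, true, false));
        System.out.println(stageOf(false, false, true, true));
        System.out.println(stageOf(false, true, true, true));
    }
}
```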

Libraries

Java Client library

In ALFI, each application has a set of identifying attributes. This set of attributes is named ApplicationCoordinates and is used to determine when an application matches an attack.

ApplicationCoordinates

AWS Lambda Function

  • Dependency: alfi-aws
  • .inferFromEnvironment() will extract the region and name of your Lambda function from your environment and use them as the Region and Name fields, respectively, in the Gremlin UI.
java
ApplicationCoordinates coords = AwsApplicationCoordinatesResolver.inferFromEnvironment()
        .orElseThrow(IllegalStateException::new);

AWS EC2 Application

  • Dependency: alfi-aws
  • .inferFromEnvironment() will extract the region, availability zone, and instance ID from your environment and use them as the Region, Availability Zone, and Instance ID fields, respectively, in the Gremlin UI.
java
ApplicationCoordinates coords = AwsApplicationCoordinatesResolver.inferFromEnvironment()
        .orElseThrow(IllegalStateException::new);

Custom Application Type

Let's imagine you have an application called TheShop which contains a UserService and a PaymentService. To uniquely identify each of these services in the Gremlin control plane, you would construct two ApplicationCoordinates, each with the same value passed to withType(...) and a unique value passed to withField(...).

java
ApplicationCoordinates coords = new ApplicationCoordinates.Builder()
        .withType("TheShop")
        .withField("service", "UserService")
        .build();
java
ApplicationCoordinates coords = new ApplicationCoordinates.Builder()
        .withType("TheShop")
        .withField("service", "PaymentService")
        .build();

Take notice of the withType(...) and withField(...) methods. The value defined in the withType(...) method will need to be defined in the Name field of the Gremlin UI (see images below). The value defined in the withField(...) method will need to be defined in the Custom Value field of the Gremlin UI (see images below).

Custom Application Type

Custom Application Type Single Service

TrafficCoordinates

com.gremlin.TrafficCoordinates instances are used to control the blast radius of an ALFI experiment. The blast radius for ALFI could be all or a subset of HTTP verbs, all or a subset of your application's HTTP request paths, or even a specific block of code within your application.

Outbound HTTP Traffic

The com.gremlin.TrafficCoordinates instance for Outbound HTTP Traffic will be automatically generated by the com.gremlin.http.client.GremlinApacheHttpRequestInterceptor which comes with the alfi-apache-http-client library. This interceptor will give you the ability to impact any HTTP verb or request route within your application. To take advantage of the com.gremlin.http.client.GremlinApacheHttpRequestInterceptor, you will need to add an instance of it to org.apache.http.impl.client.HttpClientBuilder when you create your org.apache.http.client.HttpClient client.

java
final GremlinApacheHttpRequestInterceptor gremlinInterceptor = new GremlinApacheHttpRequestInterceptor(gremlinService, "alfi-client-demo");
final HttpClientBuilder clientBuilder = HttpClientBuilder.create().addInterceptorFirst(gremlinInterceptor);

Inbound HTTP Traffic

com.gremlin.TrafficCoordinates instances are automatically created for you if alfi-http-servlet-filter is on the classpath.

Dynamo DB Traffic

The com.gremlin.TrafficCoordinates instance for Dynamo DB Traffic will be automatically generated by the com.gremlin.aws.GremlinDynamoRequestInterceptor which comes with the alfi-aws library. This interceptor will give you the ability to impact any DynamoDB operation (Get Item, Delete Item, etc...). To take advantage of the com.gremlin.aws.GremlinDynamoRequestInterceptor, you will need to add an instance of it to com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder when you create your com.amazonaws.services.dynamodbv2.AmazonDynamoDB client.

java
final RequestHandler2 gremlinDynamoInterceptor = new GremlinDynamoRequestInterceptor(gremlinService(), CLIENT_EXECUTION_TIMEOUT, CLIENT_REQUEST_TIMEOUT);
final AmazonDynamoDB dbClient = AmazonDynamoDBClientBuilder
        .standard()
        .withRegion(region)
        .withClientConfiguration(new ClientConfiguration()
                .withClientExecutionTimeout(CLIENT_EXECUTION_TIMEOUT)
                .withConnectionTimeout(CLIENT_REQUEST_TIMEOUT)
                .withMaxErrorRetry(2)
        ).withRequestHandlers(gremlinDynamoInterceptor)
        .build();

Custom Traffic Type

java
final TrafficCoordinates trafficCoordinates = new TrafficCoordinates.Builder()
        .withType("PaymentController")
        .withField("method", "submitPayment")
        .build();

public HttpEntity<PaymentResponse> submitPayment(Payment paymentRequest) {
    this.gremlinService.applyImpact(trafficCoordinates); // Fault injected!
    return paymentService.makePayment(paymentRequest);
}

Extend TrafficCoordinates

Companies often set up their infrastructure to maintain a per-request data structure and use it to provide logging, monitoring, and observability data points. A common pattern is to set up a RequestContext and have authentication filters store information such as customerId or deviceId in it. Code that runs later in the request can then read those attributes from the RequestContext. These attributes are often excellent targets for attacks. If your system operates in this way, you can set up a mapping that populates these values on all TrafficCoordinates. This code lives in a concrete subclass of GremlinCoordinatesProvider, which you have already seen in Initialize Application Coordinates.

java
import com.gremlin.GremlinCoordinatesProvider;
import com.gremlin.TrafficCoordinates;

public class MyCoordinatesProvider extends GremlinCoordinatesProvider {

    @Override
    public TrafficCoordinates extendEachTrafficCoordinates(TrafficCoordinates incomingCoordinates) {
        incomingCoordinates.putField("customerId", MyRequestContext.getCustomerId());
        incomingCoordinates.putField("deviceId", MyRequestContext.getDeviceId());
        incomingCoordinates.putField("country", MyRequestContext.getCountry());
        return incomingCoordinates;
    }
}

With this code wired into the construction of your GremlinService instance, all TrafficCoordinates now carry those three attributes, which can be matched in an attack on any type of traffic.
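To illustrate why extra fields make a smaller blast radius possible, here is a minimal, self-contained sketch of field-based matching. It does not use the ALFI library and is not a description of how Gremlin matches attacks internally: SketchCoordinates and SketchAttackTarget are hypothetical stand-ins, and the matching rule (same type, and every field named by the attack present with an equal value) is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-request set of coordinates (NOT the ALFI class).
final class SketchCoordinates {
    final String type;
    final Map<String, String> fields = new HashMap<>();

    SketchCoordinates(String type) {
        this.type = type;
    }

    // Returns this instance so that calls can be chained.
    SketchCoordinates putField(String key, String value) {
        fields.put(key, value);
        return this;
    }
}

// Hypothetical attack target: a traffic type plus the fields that must match.
final class SketchAttackTarget {
    private final String type;
    private final Map<String, String> requiredFields;

    SketchAttackTarget(String type, Map<String, String> requiredFields) {
        this.type = type;
        this.requiredFields = new HashMap<>(requiredFields);
    }

    // Assumed rule: same type, and every required field present with an equal value.
    boolean matches(SketchCoordinates coordinates) {
        if (!type.equals(coordinates.type)) {
            return false;
        }
        for (Map.Entry<String, String> required : requiredFields.entrySet()) {
            String actual = coordinates.fields.get(required.getKey());
            if (!required.getValue().equals(actual)) {
                return false;
            }
        }
        return true;
    }
}
```

Under this model, a target of type "PaymentController" that requires customerId to equal one specific value matches only that customer's requests. Requests from every other customer carry the same type and method but a different customerId, so they are left alone.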

GremlinService

To create a com.gremlin.GremlinService, you need a com.gremlin.GremlinCoordinatesProvider, which needs a com.gremlin.ApplicationCoordinates.

To construct a GremlinService using the alfi-aws library:

java
final GremlinServiceFactory factory = new GremlinServiceFactory(new GremlinCoordinatesProvider() {
    @Override
    public ApplicationCoordinates initializeApplicationCoordinates() {
        // Fail fast at startup if the coordinates cannot be inferred from the environment.
        ApplicationCoordinates coords = AwsApplicationCoordinatesResolver.inferFromEnvironment()
                .orElseThrow(IllegalStateException::new);
        return coords;
    }
});
final GremlinService gremlinService = factory.getGremlinService();

Injecting fault

Once you have a reference to the com.gremlin.GremlinService singleton and have defined your custom com.gremlin.TrafficCoordinates, you can inject a fault like this:

java
gremlinService.applyImpact(trafficCoordinates);
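One way to check how calling code reacts to an injected fault is to put the injection call behind a small seam, so that a unit test can simulate an attack without the ALFI library. The sketch below is self-contained. ImpactPoint and GuardedCall are hypothetical names that are not part of ALFI, and the sketch assumes that an active attack surfaces at the injection point as a thrown RuntimeException, which this page does not confirm.

```java
import java.util.function.Supplier;

// Hypothetical seam around the injection call (not part of ALFI).
// In application code it could be wired as:
//   () -> gremlinService.applyImpact(trafficCoordinates)
interface ImpactPoint {
    void apply();
}

final class GuardedCall {
    private GuardedCall() {
    }

    // Runs the injection point, then the primary action. If either one throws
    // a RuntimeException, the fallback result is returned instead.
    static <T> T withFallback(ImpactPoint impactPoint, Supplier<T> primary, Supplier<T> fallback) {
        try {
            impactPoint.apply();
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.get();
        }
    }
}
```

In a test, passing an ImpactPoint that throws exercises the fallback path, while a no-op ImpactPoint exercises the normal path.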

Release Notes

0.7.4

July 7, 2020

Fix

If the gremlin.properties file was on the classpath, Gremlin was not properly using it when resolving configuration.

0.7.3

December 23, 2019

Fix

Changed the payload of the authorization header sent to the Gremlin API to resolve HTTP 401s caused by a server-side change that performs extra certificate validation.

New

Added support for HTTP proxies. Set the http_proxy environment variable, and ALFI traffic to the Gremlin API will use the specified proxy URL.

0.7.2

April 24, 2019

Fix

Allow certificate parsing to work properly on Windows.

Info

Updated dependencies.

0.7.1

April 11, 2019

Fix

Much friendlier error messages when installation/setup is unsuccessful.

0.7.0

April 2, 2019

New

Added inbound HTTP injection points, both for javax.servlet Filters and JAX-RS Filters.

0.6.1

February 21, 2019

Info

Updated dependencies.

0.6.0

February 12, 2019

Fix

Allow chaining of property sources, so that a failed lookup in Parameter Store still allows a lookup from environment variables.

0.5.3

January 22, 2019

Info

Release process changes only.

0.5.2

January 10, 2019

Info

Change artifact location to maven.gremlin.com.

0.5.1

October 23, 2018

Info

The GREMLIN_ALFI_IDENTIFIER is now required (previously optional) when authenticating your application with Gremlin.

0.5.0

October 11, 2018

New

Install with Maven now available.

New

Client library modules available individually.

New

AWS Parameter Store can be used for configuration.