Application Level Fault Injection (ALFI)
The beta version of Gremlin's application layer failure injection solution (ALFI) has been closed. At this time the ALFI solution is deprecated and will be replaced with a better alternative once available.
Overview
Why application-level fault injection is useful
Operators think in requests
Most metrics, dashboards, and alerts that we consume are expressed in terms of requests. RPS, error rate, and latency all implicitly use a request as the unit of work. Requests are not a concept available at the infrastructure level. At that level, all we see are streams of packets with IP addresses and ports. By moving up to the application level, we can use all of the request-level metadata in constructing an attack.
Since requests can include identifiers such as customer ID, device ID, or country, those facets may be used in constructing an attack. With that ability, it is much easier to create a small, well-defined blast radius for your attack. That, in turn, allows for much faster feedback loops and lets you discover latent problems more quickly.
Fault injection without system access
Injecting infrastructure failures requires running a process and accessing other system-level resources. In serverless environments such as AWS Lambda, Google Cloud Functions, and Azure Functions, this access is impossible. In these cases, it is necessary to include the fault-injection mechanism within the application itself. ALFI runs in the JVM as a library, so once you have integrated it into your application, you may use it in any environment.
Examples
- Simulate an outage in production by creating an attack on your customer ID only. Then you can look for signs of problems when logged in as yourself, while no other users are even aware an attack is occurring.
- Simulate a problem with a specific endpoint. Partial failure in distributed systems is quite common - some endpoints may be unavailable while others are working perfectly. In order to simulate such a scenario, you can create an attack targeted to some endpoints only and then determine how your system reacts.
- Always-on failure testing. If you limit an attack to a set of devices you control, then you can run tests against those devices on a regular basis and evaluate how the user experience works when the system is degraded.
Installation
Artifact repository
Gradle

    repositories {
        maven {
            url 'https://maven.gremlin.com/'
        }
    }

Maven

    <repositories>
        <repository>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
            <id>gremlin</id>
            <name>The Gremlin Repository</name>
            <url>https://maven.gremlin.com/</url>
        </repository>
    </repositories>

You must add the above repository to your Maven or Gradle build file. Otherwise, you will encounter an error message similar to `Could not find artifact com.gremlin:[client]:pom:[version] in central (https://repo.maven.apache.org/maven2)`.
alfi-core
Gradle

    implementation group: 'com.gremlin', name: 'alfi-core', version: '0.5+'

Maven

    <dependency>
        <groupId>com.gremlin</groupId>
        <artifactId>alfi-core</artifactId>
        <version>LATEST</version>
    </dependency>

alfi-aws
Gradle

    // If your application is hosted on AWS EC2 or Lambda, use this to integrate with AWS
    // (like Parameter Store Configuration support)
    implementation group: 'com.gremlin', name: 'alfi-aws', version: '0.5+'

Maven

    <!-- If your application is hosted on AWS EC2 or Lambda, use this to integrate with AWS
         (like Parameter Store Configuration support) -->
    <dependency>
        <groupId>com.gremlin</groupId>
        <artifactId>alfi-aws</artifactId>
        <version>LATEST</version>
    </dependency>

alfi-apache-http-client
Gradle

    // Apache HTTP Client Injection Points
    implementation group: 'com.gremlin', name: 'alfi-apache-http-client', version: '0.5+'

Maven

    <dependency>
        <groupId>com.gremlin</groupId>
        <artifactId>alfi-apache-http-client</artifactId>
        <version>LATEST</version>
    </dependency>

alfi-http-servlet-filter
Gradle

    implementation group: 'com.gremlin', name: 'alfi-http-servlet-filter', version: '0.5+'

Maven

    <dependency>
        <groupId>com.gremlin</groupId>
        <artifactId>alfi-http-servlet-filter</artifactId>
        <version>LATEST</version>
    </dependency>

alfi-aws-dynamodb-client
Gradle

    // DynamoDB Injection Points
    implementation group: 'com.gremlin', name: 'alfi-aws-dynamodb-client', version: '0.5+'

Maven

    <dependency>
        <groupId>com.gremlin</groupId>
        <artifactId>alfi-aws-dynamodb-client</artifactId>
        <version>LATEST</version>
    </dependency>

Authentication & configuration
Authenticate your application with Gremlin
In order to authenticate to Gremlin, you must provide the following configuration values to your application.
GREMLIN_ALFI_IDENTIFIER
: A unique identifier for the application. This will be used to distinguish all of the application instances from one another.
GREMLIN_TEAM_ID
: The Team ID that this application belongs to. Only users in that team may conduct attacks on it.
GREMLIN_TEAM_CERTIFICATE_OR_FILE
: Certificate for authenticating to Gremlin. See below for syntax on permissible values.
GREMLIN_TEAM_PRIVATE_KEY_OR_FILE
: Private key for authenticating to Gremlin. See below for syntax on permissible values.

You may set these as environment variables or in a `gremlin.properties` file on the classpath. Certificates can be downloaded for each team from the Settings Page.
Examples
As a raw value

    GREMLIN_TEAM_CERTIFICATE_OR_FILE=-----BEGIN CERTIFICATE-----...

Or pointing to a file

    GREMLIN_TEAM_CERTIFICATE_OR_FILE=file:///usr/gremlin/certificate.pem

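The two permitted forms (a raw value or a `file://` reference) can be told apart by their prefix. The following is a minimal sketch of that idea, not ALFI's actual implementation: the class name `CredentialValueSketch` and its `resolve` method are hypothetical, and how the library itself reads these values is not shown in this document.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only; not part of the ALFI library.
final class CredentialValueSketch {

    private static final String FILE_PREFIX = "file://";

    // Returns the credential text for either a raw value or a file:// reference.
    static String resolve(String configured) {
        if (configured == null || configured.isEmpty()) {
            throw new IllegalArgumentException("credential value must not be empty");
        }
        if (!configured.startsWith(FILE_PREFIX)) {
            // Raw form: the configured value already is the certificate or key.
            return configured;
        }
        // File form: e.g. file:///usr/gremlin/certificate.pem
        Path path = Paths.get(URI.create(configured));
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("could not read " + path, e);
        }
    }
}
```

Failing fast on an unreadable file, as this sketch does, surfaces a misconfigured path at startup rather than at the first authentication attempt.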
Optional configuration
The following keys may be set to tune how ALFI operates.
GREMLIN_ALFI_ENABLED
: If set to anything other than `true`, all functionality is turned off. This is designed to give you the ability to safely deploy ALFI, knowing you've got a simple off-switch. When the functionality is off, no failures are ever injected by ALFI, no calls are made to the API, and no logging past configuration-time occurs.
GREMLIN_REFRESH_INTERVAL_MS
: You may optionally provide this value to set the frequency with which the library will contact the Gremlin API. Minimum of 1000 (1 second), maximum of 300000 (5 minutes). Default of 10000 (10 seconds). This determines how quickly your application reacts to attacks being halted or created and the amount of network traffic generated by the library.
http_proxy
: You may specify a proxy for traffic from the ALFI library back to the Gremlin control plane. This may optionally include basic auth.
Examples
GREMLIN_ALFI_ENABLED=true
GREMLIN_ALFI_IDENTIFIER=recommendation-service-i-0ab123456
GREMLIN_REFRESH_INTERVAL_MS=20000
http_proxy=http://proxy.server:3128
http_proxy=http://username:password@proxy.server:3128
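The bounds and defaults described above can be summarized in code. This is a rough sketch under stated assumptions, not the library's own logic: `OptionalConfigSketch` is a hypothetical class, out-of-range intervals are clamped here (the document does not say whether ALFI clamps or rejects them), and the `true` check is assumed to be an exact, case-sensitive match.

```java
// Hypothetical helper, for illustration only; not part of the ALFI library.
final class OptionalConfigSketch {

    static final long MIN_REFRESH_MS = 1_000L;      // 1 second
    static final long MAX_REFRESH_MS = 300_000L;    // 5 minutes
    static final long DEFAULT_REFRESH_MS = 10_000L; // 10 seconds

    // Anything other than the literal "true" counts as disabled (assumed case-sensitive).
    static boolean isEnabled(String raw) {
        return "true".equals(raw);
    }

    // Falls back to the default when unset; clamps to the documented range (an assumption).
    static long refreshIntervalMs(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_REFRESH_MS;
        }
        long value;
        try {
            value = Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("GREMLIN_REFRESH_INTERVAL_MS must be a number: " + raw, e);
        }
        return Math.max(MIN_REFRESH_MS, Math.min(MAX_REFRESH_MS, value));
    }
}
```

A shorter interval makes the application react faster to halted attacks at the cost of more API traffic, which is the trade-off the setting exposes.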
Alternate configuration mechanism
As described above, the default configuration resolution mechanism is to use either properties defined in `gremlin.properties`, or in environment variables where your application runs. If those don't fit your needs, then you can provide an alternate mechanism by subclassing GremlinConfigurationResolver (javadocs) and supplying it to GremlinServiceFactory (javadocs) at construction-time.
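To make the default two-source lookup concrete, here is a small standalone sketch. It deliberately does not subclass `GremlinConfigurationResolver`, because that class's method signatures are not shown in this document. `ConfigLookupSketch` is a hypothetical name, and the precedence (properties file first, then environment) is an assumption.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

// Hypothetical lookup chain, for illustration only; not part of the ALFI library.
final class ConfigLookupSketch {

    private final Properties fileProperties;
    private final Map<String, String> environment;

    ConfigLookupSketch(Properties fileProperties, Map<String, String> environment) {
        this.fileProperties = fileProperties;
        this.environment = environment;
    }

    // Checks the properties source first, then the environment (assumed order).
    Optional<String> lookup(String key) {
        String fromFile = fileProperties.getProperty(key);
        if (fromFile != null) {
            return Optional.of(fromFile);
        }
        return Optional.ofNullable(environment.get(key));
    }
}
```

In an application, the second constructor argument could be `System.getenv()`. A custom resolver would replace either source with, for example, a secrets store.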
Setup
Step by step
In a hurry? Skip to Complete examples.
- Construct an ApplicationCoordinates instance.
- Construct a TrafficCoordinates instance.
- Optionally (if using a custom TrafficCoordinates instance) construct a GremlinService singleton.
- Optionally (if using a custom TrafficCoordinates instance) inject the fault using `com.gremlin.GremlinService#applyImpact(trafficCoordinates)`. Add this line of code wherever in your application you want the fault to be injected.
- Click here to create a new Attack.
- Select an `Application Query`.
- Set the necessary fields for the selected `Application Query`.
  - These are defined when setting up the ApplicationCoordinates.
- Select a `Traffic Query`.
  - These are defined when setting up the TrafficCoordinates.
- Choose a Gremlin attack: set the amount of latency in ms to apply and optionally throw a `RuntimeException` within your application.
- Run the attack: set the duration in seconds for how long the attack will last.
- Test your application to observe the impact of the attack.
Complete examples
ALFI AWS
- This example was developed for AWS Lambda but could be used in any application deployed to AWS. Include the alfi-aws jar on your classpath to use this library.
- Source code: https://github.com/gremlin/alfi-lambda

    package com.alfilambda;

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.LambdaLogger;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.gremlin.*;
    import com.gremlin.aws.AwsApplicationCoordinatesResolver;

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;

    public class AlfiDemoHandler implements RequestHandler<Map<String, String>, String> {

        private final GremlinService gremlinService;

        public AlfiDemoHandler() {
            final GremlinServiceFactory factory = new GremlinServiceFactory(new GremlinCoordinatesProvider() {
                @Override
                public ApplicationCoordinates initializeApplicationCoordinates() {
                    ApplicationCoordinates coords = AwsApplicationCoordinatesResolver.inferFromEnvironment()
                            .orElseThrow(IllegalStateException::new);
                    return coords;
                }
            });
            gremlinService = factory.getGremlinService();
        }

        @Override
        public String handleRequest(Map<String, String> input, Context context) {
            Instant start = Instant.now();
            TrafficCoordinates trafficCoordinates = new TrafficCoordinates.Builder()
                    .withType(this.getClass().getSimpleName())
                    .withField("method", "handleRequest")
                    .build();
            gremlinService.applyImpact(trafficCoordinates);
            LambdaLogger logger = context.getLogger();
            Instant finish = Instant.now();
            long timeElapsed = Duration.between(start, finish).toMillis(); // in millis
            logger.log(String.format("Lambda took %s millis", timeElapsed));
            return "200 OK";
        }
    }

- Click here to create a new Attack
- See Attack and Lambda setup here: https://github.com/gremlin/alfi-lambda/blob/master/README.md
ALFI DynamoDB
- This example uses Spring for Dependency Injection and the alfi-aws-dynamodb-client jar.
- Source code: https://github.com/gremlin/alfi-dynamodb

    package com.example.alfidynamodb.config;

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.handlers.RequestHandler2;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.gremlin.*;
    import com.gremlin.aws.GremlinDynamoRequestInterceptor;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class AlfiConfig {

        private static final String APPLICATION_QUERY_NAME = "ALFIDemoApplication";
        private static final int CLIENT_EXECUTION_TIMEOUT = 1500;
        private static final int CLIENT_REQUEST_TIMEOUT = 500;

        @Value("${aws.region}")
        private String region;

        public GremlinCoordinatesProvider gremlinCoordinatesProvider() {
            return new GremlinCoordinatesProvider() {
                @Override
                public ApplicationCoordinates initializeApplicationCoordinates() {
                    return new ApplicationCoordinates.Builder()
                            .withType(APPLICATION_QUERY_NAME)
                            .build();
                }
            };
        }

        public GremlinServiceFactory gremlinServiceFactory() {
            return new GremlinServiceFactory(gremlinCoordinatesProvider());
        }

        public GremlinService gremlinService() {
            return gremlinServiceFactory().getGremlinService();
        }

        @Bean
        public AmazonDynamoDB amazonDynamoDB() {
            final RequestHandler2 gremlinDynamoInterceptor =
                    new GremlinDynamoRequestInterceptor(gremlinService(), CLIENT_EXECUTION_TIMEOUT, CLIENT_REQUEST_TIMEOUT);
            return AmazonDynamoDBClientBuilder.standard()
                    .withRegion(region)
                    .withClientConfiguration(new ClientConfiguration()
                            .withClientExecutionTimeout(CLIENT_EXECUTION_TIMEOUT)
                            .withConnectionTimeout(CLIENT_REQUEST_TIMEOUT)
                            .withMaxErrorRetry(2)
                    )
                    .withRequestHandlers(gremlinDynamoInterceptor).build();
        }

    }


    package com.example.alfidynamodb.persistence;

    import com.amazonaws.AmazonServiceException;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
    import com.amazonaws.services.kms.model.NotFoundException;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.stereotype.Component;

    import java.util.HashMap;
    import java.util.Map;

    @Component
    public class GetItemRequester {

        private final Logger LOG = LoggerFactory.getLogger(getClass().getName());

        @Value("${dynamo.db.table}")
        private String table;

        private final AmazonDynamoDB amazonDynamoDB;

        public GetItemRequester(@Autowired AmazonDynamoDB amazonDynamoDB) {
            this.amazonDynamoDB = amazonDynamoDB;
        }

        public Map<String, AttributeValue> getItem(String id) {
            long startTime = System.currentTimeMillis();
            try {
                LOG.info(String.format("Querying DynamoDB for item with ID %s...", id));
                Map<String, AttributeValue> returnedItem = amazonDynamoDB.getItem(createRequestWithId(id)).getItem();
                if (returnedItem != null) {
                    return returnedItem;
                } else {
                    throw new NotFoundException(String.format("Item with id %s not found!", id));
                }
            } catch (AmazonServiceException e) {
                LOG.error(e.getMessage());
                throw e;
            } finally {
                long endTime = System.currentTimeMillis();
                long duration = (endTime - startTime);
                LOG.info(String.format("Call to DynamoDB took %s milliseconds.", duration));
            }
        }

        private GetItemRequest createRequestWithId(String id) {
            HashMap<String, AttributeValue> keyToGet = new HashMap<>();
            keyToGet.put("id", new AttributeValue(id));
            return new GetItemRequest().withKey(keyToGet).withTableName(table);
        }
    }

- Click here to create a new Attack
- Fill out the Application Query and Traffic Query fields to match this example:
ALFI HTTP Servlet Filter
- This example uses Spring for Dependency Injection and the alfi-http-servlet-filter jar.
- Source code: https://github.com/gremlin/books-demo

    package com.example.rec;

    import com.gremlin.ApplicationCoordinates;
    import com.gremlin.GremlinCoordinatesProvider;
    import com.gremlin.GremlinService;
    import com.gremlin.GremlinServiceFactory;
    import com.gremlin.http.servlet.GremlinServletFilter;
    import org.springframework.boot.web.servlet.FilterRegistrationBean;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class WebConfig {

        @Bean
        public FilterRegistrationBean recommendationsFilterRegistrationBean() {
            FilterRegistrationBean registrationBean = new FilterRegistrationBean();
            registrationBean.setName("recs");

            final GremlinCoordinatesProvider alfiCoordinatesProvider = new GremlinCoordinatesProvider() {
                @Override
                public ApplicationCoordinates initializeApplicationCoordinates() {
                    return new ApplicationCoordinates.Builder()
                            .withType("local")
                            .withField("service", "recommendations")
                            .build();
                }
            };
            final GremlinServiceFactory alfiFactory = new GremlinServiceFactory(alfiCoordinatesProvider);
            final GremlinService alfi = alfiFactory.getGremlinService();

            GremlinServletFilter alfiFilter = new GremlinServletFilter(alfi);
            registrationBean.setFilter(alfiFilter);
            registrationBean.setOrder(1);
            return registrationBean;
        }

    }

There is no need to define a `TrafficCoordinates` when using the `GremlinServletFilter`. This library takes care of that for you.
This enables you to target any verb and any route hosted by your application! For example, you could narrow the blast radius of an attack to only `GET` requests to `https://somehost/recommendations`.
- Click here to create a new Attack
- Fill out the Application Query and Traffic Query fields to match this example:
ALFI Apache Http Client
- This example uses Spring for Dependency Injection and the alfi-apache-http-client jar.
- Source code: https://github.com/gremlin/alfi-apache-http-client

    package com.example.alfiapachehttpclient.config;

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import com.gremlin.*;

    @Configuration
    public class ALFIConfig {

        private static final String APPLICATION_QUERY_NAME = "ALFIApacheHttpClientDemo";

        public GremlinCoordinatesProvider gremlinCoordinatesProvider() {
            return new GremlinCoordinatesProvider() {
                @Override
                public ApplicationCoordinates initializeApplicationCoordinates() {
                    return new ApplicationCoordinates.Builder()
                            .withType(APPLICATION_QUERY_NAME)
                            .build();
                }
            };
        }

        public GremlinServiceFactory gremlinServiceFactory() {
            return new GremlinServiceFactory(gremlinCoordinatesProvider());
        }

        @Bean
        public GremlinService gremlinService() {
            return gremlinServiceFactory().getGremlinService();
        }

    }


    package com.example.alfiapachehttpclient.config;

    import com.gremlin.GremlinService;
    import com.gremlin.http.client.GremlinApacheHttpRequestInterceptor;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClientBuilder;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class ApacheClientConfig {

        private final GremlinService gremlinService;
        private static final int CONNECTION_TIMEOUT = 1000;
        private static final int SOCKET_TIMEOUT = 3000;

        @Autowired
        public ApacheClientConfig(GremlinService gremlinService) {
            this.gremlinService = gremlinService;
        }

        @Bean
        public CloseableHttpClient closableHttpClient() {
            RequestConfig requestConfig = RequestConfig.custom()
                    .setConnectTimeout(CONNECTION_TIMEOUT)
                    .setSocketTimeout(SOCKET_TIMEOUT)
                    .build();

            final GremlinApacheHttpRequestInterceptor gremlinInterceptor =
                    new GremlinApacheHttpRequestInterceptor(gremlinService, "alfi-client-demo");
            final HttpClientBuilder clientBuilder = HttpClientBuilder
                    .create()
                    .addInterceptorFirst(gremlinInterceptor)
                    .setDefaultRequestConfig(requestConfig);

            return clientBuilder.build();
        }

    }


    package com.example.alfiapachehttpclient.controller;

    import org.apache.http.HttpEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.util.EntityUtils;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.http.HttpStatus;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.ResponseBody;
    import org.springframework.web.bind.annotation.RestController;

    import java.io.IOException;

    @RestController
    public class MainController {

        private final Logger LOG = LoggerFactory.getLogger(getClass().getName());

        private final CloseableHttpClient closeableHttpClient;

        @Autowired
        public MainController(CloseableHttpClient closeableHttpClient) {
            this.closeableHttpClient = closeableHttpClient;
        }

        @GetMapping("/")
        public @ResponseBody
        ResponseEntity<String> hello() {
            final String URI = "https://www.gremlin.com/";
            HttpGet httpGet = new HttpGet(URI);
            String responseContent = null;
            // Kept local (not a field) so concurrent requests do not share a response.
            CloseableHttpResponse closeableHttpResponse = null;
            long startTime = System.currentTimeMillis();
            try {
                LOG.info(String.format("Executing GET request to %s...", URI));
                closeableHttpResponse = closeableHttpClient.execute(httpGet);
                HttpEntity httpEntity = closeableHttpResponse.getEntity();
                responseContent = EntityUtils.toString(httpEntity);
                EntityUtils.consume(httpEntity);
                LOG.info(responseContent);
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                long endTime = System.currentTimeMillis();
                long duration = (endTime - startTime);
                LOG.info(String.format("GET Request took %d milliseconds", duration));
                // The response is null if execute() threw, for example during an attack.
                if (closeableHttpResponse != null) {
                    try {
                        closeableHttpResponse.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
            return new ResponseEntity<>(responseContent, HttpStatus.OK);
        }
    }

- Click here to create a new Attack
- Fill out the Application Query and Traffic Query fields to match the following:
ALFI Core
- This example uses Spring for Dependency Injection and the alfi-core jar.
- Source code: https://github.com/gremlin/alfi-spring-boot

    package com.gremlin.todo.config;

    import com.gremlin.*;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class ALFIConfig {

        public GremlinCoordinatesProvider gremlinCoordinatesProvider() {
            return new GremlinCoordinatesProvider() {
                @Override
                public ApplicationCoordinates initializeApplicationCoordinates() {
                    return new ApplicationCoordinates.Builder()
                            .withType("MyApplication")
                            .withField("service", "to-do")
                            .build();
                }
            };
        }

        public GremlinServiceFactory gremlinServiceFactory() {
            return new GremlinServiceFactory(gremlinCoordinatesProvider());
        }

        @Bean
        public GremlinService gremlinService() {
            return gremlinServiceFactory().getGremlinService();
        }

    }


    package com.gremlin.todo.controller;

    import com.gremlin.GremlinService;
    import com.gremlin.TrafficCoordinates;
    import com.gremlin.todo.model.ToDo;
    import com.gremlin.todo.service.ToDoService;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.web.bind.annotation.*;

    import javax.annotation.PostConstruct;
    import java.util.Collection;

    @RestController
    public class MyController {
        private final GremlinService gremlinService;
        // Assumed to be a Spring-managed bean; the original snippet used it without declaring it.
        private final ToDoService toDoService;
        private TrafficCoordinates getAllToDosCoordinates;

        @Autowired
        public MyController(GremlinService gremlinService, ToDoService toDoService) {
            this.gremlinService = gremlinService;
            this.toDoService = toDoService;
        }

        @GetMapping("/all")
        public Collection<ToDo> getAllToDos() {
            // Inject the fault (if an attack matches) before doing the real work.
            gremlinService.applyImpact(this.getAllToDosCoordinates);
            return toDoService.findAll();
        }

        // Build the coordinates once, after dependency injection completes.
        @PostConstruct
        public void initTrafficCoordinates() {
            getAllToDosCoordinates = new TrafficCoordinates
                    .Builder()
                    .withType("MyController")
                    .withField("method", "getAllToDos")
                    .build();
        }

    }

- Click here to create a new Attack
- Fill out the Application Query and Traffic Query fields to match the following:
The custom value for the traffic type is hidden behind ellipses in that screenshot. The value is `getAllToDos`.
Attacks
Integrate the library
To use ALFI, you must first integrate the Gremlin libraries into your application and redeploy. Please see the JVM Installation Guide for more details. Once you have successfully integrated the library, you should see logging like this:

    INFO com.gremlin.GremlinServiceFactory - Gremlin enabled for Team abcdefgh-1234-9876-3333-nopqrstuvwxy

Create attacks via the Web UI
Now you can start creating attacks from the Web UI. Here you will see a history of ALFI attacks run by your team.
Once you click `New ALFI Attack`, you will receive a form with `Application Type`, `Traffic Type`, and `Impact` sections.
Application Type
This section provides a way to determine which applications are eligible for the ALFI attack.
Upon application startup, the ALFI code running in each application creates an instance of `ApplicationCoordinates` and passes that to the Gremlin API. Each `ApplicationCoordinates` instance is eligible to pick up an ALFI attack. Please see Application Coordinates Setup for details on how to populate `ApplicationCoordinates`.
The ALFI library comes with two Application Types out of the box: AWS Lambda and AWS EC2. Custom Application Types can also be created from your application, which can then be used in the Web UI with the `Add Custom Field` button. Keep in mind that the most effective chaos experiments start small, so keep your custom Application Types as specific as possible.
Traffic Type
This section provides a way to select individual requests within your application and only impact that set.
Any attribute which you have supplied in a `TrafficCoordinates` is eligible to use in constructing the attack. Please see Traffic Coordinates Setup and Attaching Request Context data to all TrafficCoordinates for details on how to control the data being placed into a `TrafficCoordinates` instance.
The ALFI library includes integrations for the Apache HTTP client and DynamoDB client (with more to come!). However, you are free to create any sort of Traffic Type you would like and use those custom fields as attributes of the attack.
For Traffic Type, you may also supply a `Percentage of Traffic` value. Because each request is selected probabilistically, the share of traffic actually impacted may not exactly match the value specified.
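To see why the observed share can differ from the configured percentage, consider a per-request random draw. This is an illustrative sketch only: `TrafficSamplingSketch` is a hypothetical class, and the document does not describe how ALFI actually implements its sampling.

```java
import java.util.Random;

// Hypothetical sampler, for illustration only; not part of the ALFI library.
final class TrafficSamplingSketch {

    // Decides independently for each request whether it falls inside the percentage.
    static boolean shouldImpact(double percentage, Random random) {
        return random.nextDouble() * 100.0 < percentage;
    }

    // Counts how many of `requests` simulated requests would be impacted.
    static int countImpacted(int requests, double percentage, Random random) {
        int impacted = 0;
        for (int i = 0; i < requests; i++) {
            if (shouldImpact(percentage, random)) {
                impacted++;
            }
        }
        return impacted;
    }

    public static void main(String[] args) {
        int impacted = countImpacted(10_000, 50.0, new Random());
        // Typically close to, but rarely exactly, 5000.
        System.out.println("Impacted " + impacted + " of 10000 requests");
    }
}
```

With few requests during a short attack, the deviation from the configured percentage can be noticeable. It shrinks as request volume grows.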
Impact
This section provides a way to declare what impact you would like to inject.
You may choose an amount of latency to inject as well as a yes/no switch on whether you want this call to fail. These can also be combined to simulate a slow call which eventually fails. This impact gets applied to all traffic which matches the Traffic Type you've described above on the Application Type you've described above.
In this section, you also are required to declare the duration of the attack. For this duration, the attack is active and ALFI-enabled applications are impacted. As soon as the duration elapses, the applications no longer know about the attack and are no longer impacted.
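The combination of latency and failure described above amounts to "wait, then optionally throw". Below is a minimal standalone sketch of that behavior. `ImpactSketch` and its parameters are hypothetical and only mirror the description; this is not the body of `GremlinService#applyImpact`.

```java
// Hypothetical impact simulation, for illustration only; not part of the ALFI library.
final class ImpactSketch {

    // Sleeps for latencyMs, then throws if the call is configured to fail.
    static void applyImpact(long latencyMs, boolean fail) {
        if (latencyMs > 0) {
            try {
                Thread.sleep(latencyMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for callers higher up the stack.
                Thread.currentThread().interrupt();
            }
        }
        if (fail) {
            throw new RuntimeException("Simulated failure injected after " + latencyMs + " ms");
        }
    }
}
```

Calling `ImpactSketch.applyImpact(500, true)` at the top of a method simulates a slow call that eventually fails, which is a useful way to check that the caller's timeouts and fallbacks behave as expected.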
Observe attack results
Once you press the `Unleash Gremlin` button, the attack becomes active and applications will start picking it up. Here you can see all of the attributes used in scoping the attack, as well as what the impact is and the duration of the attack. The attack then starts progressing through different phases of its lifecycle, as described here:
Stage | Description |
---|---|
Pending | Created but no applications have picked up the attack |
Distributed | At least one application has picked up the attack, but none have been impacted |
Impacted | At least one application has picked up the attack and been impacted |
Successful | Impact was applied and duration elapsed |
ApplicationNotFound | No application ever picked up the attack and duration elapsed |
TrafficNotFound | No application ever applied impact and duration elapsed |
Halted | Attack was halted (by UI or API) prior to the duration elapsing |
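One way to read the table is as a function from a few observed facts to a stage. The sketch below encodes that reading. The class, the enum, and the boolean inputs are all hypothetical, and the precedence between rows (for example, that `Halted` wins over everything else) is an assumption rather than documented behavior.

```java
// Hypothetical model of the lifecycle table, for illustration only.
final class AttackLifecycleSketch {

    enum Stage {
        PENDING, DISTRIBUTED, IMPACTED, SUCCESSFUL,
        APPLICATION_NOT_FOUND, TRAFFIC_NOT_FOUND, HALTED
    }

    static Stage stageOf(boolean pickedUp, boolean impacted, boolean durationElapsed, boolean halted) {
        if (halted) {
            return Stage.HALTED; // assumed to take precedence over all other states
        }
        if (durationElapsed) {
            if (impacted) {
                return Stage.SUCCESSFUL;
            }
            // Nothing was impacted: distinguish "nobody saw it" from "nobody matched traffic".
            return pickedUp ? Stage.TRAFFIC_NOT_FOUND : Stage.APPLICATION_NOT_FOUND;
        }
        if (impacted) {
            return Stage.IMPACTED;
        }
        return pickedUp ? Stage.DISTRIBUTED : Stage.PENDING;
    }
}
```

Read this way, `ApplicationNotFound` usually points at a mismatch in the Application Type fields, while `TrafficNotFound` points at the Traffic Type fields or at a lack of matching requests during the attack window.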
Libraries
Java Client library
- alfi-core: Core library required for all ALFI functionality
- alfi-aws: Optional AWS integration, providing coordinate discovery for `AwsLambda` and `AwsEc2`
- alfi-apache-http-client: ALFI injection points for Apache HTTP Client
- alfi-aws-dynamodb-client: ALFI injection points for DynamoDB
In ALFI, each application has a set of identifying attributes. This set of attributes is named `ApplicationCoordinates` and is used to determine when an application matches an attack.
ApplicationCoordinates
AWS Lambda Function
- Dependency: alfi-aws
`.inferFromEnvironment()` will extract the region and name of your Lambda function from your environment and use them as the `Region` and `Name` fields, respectively, in the Gremlin UI.

    ApplicationCoordinates coords = AwsApplicationCoordinatesResolver.inferFromEnvironment()
            .orElseThrow(IllegalStateException::new);

AWS EC2 Application
- Dependency: alfi-aws
`.inferFromEnvironment()` will extract the region, availability zone, and instance ID from your environment and use them as the `Region`, `Availability Zone`, and `Instance ID` fields, respectively, in the Gremlin UI.

    ApplicationCoordinates coords = AwsApplicationCoordinatesResolver.inferFromEnvironment()
            .orElseThrow(IllegalStateException::new);

Custom Application Type
- Dependency: Any one of alfi-apache-http-client, alfi-http-servlet-filter or alfi-core
Let's imagine you have an application called TheShop which contains a UserService and a PaymentService. In this case, to uniquely identify each of these services in the Gremlin control plane, you would construct two `ApplicationCoordinates` instances, each with the same value passed to `withType(...)` and a unique value passed to `withField(...)`.

    ApplicationCoordinates coords = new ApplicationCoordinates.Builder()
            .withType("TheShop")
            .withField("service", "UserService")
            .build();

    ApplicationCoordinates coords = new ApplicationCoordinates.Builder()
            .withType("TheShop")
            .withField("service", "PaymentService")
            .build();

Take notice of the `withType(...)` and `withField(...)` methods. The value defined in the `withType(...)` method will need to be defined in the `Name` field of the Gremlin UI (see images below). The value defined in the `withField(...)` method will need to be defined in the `Custom Value` field of the Gremlin UI (see images below).
To target both services, configure the UI like this:
To target one of the services, configure the UI like this:
Don't forget to click on the `+` icon
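The "target both" versus "target one" distinction can be pictured as a subset match: an attack that names only the type matches every service of that type, while adding a field narrows the match. The sketch below illustrates that reading with plain maps. `CoordinatesMatchSketch` is hypothetical, and the matching rule is inferred from the description above rather than taken from the Gremlin control plane.

```java
import java.util.Map;

// Hypothetical matcher, for illustration only; not part of the ALFI library.
final class CoordinatesMatchSketch {

    // True when the types are equal and every field in the query is present
    // with the same value in the application's coordinates.
    static boolean matches(String queryType, Map<String, String> queryFields,
                           String appType, Map<String, String> appFields) {
        if (!queryType.equals(appType)) {
            return false;
        }
        for (Map.Entry<String, String> wanted : queryFields.entrySet()) {
            if (!wanted.getValue().equals(appFields.get(wanted.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

Under this reading, a query for type `TheShop` with no fields matches both UserService and PaymentService, while a query that also sets `service` to `UserService` matches only the first.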
TrafficCoordinates
`com.gremlin.TrafficCoordinates` instances are used to control the blast radius of an ALFI experiment. The blast radius for ALFI could be all or a subset of HTTP verbs, all or a subset of your application's HTTP request paths, or even a specific block of code within your application.
Outbound HTTP Traffic
The com.gremlin.TrafficCoordinates instance for Outbound HTTP Traffic is generated automatically by the com.gremlin.http.client.GremlinApacheHttpRequestInterceptor, which comes with the alfi-apache-http-client library. This interceptor lets you impact any HTTP verb or request route within your application. To use it, add an instance of com.gremlin.http.client.GremlinApacheHttpRequestInterceptor to org.apache.http.impl.client.HttpClientBuilder when you create your org.apache.http.client.HttpClient.
final GremlinApacheHttpRequestInterceptor gremlinInterceptor = new GremlinApacheHttpRequestInterceptor(gremlinService, "alfi-client-demo");
final HttpClientBuilder clientBuilder = HttpClientBuilder.create().addInterceptorFirst(gremlinInterceptor);
The configuration in the screenshot above targets 50% of all HTTP GET traffic to the application.
The second argument to com.gremlin.http.client.GremlinApacheHttpRequestInterceptor is a string and must match the value defined in the Client Name (required) input field of the Gremlin UI.
Inbound HTTP Traffic
com.gremlin.TrafficCoordinates instances are automatically created for you if alfi-http-servlet-filter is on the classpath.
The configuration in the screenshot above targets 50% of all HTTP POST requests to the /payments route.
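To make the idea of a percentage-based blast radius concrete, here is a minimal sketch of how a rule such as "50% of POST requests to /payments" could be evaluated for each request. It does not use or describe the ALFI API: HttpAttackRule and shouldImpact are hypothetical names, and sampling each matching request independently against a random draw is an assumption.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.DoubleSupplier;

// Hypothetical rule type for illustration; not part of the ALFI API.
final class HttpAttackRule {
    private final String verb;           // HTTP verb to match, e.g. "POST"
    private final String path;           // exact path to match, or null for any path
    private final double percentage;     // share of matching requests to impact, 0 to 100
    private final DoubleSupplier random; // supplies values in [0.0, 1.0)

    HttpAttackRule(String verb, String path, double percentage, DoubleSupplier random) {
        this.verb = verb;
        this.path = path;
        this.percentage = percentage;
        this.random = random;
    }

    // Returns true when this request falls inside the blast radius.
    boolean shouldImpact(String requestVerb, String requestPath) {
        boolean verbMatches = verb.equalsIgnoreCase(requestVerb);
        boolean pathMatches = path == null || path.equals(requestPath);
        // The random draw happens only for requests that match verb and path.
        return verbMatches && pathMatches && random.getAsDouble() * 100.0 < percentage;
    }
}

final class BlastRadiusSketch {
    public static void main(String[] args) {
        HttpAttackRule rule = new HttpAttackRule(
                "POST", "/payments", 50.0, () -> ThreadLocalRandom.current().nextDouble());

        int impacted = 0;
        for (int i = 0; i < 10_000; i++) {
            if (rule.shouldImpact("POST", "/payments")) {
                impacted++;
            }
        }
        // Roughly half of the matching requests are expected to be impacted.
        System.out.println("Impacted " + impacted + " of 10000 POST /payments requests");
        // Requests with a different verb are never impacted.
        System.out.println("GET /payments impacted: " + rule.shouldImpact("GET", "/payments"));
    }
}
```

Passing the random source in as a DoubleSupplier keeps the rule deterministic in tests, where a fixed value can stand in for the random draw.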
Dynamo DB Traffic
The com.gremlin.TrafficCoordinates instance for Dynamo DB Traffic is generated automatically by the com.gremlin.aws.GremlinDynamoRequestInterceptor, which comes with the alfi-aws library. This interceptor lets you impact any DynamoDB operation (Get Item, Delete Item, etc.). To use it, add an instance of com.gremlin.aws.GremlinDynamoRequestInterceptor to com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder when you create your com.amazonaws.services.dynamodbv2.AmazonDynamoDB client.
final RequestHandler2 gremlinDynamoInterceptor = new GremlinDynamoRequestInterceptor(gremlinService(), CLIENT_EXECUTION_TIMEOUT, CLIENT_REQUEST_TIMEOUT);
final AmazonDynamoDB dbClient = AmazonDynamoDBClientBuilder
    .standard()
    .withRegion(region)
    .withClientConfiguration(new ClientConfiguration()
        .withClientExecutionTimeout(CLIENT_EXECUTION_TIMEOUT)
        .withConnectionTimeout(CLIENT_REQUEST_TIMEOUT)
        .withMaxErrorRetry(2)
    ).withRequestHandlers(gremlinDynamoInterceptor)
    .build();
The configuration in the screenshot above targets 50% of all Get Item traffic to the application.
Custom Traffic Type
final TrafficCoordinates trafficCoordinates = new TrafficCoordinates.Builder()
    .withType("PaymentController")
    .withField("method", "submitPayment")
    .build();

public HttpEntity<PaymentResponse> submitPayment(Payment paymentRequest) {
    this.gremlinService.applyImpact(trafficCoordinates); // Fault injected!
    return paymentService.makePayment(paymentRequest);
}
The configuration in the screenshot above targets 50% of all calls to the PaymentController#submitPayment(PaymentRequest paymentRequest) method.
Extend TrafficCoordinates
Often, companies set up their infrastructure to maintain a per-request data structure and use it to provide logging, monitoring, and observability data points. A common pattern is to set up a RequestContext and have authentication filters put information like customerId or deviceId into it. The RequestContext object can then be read from any later point in the request, so those attributes are easily available. They are often excellent attributes on which to target attacks. If your system operates this way, you can set up a mapping to populate these values on all TrafficCoordinates. This code lives in a concrete subclass of GremlinCoordinatesProvider, which you've already seen in Initialize Application Coordinates.
import com.gremlin.GremlinCoordinatesProvider;
import com.gremlin.TrafficCoordinates;

public class MyCoordinatesProvider extends GremlinCoordinatesProvider {

    @Override
    public TrafficCoordinates extendEachTrafficCoordinates(TrafficCoordinates incomingCoordinates) {
        incomingCoordinates.putField("customerId", MyRequestContext.getCustomerId());
        incomingCoordinates.putField("deviceId", MyRequestContext.getDeviceId());
        incomingCoordinates.putField("country", MyRequestContext.getCountry());
        return incomingCoordinates;
    }
}
With this code wired into the construction of your GremlinService instance, all TrafficCoordinates will carry those three attributes, and they can be matched for any type of traffic you'd like to attack.
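The provider above reads from a MyRequestContext that the documentation does not define. One common way to build such a per-request context is a ThreadLocal map. The sketch below is an assumption about what that class might look like in an application that handles each request on a single thread. The put and clear methods are hypothetical additions; only the three getters appear in the example above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-request context. Assumes one thread handles a request from start to finish.
final class MyRequestContext {
    private static final ThreadLocal<Map<String, String>> ATTRIBUTES =
            ThreadLocal.withInitial(HashMap::new);

    private MyRequestContext() {
    }

    // Called by an authentication filter (or similar) early in the request.
    static void put(String key, String value) {
        ATTRIBUTES.get().put(key, value);
    }

    static String getCustomerId() {
        return ATTRIBUTES.get().get("customerId");
    }

    static String getDeviceId() {
        return ATTRIBUTES.get().get("deviceId");
    }

    static String getCountry() {
        return ATTRIBUTES.get().get("country");
    }

    // Called when the request completes, so values do not leak into the
    // next request served by the same pooled thread.
    static void clear() {
        ATTRIBUTES.remove();
    }
}

final class RequestContextSketch {
    public static void main(String[] args) {
        try {
            MyRequestContext.put("customerId", "customer-123");
            MyRequestContext.put("country", "NL");
            System.out.println(MyRequestContext.getCustomerId()); // customer-123
            System.out.println(MyRequestContext.getDeviceId());   // null, never set
        } finally {
            MyRequestContext.clear();
        }
        System.out.println(MyRequestContext.getCustomerId());     // null after clear
    }
}
```

If requests hop between threads (for example with asynchronous handlers), a ThreadLocal is not enough and the context would need to be passed along explicitly.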
GremlinService
To create a com.gremlin.GremlinService, you need a com.gremlin.GremlinCoordinatesProvider, which in turn needs a com.gremlin.ApplicationCoordinates.
To construct a GremlinService using the alfi-aws library:
final GremlinServiceFactory factory = new GremlinServiceFactory(new GremlinCoordinatesProvider() {
    @Override
    public ApplicationCoordinates initializeApplicationCoordinates() {
        ApplicationCoordinates coords = AwsApplicationCoordinatesResolver.inferFromEnvironment()
            .orElseThrow(IllegalStateException::new);
        return coords;
    }
});
final GremlinService gremlinService = factory.getGremlinService();
com.gremlin.GremlinService should be a singleton.
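If your application does not use a dependency injection container to enforce this, one option is a small lazy holder that builds the instance once and hands the same object to every caller. The sketch below is generic and does not depend on the ALFI library: Lazy is a hypothetical helper, and the commented usage line assumes the factory variable from the example above.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper: creates the value on first use and reuses it afterwards.
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    Lazy(Supplier<T> factory) {
        this.factory = Objects.requireNonNull(factory);
    }

    @Override
    public T get() {
        T result = instance;
        if (result == null) {
            synchronized (this) {
                result = instance;
                if (result == null) {
                    // Runs at most once, even when several threads race here.
                    result = Objects.requireNonNull(factory.get());
                    instance = result;
                }
            }
        }
        return result;
    }
}

final class SingletonSketch {
    // With ALFI on the classpath this might look like (assumed usage):
    // static final Lazy<GremlinService> GREMLIN = new Lazy<>(() -> factory.getGremlinService());
    static final Lazy<StringBuilder> SHARED = new Lazy<>(StringBuilder::new);

    public static void main(String[] args) {
        StringBuilder first = SHARED.get();
        StringBuilder second = SHARED.get();
        System.out.println(first == second); // true: both calls return the same instance
    }
}
```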
Injecting fault
Once you have a reference to the com.gremlin.GremlinService singleton and have defined your custom com.gremlin.TrafficCoordinates, you can inject a fault like this:
gremlinService.applyImpact(trafficCoordinates);
Release Notes
0.7.4
July 7, 2020
Fixed an issue where Gremlin did not properly use the gremlin.properties file on the classpath when resolving configuration.
0.7.3
December 23, 2019
Change the payload of the authorization header sent to Gremlin API to resolve HTTP 401s from a server-side change that does extra certificate validation.
Added support for HTTP proxies. Set the http_proxy environment variable, and ALFI traffic to the Gremlin API will use the specified proxy URL.
0.7.2
April 24, 2019
Allow certificate parsing to work properly on Windows.
Updated dependencies.
0.7.1
April 11, 2019
Much friendlier error messages when installation/setup is unsuccessful.
0.7.0
April 2, 2019
Addition of Inbound HTTP injection points, both for javax.servlet Filters and JAX-RS Filters.
0.6.1
February 21, 2019
Updated dependencies.
0.6.0
February 12, 2019
Allow chaining of property sources, so that a failure to look up a value in Parameter Store still allows a lookup from environment variables.
0.5.3
January 22, 2019
Release process changes only.
0.5.2
January 10, 2019
Change artifact location to maven.gremlin.com.
0.5.1
October 23, 2018
The GREMLIN_ALFI_IDENTIFIER is now required (previously optional) when authenticating your application with Gremlin.
0.5.0
October 11, 2018
Install with Maven now available.
Client library modules available individually.
AWS Parameter Store can be used for configuration.