AWS Quick Start Guide

This guide will walk you through deploying Gremlin to your AWS environment, identifying services, and running the AWS Test Suite.

Overview

Gremlin RM lets you run comprehensive reliability tests on services running in your environment. It tests several key reliability behaviors of each service, such as its scalability, redundancy, and ability to tolerate failed or slow dependencies. Gremlin then assigns a reliability score to the service based on the outcome of these tests.

For AWS users, we provide a streamlined onboarding process. Once you deploy the Gremlin agent to your Amazon EC2 or EKS instances, Gremlin automatically detects the services running on those instances using your Elastic Load Balancer (ELB) traffic. We then define those services in the Gremlin web app and generate a suite of ready-to-run reliability tests. We also give you the option to automatically create Health Checks based on each service’s key metrics. Once the services are created and Health Checks are added, you can start running reliability tests on your services.

This guide will walk you through the following steps:

  1. Authenticating Gremlin with your AWS account.
  2. Selecting the Elastic Load Balancers (ELBs) to use to identify services.
  3. Adding AWS Health Checks to your services.
  4. Running your reliability tests.

Prerequisites

Before starting this guide, you should deploy the Gremlin agent to your AWS environment.

  • If you’re using Amazon Elastic Kubernetes Service (EKS), follow the instructions in our Helm guide.
  • If you’re using Amazon Elastic Compute Cloud (EC2), follow the instructions in our Linux or Windows installation guides.
  • If you’re using Amazon Elastic Container Service (ECS), follow our standalone container guide.

Step 1: Authenticate with your AWS account

Before Gremlin can detect and add your services, you first need to grant Gremlin access to resources in your AWS account. This will allow Gremlin to:

  • View CloudWatch metrics for use in Health Checks.
  • View Route 53 routes, traffic policies, health checks, and other resources.
  • View Elastic Load Balancer (ELB) instances, target groups, policies, and attributes for service identification and mapping.
Note
Gremlin only requires read-only access to your AWS account, and only for CloudWatch, Route 53, and ELB.
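
To make the scope of "read-only access" concrete, here is a minimal sketch of what a read-only IAM policy covering these three services could look like. This is an illustration, not Gremlin's published policy: the statement ID, the function name, and the exact set of actions are assumptions, and the authoritative permissions are the ones listed in the Amazon CloudWatch Health Check documentation.

```python
import json

# Assumption: these read-only actions are illustrative. The policy Gremlin
# actually requires may list different or more specific actions.
READ_ONLY_ACTIONS = [
    "cloudwatch:GetMetricData",
    "cloudwatch:ListMetrics",
    "route53:Get*",
    "route53:List*",
    "elasticloadbalancing:Describe*",
]


def build_read_only_policy(actions=READ_ONLY_ACTIONS):
    """Return an IAM policy document that allows only the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GremlinReadOnlySketch",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be reviewed before use.
    print(json.dumps(build_read_only_policy(), indent=2))
```

Every action in the sketch is a Get, List, or Describe call, so a role carrying only this policy could not modify any resource in the account.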

There are two ways to authenticate Gremlin: using an IAM role (recommended) or using a service account.

Authenticating using an IAM role

Authenticating using an IAM role is the recommended method because it gives you finer control over which resources and permissions Gremlin can access, and you don't have to share your AWS credentials. Follow the instructions in the Amazon CloudWatch Health Check documentation to authenticate using IAM. After you’ve clicked Save, return to this guide.

Authenticating using a service account

Authenticating using a service account doesn’t give you as much control as an IAM role, but it may be the preferred method for teams that haven't fully migrated to IAM. Note that you’ll need to provide Gremlin with credentials for the account. Follow the instructions in the Amazon CloudWatch Health Check documentation to authenticate using a service account. After you’ve clicked Save, return to this guide.

Step 2: Select your Elastic Load Balancer(s)

After you authenticate successfully, Gremlin displays a list of all Elastic Load Balancers (ELBs) it detected in your AWS environment. Specifically, it shows ELBs that are connected to a service with a Gremlin agent present. For example, if you installed Gremlin onto an EKS cluster and have one or more ELBs directing traffic to that cluster, then those ELBs will appear in the list.

To select an ELB, click the checkbox next to its name. You can use the search box to filter the list by name, region, or tag. You can select multiple ELBs individually, or select all of them using the checkbox at the top of the list. When you’ve selected all the ELBs you want to use, click Create Services. Gremlin will use the endpoints identified from the ELBs to define your services and generate a suite of ready-to-run reliability tests.

Creating services from auto-detected ELBs.

Step 3: Add Health Checks

Before you can start running reliability tests on a service, you’ll need to add at least one Health Check. A Health Check is an automated process that checks the state of the service before, during, and after a test. Health Checks ensure that your services are still operating within expectations, but they also serve a second purpose: safety. If your systems become unresponsive, unhealthy, or unstable, the Health Check will automatically stop the actively running test and return your service to its normal operation.
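
The before/during/after behavior described above can be sketched as a small control loop. This is a conceptual illustration only, not Gremlin's implementation: `run_with_health_check`, `is_healthy`, and `halt` are hypothetical names standing in for the test runner, the Health Check, and the rollback action.

```python
from typing import Callable, Iterable


def run_with_health_check(
    test_steps: Iterable[Callable[[], None]],
    is_healthy: Callable[[], bool],
    halt: Callable[[], None],
) -> str:
    """Run test steps while checking health before, during, and after.

    Returns "passed" if every check succeeds, otherwise "halted".
    All three arguments are hypothetical callables for illustration.
    """
    # Before: never start a test against a service that is already unhealthy.
    if not is_healthy():
        return "halted"

    for step in test_steps:
        step()
        # During: stop the test as soon as the service looks unhealthy.
        if not is_healthy():
            halt()
            return "halted"

    # After: confirm the service is still healthy once the test has finished.
    return "passed" if is_healthy() else "halted"
```

For example, if the check reports healthy before the test and after the first step but unhealthy after the second, the third step never runs and `halt` is called once.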

With AWS integration, there are two ways to create Health Checks:

Automatically adding Intelligent Health Checks

Gremlin can use Amazon CloudWatch metrics to automatically create Health Checks for you. These Intelligent Health Checks track the service’s latency, error rate, and request rate.

Note
Health Checks created by Gremlin cannot be used with other services or in Scenarios.
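
For readers who want to see what latency, error, and request metrics look like in CloudWatch terms, the sketch below builds query definitions in the shape accepted by the CloudWatch `GetMetricData` API. The source does not say which metrics Gremlin reads, so the metric choices here are assumptions: they are the standard Application Load Balancer metrics in the `AWS/ApplicationELB` namespace. The function name and the example load balancer value are hypothetical.

```python
def build_metric_queries(load_balancer: str, period_seconds: int = 60) -> list:
    """Build GetMetricData queries for latency, errors, and request count.

    Assumption: the service sits behind an Application Load Balancer, so the
    standard AWS/ApplicationELB metrics are used. Gremlin's own metric
    selection is not documented in this guide.
    """
    metrics = [
        ("latency", "TargetResponseTime", "Average"),
        ("errors", "HTTPCode_Target_5XX_Count", "Sum"),
        ("requests", "RequestCount", "Sum"),
    ]
    return [
        {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "LoadBalancer", "Value": load_balancer}
                    ],
                },
                "Period": period_seconds,
                "Stat": stat,
            },
            "ReturnData": True,
        }
        for query_id, metric_name, stat in metrics
    ]


if __name__ == "__main__":
    # Hypothetical load balancer dimension value, for illustration only.
    for query in build_metric_queries("app/example-lb/0123456789abcdef"):
        print(query["Id"], query["MetricStat"]["Metric"]["MetricName"])
```

With boto3 installed and credentials configured, a list like this could be passed as the `MetricDataQueries` argument of the CloudWatch client's `get_metric_data` call, together with `StartTime` and `EndTime`.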

To use Gremlin-created Intelligent Health Checks:

  1. Navigate to an AWS service from the service list.
  2. Click Settings, then click Health Checks.
  3. Under Gremlin Intelligent Health Checks, check the box next to “Use Gremlin Intelligent Health Checks for this service”. Gremlin will immediately create and configure the Health Checks and use them during reliability tests run on this service.

To remove these Health Checks, simply uncheck the box and confirm their removal. This will have no impact on your reliability test scores, though it will prevent you from running reliability tests unless you’ve added another Health Check to the service.

Enabling Gremlin Intelligent Health Checks from the service settings page.

Enabled Gremlin Intelligent Health Checks.

Step 4: Start testing

Now you're ready to run your reliability tests! Return to your service list, click on the service you want to test, find the test you want to run, and click the Run button. Alternatively, you can click the Run All button at the top of the page to run each test in sequence. Gremlin will run the tests, use your Health Checks to monitor your service, and record the results automatically. You'll also see your reliability score increase as a reward for running your first test. Great job!

A completed CPU test.