Failure Flags

Gremlin Failure Flags lets you run Chaos Engineering experiments, Scenarios, and reliability tests on serverless workloads, containers, and similar managed environments. Just like feature flags, Failure Flags let you perform experiments on specific parts of your services and applications with minimal impact to your application code and no performance impact when disabled. Failure Flags are safe to deploy in your application and will default to disabled when you have no actively running experiments.

Use-Cases

Failure Flags is an application-level fault injection tool. Its use cases cover simulating or realizing failures in your system that either have an impact at the application level or target application data. These typically represent the bulk of the issues teams see day to day, such as:

Incorrect or corrupt data
Customer-specific failures
Lock-contention on hot data
Breaking API changes
Unexpected API responses
Partial service failures
Message double-delivery or ordering issues

Beyond testing for these issues, Failure Flags can also help you:

Test observability and alarm configuration
Exercise automated recovery systems
Isolate experiments in any environment to well-known users or customers
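
To make the idea of isolating an experiment to well-known users or customers concrete, here is a minimal sketch of label-based targeting. It is not Gremlin's matching logic. The function name `selector_matches`, the label keys, and the customer IDs are all hypothetical, chosen only to illustrate how an experiment could be limited to requests whose labels match a selector.

```python
from typing import Mapping, Sequence


def selector_matches(
    selector: Mapping[str, Sequence[str]], labels: Mapping[str, str]
) -> bool:
    """Return True only if every selector key appears in labels with an allowed value."""
    return all(labels.get(key) in allowed for key, allowed in selector.items())


# Hypothetical experiment scope: only POST requests for one known test customer.
experiment_selector = {"customer_id": ["test-customer-42"], "method": ["POST"]}

# Hypothetical labels attached to two incoming requests.
test_request = {"customer_id": "test-customer-42", "method": "POST", "path": "/checkout"}
real_request = {"customer_id": "real-customer-7", "method": "POST", "path": "/checkout"}

print(selector_matches(experiment_selector, test_request))  # True
print(selector_matches(experiment_selector, real_request))  # False
```

In this sketch a request is affected only when all selector keys match, so traffic from other customers passes through untouched. Note that an empty selector would match every request.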

Architecture and Performance Impact

Failure Flags integrates with your applications, so it is critical that you can be confident that adopting it will not adversely affect the availability or performance of those applications outside of experiment parameters. Like other Gremlin products, Failure Flags is designed to fail safely.

Failure Flags is made up of three major components: the Gremlin SaaS API, the Failure Flags Sidecar or Lambda Extension, and one of the SDKs. No impact to your applications is possible unless all three are configured correctly at runtime. Working backwards from your application:

The SDK must be integrated with your application and explicitly enabled via environment variable.
The sidecar or extension must be deployed with your application and use a common localhost interface.
The sidecar or extension must be enabled and provided with current credentials to the Gremlin API via environment variables or other configuration options.
The sidecar or extension must have a stable network route to the Gremlin API and be provided with configuration required to traverse corporate proxies.
Your company Gremlin account must have Failure Flags enabled.
Your team must have created and run an experiment.

Any misconfiguration, configuration omission, or service outage can only prevent experimentation, which minimizes any adverse impact on your applications. Further, the various Failure Flags SDKs are published under the Apache-2.0 license. You're encouraged to audit those libraries as you see fit. Adopting Failure Flags will in no way lock your applications in to Gremlin.
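
The fail-safe behavior described above can be sketched roughly as follows. This is an illustration of the pattern, not the implementation of any Gremlin SDK. The environment variable name, the sidecar URL and port, and the request body shape are assumptions made for this example, and `fetch_experiment` is a hypothetical helper.

```python
import http.client
import json
import os
import urllib.request
from typing import Optional

# Assumptions for illustration only: neither the variable name nor the
# sidecar address is confirmed by this page.
ENABLED_VAR = "FAILURE_FLAGS_ENABLED"
SIDECAR_URL = "http://localhost:5032/experiment"


def fetch_experiment(
    flag_name: str, labels: dict, url: str = SIDECAR_URL, timeout: float = 0.5
) -> Optional[dict]:
    """Return an active experiment for this flag, or None if any precondition fails."""
    # Precondition: the SDK must be explicitly enabled via environment variable.
    if os.environ.get(ENABLED_VAR, "").lower() not in ("1", "true"):
        return None
    body = json.dumps({"name": flag_name, "labels": labels}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        # Precondition: the sidecar must be reachable on the shared localhost.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            if response.status != 200:
                return None
            result = json.loads(response.read() or b"null")
    except (OSError, ValueError, http.client.HTTPException):
        # Unreachable sidecar, timeout, or malformed reply: behave as if
        # no experiment exists rather than raising into application code.
        return None
    return result if isinstance(result, dict) else None
```

Every failure path returns `None`, so in this sketch the calling code would simply run its normal logic whenever the flag is disabled, the sidecar is absent, or the response cannot be parsed.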

Takeaways

It is safe to add Failure Flags to your code and leave them there
It is easy to prevent experimentation in any environment
The SDKs are licensed under Apache-2.0
Adding Failure Flags will not create lock-in

Supported Platforms

Failure Flags can run on any platform or environment that supports multiple processes with shared localhost. These include most if not all Kubernetes platforms, AWS Lambda, AWS ECS, virtual machines, container platforms with shared network namespaces, and many others (like your laptop). Gremlin currently provides support for:

AWS Lambda
AWS ECS
Kubernetes

Gremlin does provide executables and a variety of packages that can be used on other platforms, but we cannot provide support for those at this time.

Supported Languages and Frameworks

The Failure Flags SDKs are language-specific and released under the Apache-2.0 license. These include support for:

JavaScript / TypeScript / NodeJS
Python
Go
Java
C# / .NET

Each of these is a minimal SDK, and they support similar features and semantics where possible.

Preparing and Next Steps

Before you’ll be able to use Failure Flags you’ll need to gather some information and do a little pre-work:

Identify the Application you will Instrument: Consider the common use-cases listed above and decide which of your applications you'll get started with.
Firewalls and Routes: Make sure that the network your chosen application is deployed into has a route to beta.gremlin.com and api.gremlin.com.
Proxy Configuration: If that network uses an outbound HTTP or HTTPS proxy you'll need to gather its URL, any credentials, and certificate material it uses. That certificate material should be PEM encoded.
New Library Dependencies: You will add a library dependency to your project. If your organization uses an internal package / library cache make sure that you've included the Failure Flags SDK for your application.
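
As a quick way to check the firewall and route item above, the following sketch attempts a TCP connection to each Gremlin host. It assumes HTTPS on port 443, which this page does not state explicitly, and `check_routes` is a hypothetical helper. It connects directly, so it does not exercise an outbound proxy. In a proxied network a failure here may simply mean traffic must go through the proxy.

```python
import socket
from typing import Callable, Dict, Iterable

# Hosts named in the checklist above.
GREMLIN_HOSTS = ("api.gremlin.com", "beta.gremlin.com")


def check_routes(
    hosts: Iterable[str] = GREMLIN_HOSTS,
    port: int = 443,  # assumption: the API is served over HTTPS
    timeout: float = 3.0,
    connect: Callable = socket.create_connection,
) -> Dict[str, bool]:
    """Map each host to True if a direct TCP connection could be opened."""
    results = {}
    for host in hosts:
        try:
            connection = connect((host, port), timeout)
            connection.close()
            results[host] = True
        except OSError:
            # Covers DNS failures, refused connections, and timeouts.
            results[host] = False
    return results
```

Running `check_routes()` from inside the network where the application is deployed returns a dictionary such as `{"api.gremlin.com": True, "beta.gremlin.com": True}` when both routes are open. The `connect` parameter exists only so the function can be exercised without network access.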

See the following pages to get started:

Privileges required

Privilege	Description
CLIENTS_READ	Allows reading all client information within the team
CLIENTS_WRITE	Allows editing all client information within the team
EXPERIMENTS_RUN	Allows running an experiment within a team
EXPERIMENTS_READ	Allows reading all experiment information within a team
EXPERIMENTS_WRITE	Allows creating or updating an experiment for a team
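
When reviewing a role, a small check like the one below can show which of the listed privileges are still missing. It assumes all five privileges in the table are needed together. `missing_privileges` is a hypothetical helper and does not call the Gremlin API.

```python
# Privilege names copied from the table above.
REQUIRED_PRIVILEGES = frozenset(
    {
        "CLIENTS_READ",
        "CLIENTS_WRITE",
        "EXPERIMENTS_RUN",
        "EXPERIMENTS_READ",
        "EXPERIMENTS_WRITE",
    }
)


def missing_privileges(granted):
    """Return the required privileges absent from `granted`, sorted by name."""
    return sorted(REQUIRED_PRIVILEGES - set(granted))


# Example: a read-mostly role that can run but not create experiments.
print(missing_privileges(["CLIENTS_READ", "EXPERIMENTS_READ", "EXPERIMENTS_RUN"]))
# ['CLIENTS_WRITE', 'EXPERIMENTS_WRITE']
```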
