How to run a Chaos Engineering experiment on AWS Lambda using C# (.NET) and Failure Flags
In this tutorial, you’ll learn how to run a Chaos Engineering experiment on a .NET application running on AWS Lambda using Failure Flags. Failure Flags is a Gremlin feature that lets you inject faults into applications and services running on fully managed serverless environments, such as AWS Lambda, Azure Functions, and Google Cloud Functions. With Failure Flags, you can:
- Add latency or errors to applications.
- Inject data into function calls without having to edit or re-deploy source code.
- Simulate partial outages behind API gateways or reverse proxies.
- Customize the behavior and impact of experiments.
This tutorial will focus on testing a C# application on AWS Lambda. You can learn about our other supported languages and platforms in our Failure Flags documentation.
Overview
This tutorial will show you how to:
- Install the Failure Flags .NET SDK.
- Deploy an application and the Failure Flags agent to AWS Lambda.
- Run an exception experiment using Failure Flags.
Prerequisites
Before starting this tutorial, you’ll need:
- A Gremlin account (sign up for a free trial here).
- An AWS account with access to Lambda (you can use the lowest-tier x86 or Arm instance for this tutorial to save on costs).
- The .NET SDK installed on your local machine. As of this writing, Lambda supports .NET 8 and .NET 9. This tutorial was written using .NET 8.0.113.
This tutorial uses an application template provided by AWS. The template is a basic, single-file C# application that takes a string as input, converts it to all uppercase letters, and returns it as output. To use this template, first install the Amazon Lambda Templates and Tools from NuGet:
dotnet new install Amazon.Lambda.Templates
dotnet tool install -g Amazon.Lambda.Tools
Next, create a new empty Lambda function and provide a name, profile, and region. This creates the project folder and configures the project with your settings. In this example, the function is named failure-flags_dotnet:
dotnet new lambda.EmptyFunction --name failure-flags_dotnet --profile default --region us-east-2
Step 1 - Set up your C# application with Failure Flags
In this step, we’ll add a Failure Flag to our C# application. This application converts the user’s input string to all uppercase letters and returns it as a JSON object. We’ll also configure the Failure Flag to target specific user requests based on their input.
First, add Failure Flags as a dependency by running the following commands in your project’s main directory (the same directory containing your .csproj file). We’ll also need to add JSON support:
dotnet add package Gremlin.FailureFlags
dotnet add package System.Text.Json
Next, open your Function.cs file. In the FunctionHandler method, add the following code. This creates a new Failure Flag named failure-flags_dotnet. It also creates a label named “input” that passes the user’s input back to Gremlin. This will let us fine-tune our experiments to only impact requests matching specific inputs.
public string FunctionHandler(string input, ILambdaContext context)
{
    // Create the Failure Flags client.
    var gremlin = new GremlinFailureFlags();

    // Invoke the Failure Flag. The "input" label lets experiments target specific inputs.
    gremlin.Invoke(new FailureFlag()
    {
        Name = "failure-flags_dotnet",
        Labels = new Dictionary<string, string>()
        {
            { "input", input.ToUpper() }
        }
    });
    ...
}
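The snippet above elides the rest of the handler. As a rough sketch only, and assuming the response has the shape shown in the sample output later in this tutorial (an upper field plus an executionTime field in milliseconds), the remaining logic might look something like the following. BuildResponse is a hypothetical helper, not part of the AWS template or the Failure Flags SDK, and it is written as a standalone program so you can try it without Lambda:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.Json;

// Hypothetical helper (not from the AWS template or the Gremlin SDK):
// uppercases the input and returns it as a JSON string, along with the
// time spent handling the request in milliseconds.
string BuildResponse(string input)
{
    var stopwatch = Stopwatch.StartNew();

    // In the real handler, the Failure Flag would be invoked around here,
    // so any injected delay would count toward the measured time.
    var upper = input.ToUpper();

    stopwatch.Stop();

    var body = new Dictionary<string, object>
    {
        ["upper"] = upper,
        ["executionTime"] = stopwatch.ElapsedMilliseconds
    };
    return JsonSerializer.Serialize(body);
}

Console.WriteLine(BuildResponse("Hello world!"));
```

Measuring the elapsed time inside the handler is one simple way to make injected latency visible in the response itself, although the actual example application may do this differently.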
Finally, we need to configure Lambda’s execution environment. This template contains a JSON file named aws-lambda-tools-defaults.json that configures the function’s deployment settings, such as its region, architecture, .NET version, and resources. More importantly, we can use it to set the environment variables needed to run Failure Flags.
Open your aws-lambda-tools-defaults.json file and add the following entry at the end of the JSON object (remember to add a comma after the previous entry). This enables the Failure Flags SDK and sidecar, and it tells Failure Flags where to find our Gremlin configuration file, which we’ll download in the next step:
...
"environment-variables" : "FAILURE_FLAGS_ENABLED=1;GREMLIN_LAMBDA_ENABLED=1;GREMLIN_CONFIG_FILE=/var/task/config.yaml"
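If an experiment later fails to reach your function, a missing environment variable is one thing worth ruling out. The following is a minimal sketch of a check you could run at startup or in a test. FindMissingVariables is a hypothetical helper rather than part of the Failure Flags SDK, and the variable names are the three from the entry above:

```csharp
using System;
using System.Collections.Generic;

// Variable names taken from the environment-variables entry above.
string[] requiredVariables =
{
    "FAILURE_FLAGS_ENABLED",
    "GREMLIN_LAMBDA_ENABLED",
    "GREMLIN_CONFIG_FILE"
};

// Hypothetical helper: returns the names that are unset or empty.
List<string> FindMissingVariables(IEnumerable<string> names)
{
    var missing = new List<string>();
    foreach (var name in names)
    {
        if (string.IsNullOrEmpty(Environment.GetEnvironmentVariable(name)))
        {
            missing.Add(name);
        }
    }
    return missing;
}

var missingVariables = FindMissingVariables(requiredVariables);
Console.WriteLine(missingVariables.Count == 0
    ? "All Failure Flags variables are set."
    : "Missing: " + string.Join(", ", missingVariables));
```

Run locally, this will most likely report all three variables as missing, since they are only set in the Lambda execution environment.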
Step 2 - Download your client configuration file
Before deploying your application, you must ensure it can authenticate with Gremlin. Gremlin provides an auto-generated client configuration file that you can use to authenticate any Gremlin agent, including Failure Flags agents. This file contains your Gremlin team ID and TLS certificates, but you can add additional labels like your application name, version number, region, etc.
- Download your client configuration file from the Gremlin web app and save it in the root directory of your project folder as config.yaml.
- Add the configuration file to your project. This process varies depending on your development environment, but the easiest way (if you’re using Visual Studio) is to right-click your Solution, expand the “Add” menu, click “Existing Item,” and select your config.yaml file.
- Optionally, add more labels to your configuration file. You can use these labels to identify unique deployments of this application, letting you fine-tune which deployments to impact during experiments. For example, you could add the following block to identify your function as part of the us-east-2 region, letting you target all functions running in us-east-2:
labels:
  datacenter: us-east-2
The configuration file supports other options, but the defaults are all you need for this tutorial.
Step 3 - Deploy your .NET application to Lambda
So far, we’ve configured our application and the Failure Flags SDK. The SDK injects faults into our app, but it doesn’t handle communicating with Gremlin’s backend servers or orchestrating experiments. For that, we need to deploy the Failure Flags Lambda layer alongside our application.
First, find the ARN (Amazon Resource Name) of the Failure Flags Lambda layer you want to use. You can use our table to look this up. ARNs vary based on the region you’re deploying your function to and the architecture you’re running it on. For example, an ARM64 function running in eu-west-3 would use the ARN arn:aws:lambda:eu-west-3:044815399860:layer:gremlin-lambda-arm64:17.
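To make the structure of these ARNs easier to see, here is a small sketch that assembles one from its parts. BuildLayerArn is a hypothetical helper, and the pattern is inferred only from the two example ARNs in this tutorial. The layer version in particular is an assumption that may differ by region or change over time, so treat the lookup table as the source of truth:

```csharp
using System;

// Account ID and layer naming copied from the example ARNs in this tutorial.
const string GremlinAccountId = "044815399860";

// Hypothetical helper: assembles a layer ARN from region, architecture, and version.
string BuildLayerArn(string region, string architecture, int version)
{
    if (architecture != "x86_64" && architecture != "arm64")
    {
        throw new ArgumentException("Expected \"x86_64\" or \"arm64\".", nameof(architecture));
    }
    return $"arn:aws:lambda:{region}:{GremlinAccountId}:layer:gremlin-lambda-{architecture}:{version}";
}

// Prints: arn:aws:lambda:eu-west-3:044815399860:layer:gremlin-lambda-arm64:17
Console.WriteLine(BuildLayerArn("eu-west-3", "arm64", 17));
// Prints: arn:aws:lambda:us-east-2:044815399860:layer:gremlin-lambda-x86_64:17
Console.WriteLine(BuildLayerArn("us-east-2", "x86_64", 17));
```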
Next, publish your project and the Failure Flags Lambda layer using dotnet lambda deploy-function. Pass your function’s name as the first argument and the layer ARN for your region and architecture to --function-layers. For example, this deploys the function using the x86_64 layer in us-east-2:
dotnet lambda deploy-function failure-flags_dotnet --function-layers arn:aws:lambda:us-east-2:044815399860:layer:gremlin-lambda-x86_64:17
Now, we just need to verify that our function is working properly. Run the following command to send a “Hello world!” message, and you should receive a response soon after:
dotnet lambda invoke-function failure-flags_dotnet --payload "Hello world\!"
Amazon Lambda Tools for .NET Core applications (5.12.4)
Project Home: https://github.com/aws/aws-extensions-for-dotnet-cli, https://github.com/aws/aws-lambda-dotnet
Payload:
"{\"upper\":\"HELLO WORLD!\",\"executionTime\":180}"
Log Tail:
[gremlin-lambda] Starting Gremlin Lambda [Version: v1.1.3, Debug: false]
[gremlin-lambda] [993e6dc6-b3f7-469e-857d-c4e64c5dbe16] Gremlin Lambda is running
[gremlin-lambda] [993e6dc6-b3f7-469e-857d-c4e64c5dbe16] registered with Gremlin Data Plane API, roundtrip in 330.305223ms, agent ID: 00000000-0000-0000-0000-000000000000
[gremlin-lambda] [993e6dc6-b3f7-469e-857d-c4e64c5dbe16] cache updated, current experiment set: {}
EXTENSION Name: gremlin-lambda State: Ready Events: [INVOKE, SHUTDOWN]
START RequestId: 6010c8cd-0027-43c8-a2f8-df448c02cf3f Version: $LATEST
END RequestId: 6010c8cd-0027-43c8-a2f8-df448c02cf3f
REPORT RequestId: 6010c8cd-0027-43c8-a2f8-df448c02cf3f Duration: 382.61 ms Billed Duration: 383 ms Memory Size: 512 MB Max Memory Used: 87 MB Init Duration: 411.86 ms
Step 4 - Run an experiment
Now, let’s run an experiment on our .NET application.
For this experiment, imagine our application consistently errors whenever users enter “gremlins” as the input. This could be for several reasons: maybe the string contains an unsupported character, uses an unexpected character encoding standard, or can’t be serialized into JSON. We want to know how our function handles cases like these: does it handle them gracefully? Does it try to process them anyway? Or does it crash?
To test this, we’ll run an experiment that throws an exception whenever a request contains “gremlins” and observe our application to see what it does.
- In the Gremlin web app, select Failure Flags in the navigation pane (or click this link).
- Click + Experiment to create a new experiment.
- Enter a name for the new experiment, such as “Exception on input - gremlins.”
- Click the Failure Flag Selector box to open a list of Failure Flags. You can type into this box to search, then click on the Flag you created earlier. If your app doesn't show up, confirm that it's fully deployed onto Lambda and has responded to at least one request.
- In the Attributes box, select input. This corresponds to the input label in the application.
- In the Value box, enter “gremlins” as the string we want to trigger the exception on.
- Note that you can match multiple values by separating them with commas.
- In the Service Selector box, select the service (i.e., the function) you want to target. This menu will only list actively detected services. You can see a list of active services in the Gremlin web app.
- By default, any instances of your application created while the experiment is running will also run the experiment. You can prevent this by enabling “Prevent new services from joining at runtime.”
- Optionally, you can fine-tune the specific instances to run the experiment on using the Service Selector section. We’ll leave this empty to target all instances.
- The Effects box is where you specify the impact that you want to have on your app. In this case, select exception from the box. Optionally, add an exception message, such as “The input could not be processed.”
- Set the Impact Probability percentage. This is the probability that the Failure Flag will run for any specific invocation. For now, set it to 100% to ensure every call to this function gets impacted.
- Set the Experiment Duration to how long you want to run the experiment. 5 min ensures you’ll have plenty of time to observe the experiment’s impact. You can always stop the experiment early using the Halt button.
- Click Save & Run to start the experiment.
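To build some intuition for the Impact Probability setting, the sketch below models it as an independent random draw per invocation. This is a hypothetical illustration, not Gremlin's implementation. CountImpacted and the fixed seed are assumptions made purely for demonstration:

```csharp
using System;

// Hypothetical model of Impact Probability: each invocation is impacted
// independently with the given percentage chance. This is not Gremlin's code.
int CountImpacted(int invocations, double probabilityPercent, Random random)
{
    var impacted = 0;
    for (var i = 0; i < invocations; i++)
    {
        // NextDouble() returns a value in [0, 1), so 100% always triggers
        // and 0% never does.
        if (random.NextDouble() * 100 < probabilityPercent)
        {
            impacted++;
        }
    }
    return impacted;
}

var rng = new Random(42); // fixed seed so repeated runs behave the same
Console.WriteLine($"100%: {CountImpacted(1000, 100, rng)} of 1000 impacted");
Console.WriteLine($"25%: {CountImpacted(1000, 25, rng)} of 1000 impacted");
```

At 100%, every simulated invocation is impacted, which is why that setting suits a first experiment. Lower values impact roughly that share of invocations, which is closer to how an intermittent failure behaves.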
While the experiment runs, send a request to your function with “gremlins” as the payload:
dotnet lambda invoke-function failure-flags_dotnet --payload "gremlins"
Payload:
{
"errorType": "FailureFlagException",
"errorMessage": "Exception injected by failure flag: The input could not be processed.",
"stackTrace": [
"at FailureFlags.Exception.ApplyBehavior(Experiment[] experiments)",
"at FailureFlags.GremlinFailureFlags.Invoke(FailureFlag flag, IBehavior behavior)",
"at dotnet.Function.FunctionHandler(String input, ILambdaContext context) in /home/gremlin/Development/failure-flags-examples/dotnet/src/dotnet/Function.cs:line 30",
"at lambda_method1(Closure, Stream, ILambdaContext, Stream)",
"at Amazon.Lambda.RuntimeSupport.HandlerWrapper.<>c__DisplayClass8_0.<GetHandlerWrapper>b__0(InvocationRequest invocation) in /src/Repo/Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/HandlerWrapper.cs:line 54",
"at Amazon.Lambda.RuntimeSupport.LambdaBootstrap.InvokeOnceAsync(CancellationToken cancellationToken) in /src/Repo/Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/LambdaBootstrap.cs:line 237"
]
}
Because the exception was thrown inside the Failure Flag invocation and never caught, the rest of the function didn’t run, and we received a raw error instead of our usual response. We can mitigate this by wrapping our function’s logic in a try/catch/finally block. At the very least, we can log the stack trace and return a clearer error message to the user.
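As a minimal sketch of that mitigation, the standalone program below uses a stand-in for the Failure Flag call so it can run without the SDK or Lambda. InvokeFailureFlagStandIn and HandleSafely are hypothetical names, the stand-in throws InvalidOperationException rather than the SDK's FailureFlagException, and the shape of the error response is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Stand-in for the Failure Flag invocation (hypothetical): it throws for the
// input "gremlins", much as the exception experiment does while it is running.
void InvokeFailureFlagStandIn(string input)
{
    if (string.Equals(input, "gremlins", StringComparison.OrdinalIgnoreCase))
    {
        throw new InvalidOperationException(
            "Exception injected by failure flag: The input could not be processed.");
    }
}

// Hypothetical handler body: returns the normal response on success and a
// friendlier JSON error when anything inside the try block throws.
string HandleSafely(string input)
{
    try
    {
        InvokeFailureFlagStandIn(input);
        return JsonSerializer.Serialize(
            new Dictionary<string, object> { ["upper"] = input.ToUpper() });
    }
    catch (Exception ex)
    {
        // Keep the full stack trace in the logs, not in the user-facing response.
        Console.Error.WriteLine(ex);
        return JsonSerializer.Serialize(
            new Dictionary<string, object> { ["error"] = "The request could not be processed. Please try again later." });
    }
    finally
    {
        // Runs on both paths, which makes it a reasonable place for cleanup.
        Console.Error.WriteLine("Request finished.");
    }
}

Console.WriteLine(HandleSafely("Hello world!"));
Console.WriteLine(HandleSafely("gremlins"));
```

In a real Lambda function you would likely log through the ILambdaContext logger rather than the console. Catching a narrower exception type than Exception may also be preferable once you know which failures you want to handle.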
When you're finished making observations, click Halt this experiment in the Gremlin web app to stop the experiment.
Conclusion
Now that you have Failure Flags set up, try running different experiments. Add network latency, impact only a percentage of requests or service instances, or combine effects to add latency and errors. You can even define your own experiments or inject your own strings into your application at runtime.
We also have a sidecar for Kubernetes if you want to use Failure Flags outside of Lambda. Just deploy the sidecar, then define and run your experiment the same way you did in this tutorial. Remember that Failure Flags has no performance or availability impacts on your application when not in use, so don't be afraid to add it wherever reliability is a concern. We also have SDKs for Node.js, Java, Python, and Golang.
