How to run a Chaos Engineering experiment on AWS Lambda using C# (.NET) and Failure Flags

Andre Newman
Sr. Reliability Specialist
Last Updated: March 11, 2025
Learn how to improve the resiliency of your C# .NET applications running on AWS Lambda using Gremlin Failure Flags.

In this tutorial, you’ll learn how to run a Chaos Engineering experiment on a .NET application running on AWS Lambda using Failure Flags. Failure Flags is a Gremlin feature that lets you inject faults into applications and services running on fully managed serverless environments, such as AWS Lambda, Azure Functions, and Google Cloud Functions. With Failure Flags, you can:

  • Add latency or errors to applications.
  • Inject data into function calls without having to edit or re-deploy source code.
  • Simulate partial outages behind API gateways or reverse proxies.
  • Customize the behavior and impact of experiments.

This tutorial will focus on testing a C# application on AWS Lambda. You can learn about our other supported languages and platforms in our Failure Flags documentation.

Overview

This tutorial will show you how to:

  • Install the Failure Flags .NET SDK.
  • Deploy an application and the Failure Flags agent to AWS Lambda.
  • Run a latency experiment using Failure Flags.

Prerequisites

Before starting this tutorial, you’ll need:

  • A Gremlin account (sign up for a free trial here).
  • An AWS account with access to Lambda (you can use the lowest-tier x86 or Arm instance for this tutorial to save on costs).
  • The .NET SDK installed on your local machine. As of this writing, Lambda supports .NET 8 and .NET 9. This tutorial was written using .NET 8.0.113.

This tutorial uses an application template provided by AWS. The template is a basic, single-file C# application that takes a string as input, converts it to all uppercase letters, and returns it as output. To use this template, first install the Amazon Lambda Templates and Tools from NuGet:

.NET

dotnet new install Amazon.Lambda.Templates
dotnet tool install -g Amazon.Lambda.Tools

Next, create a new empty Lambda Function and provide a name, profile, and region. This creates the folder and configures the project with your settings. In this example, the function is named failure-flags_dotnet:

.NET

dotnet new lambda.EmptyFunction --name failure-flags_dotnet --profile default --region us-east-2

Step 1 - Set up your C# application with Failure Flags

In this step, we’ll add a Failure Flag to our C# application. This application converts the user’s input string to all uppercase letters and returns it as a JSON object. We’ll also configure the Failure Flag to target specific user requests based on their input.

First, add Failure Flags as a dependency by running the following command in your project’s main directory (the same directory containing your .csproj file). We’ll also need to add JSON support:

.NET

dotnet add package Gremlin.FailureFlags
dotnet add package System.Text.Json

Next, open your Function.cs file. In the FunctionHandler method, add the following code. This creates a new Failure Flag named failure-flags_dotnet. It also creates a label named “input” that passes the user’s input back to Gremlin. This will let us fine-tune our experiments to only impact requests matching specific inputs.

.NET

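public string FunctionHandler(string input, ILambdaContext context)
{
	// The SDK types live in the FailureFlags namespace (see the stack trace in Step 4),
	// so Function.cs needs "using FailureFlags;" at the top.
	var gremlin = new GremlinFailureFlags();
	// Report the user's input as a label so experiments can target specific values.
	gremlin.Invoke(new FailureFlag()
	{
		Name = "failure-flags_dotnet",
		Labels = new Dictionary<string, string>()
		{
			{ "input", input }
		}
	});
	...
}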

Finally, we need to configure Lambda’s execution environment. This template contains a JSON file named aws-lambda-tools-defaults.json that configures the function’s runtime, such as setting its region, architecture, .NET version, and resources. More importantly, we can use it to set the environment variables needed to run Failure Flags.

Open your aws-lambda-tools-defaults.json file and add the following string at the bottom of the list (remember to add a comma after the previous entry). This enables the Failure Flags SDK and sidecar, and it tells Failure Flags where to find our Gremlin configuration file, which we’ll download in the next step:

.NET

...
"environment-variables" : "FAILURE_FLAGS_ENABLED=1;GREMLIN_LAMBDA_ENABLED=1;GREMLIN_CONFIG_FILE=/var/task/config.yaml"

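If the experiment never shows up later, a missing environment variable is one of the first things to rule out. The sketch below is a standalone console program rather than part of the Failure Flags SDK; the helper name FindMissingSettings is hypothetical, and the only thing it assumes is the three variable names shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the Failure Flags SDK): returns the names of the
// variables set in aws-lambda-tools-defaults.json that are unset or empty.
// The lookup is passed in so the check can also run against fake values.
static List<string> FindMissingSettings(Func<string, string?> lookup)
{
    string[] required = { "FAILURE_FLAGS_ENABLED", "GREMLIN_LAMBDA_ENABLED", "GREMLIN_CONFIG_FILE" };
    return required.Where(name => string.IsNullOrEmpty(lookup(name))).ToList();
}

// Check the real process environment, as the function would see it on Lambda.
var missing = FindMissingSettings(Environment.GetEnvironmentVariable);
Console.WriteLine(missing.Count == 0
    ? "All Failure Flags settings are present."
    : "Missing settings: " + string.Join(", ", missing));
```

The same check could be logged once from the function's constructor so that it appears in the Lambda log output on a cold start.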
Step 2 - Download your client configuration file

Before deploying your application, you must ensure it can authenticate with Gremlin. Gremlin provides an auto-generated client configuration file that you can use to authenticate any Gremlin agent, including Failure Flags agents. This file contains your Gremlin team ID and TLS certificates. You can also add labels to it, such as your application name, version number, or region.

  1. Download your client configuration file from the Gremlin web app and save it in the root directory of your project folder as config.yaml.
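  2. Add the configuration file to your project. This process varies depending on your development environment, but the easiest (if you’re using Visual Studio) is to right-click your project, expand the “Add” menu, click “Existing Item,” and select your config.yaml file. Make sure the file is copied to the build output (in Visual Studio, set its “Copy to Output Directory” property to “Copy if newer”) so that it is packaged with your function and ends up at /var/task/config.yaml.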
  3. Optionally, add more labels to your configuration file. You can use these labels to identify unique deployments of this application, letting you fine-tune which deployments to impact during experiments. For example, you could add the following block to identify your function as part of the us-east-2 region, letting you target all functions running in us-east-2:

.NET

labels:
    datacenter: us-east-2

The configuration file supports other options, but the defaults are all you need for this tutorial.

Step 3 - Deploy your .NET application to Lambda

So far, we’ve configured our application and the Failure Flags SDK. The SDK injects faults into our app, but it doesn’t handle communicating with Gremlin’s backend servers or orchestrating experiments. For that, we need to deploy the Failure Flags Lambda layer alongside our application.

First, find the ARN (Amazon Resource Name) of the Failure Flags Lambda layer you want to use. You can use our table to look this up. ARNs vary based on the region you’re deploying your function to and the architecture you’re running it on. For example, an ARM64 function running in eu-west-3 would use the ARN arn:aws:lambda:eu-west-3:044815399860:layer:gremlin-lambda-arm64:17.
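To make the structure of these ARNs easier to see, here is a small sketch that assembles one from its parts. BuildLayerArn is a hypothetical helper, not something Gremlin or AWS provides. It assumes every region follows the same pattern as the two example ARNs in this tutorial, which may not hold, so treat Gremlin's table as the source of truth for the account ID and layer version.

```csharp
using System;

// Hypothetical helper: assembles a layer ARN in the same shape as the tutorial's examples.
// The account ID and the "gremlin-lambda-" prefix are copied from those examples;
// the layer version changes over time, so confirm it before deploying.
static string BuildLayerArn(string region, string architecture, int version)
{
    if (architecture != "x86_64" && architecture != "arm64")
        throw new ArgumentException("Expected x86_64 or arm64", nameof(architecture));
    return $"arn:aws:lambda:{region}:044815399860:layer:gremlin-lambda-{architecture}:{version}";
}

// Reproduces the two ARNs used in this tutorial.
Console.WriteLine(BuildLayerArn("eu-west-3", "arm64", 17));
Console.WriteLine(BuildLayerArn("us-east-2", "x86_64", 17));
```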

Next, publish your project and the Failure Flags Lambda layer by running dotnet lambda deploy-function <function_name> --function-layers <failure-flags_arn>. Replace <function_name> with your function’s actual name and <failure-flags_arn> with the ARN corresponding to your region and architecture. For example, this deploys the function using the x86_64 layer in us-east-2:

.NET

dotnet lambda deploy-function failure-flags_dotnet --function-layers arn:aws:lambda:us-east-2:044815399860:layer:gremlin-lambda-x86_64:17

Now, we just need to verify that our function is working properly. Run the following command to send a “Hello world!” message, and you should receive a response soon after:

.NET

dotnet lambda invoke-function failure-flags_dotnet --payload "Hello world\!"

.NET

Amazon Lambda Tools for .NET Core applications (5.12.4)
Project Home: https://github.com/aws/aws-extensions-for-dotnet-cli, https://github.com/aws/aws-lambda-dotnet
	
Payload:
"{\"upper\":\"HELLO WORLD!\",\"executionTime\":180}"

Log Tail:
[gremlin-lambda] Starting Gremlin Lambda [Version: v1.1.3, Debug: false]
[gremlin-lambda] [993e6dc6-b3f7-469e-857d-c4e64c5dbe16] Gremlin Lambda is running
[gremlin-lambda] [993e6dc6-b3f7-469e-857d-c4e64c5dbe16] registered with Gremlin Data Plane API, roundtrip in 330.305223ms, agent ID: 00000000-0000-0000-0000-000000000000
[gremlin-lambda] [993e6dc6-b3f7-469e-857d-c4e64c5dbe16] cache updated, current experiment set: {}
EXTENSION	Name: gremlin-lambda	State: Ready	Events: [INVOKE, SHUTDOWN]
START RequestId: 6010c8cd-0027-43c8-a2f8-df448c02cf3f Version: $LATEST
END RequestId: 6010c8cd-0027-43c8-a2f8-df448c02cf3f
REPORT RequestId: 6010c8cd-0027-43c8-a2f8-df448c02cf3f	Duration: 382.61 ms	Billed Duration: 383 ms	Memory Size: 512 MB	Max Memory Used: 87 MB	Init Duration: 411.86 ms

Step 4 - Run an experiment

Now, let’s run an experiment on our .NET application.

For this experiment, imagine our application consistently errors whenever users enter “gremlins” as the input. This could be for several reasons: maybe the string contains an unsupported character, uses an unexpected character encoding standard, or can’t be serialized into JSON. We want to know how our function handles cases like these: does it handle them gracefully? Does it try to process them anyway? Or does it crash?

To test this, we’ll run an experiment that throws an exception whenever a request contains “gremlins” and observe our application to see what it does.

  1. In the Gremlin web app, select Failure Flags in the navigation pane.
  2. Click + Experiment to create a new experiment.
  3. Enter a name for the new experiment, such as “Exception on input - gremlins.”
  4. Click the Failure Flag Selector box to open a list of Failure Flags. You can type into this box to search, then click on the Flag you created earlier. If your app doesn't show up, confirm that it's fully deployed onto Lambda and has responded to at least one request.
  5. In the Attributes box, select input. This corresponds to the input label in the application.
  6. In the Value box, enter “gremlins” as the string we want to trigger the exception on.
    1. Note that you can match multiple values by separating them with commas.
  7. In the Service Selector box, select the service (i.e., the function) you want to target. This menu will only list actively detected services. You can see a list of active services in the Gremlin web app.
    1. By default, any instances of your application created while the experiment is running will also run the experiment. You can prevent this by enabling “Prevent new services from joining at runtime.”
    2. Optionally, you can fine-tune the specific instances to run the experiment on using the Service Selector section. We’ll leave this empty to target all instances.
  8. The Effects box is where you specify the impact that you want to have on your app. In this case, select exception from the box. Optionally, add an exception message, such as “The input could not be processed.”
  9. Set the Impact Probability percentage. This is the probability that the Failure Flag will run for any specific invocation. For now, set it to 100% to ensure every call to this function gets impacted.
  10. Set the Experiment Duration to how long you want to run the experiment. Five minutes gives you plenty of time to observe the experiment’s impact. You can always stop the experiment early using the Halt button.
  11. Click Save & Run to start the experiment.
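To build intuition for how the attribute, value, and Impact Probability settings above combine, the following sketch models the decision for a single invocation. This is a conceptual model only, not Gremlin's implementation: the function name IsImpacted, the exact-match comparison, and the second value "goblins" are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual model only, not Gremlin's implementation: an invocation is impacted when
// its label value is one of the selector's values AND a random roll falls within
// the impact probability.
static bool IsImpacted(
    IReadOnlyDictionary<string, string> labels, // labels sent by the Failure Flag
    string attribute,                            // e.g. "input"
    string selectorValues,                       // e.g. "gremlins" or "gremlins,goblins"
    double impactProbability,                    // 1.0 means 100%
    double roll)                                 // random number in [0, 1)
{
    var accepted = selectorValues.Split(',').Select(v => v.Trim());
    bool matches = labels.TryGetValue(attribute, out var value) && accepted.Contains(value);
    return matches && roll < impactProbability;
}

var rng = new Random();
var request = new Dictionary<string, string> { ["input"] = "gremlins" };

// At 100% probability, every matching request is impacted.
Console.WriteLine(IsImpacted(request, "input", "gremlins", 1.0, rng.NextDouble()));
```

Lowering the probability in this model to 0.5 would impact roughly half of the matching requests, which is the effect the Impact Probability setting is described as having.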

While the experiment runs, send a request to your function with only “gremlins” in the body.

.NET

dotnet lambda invoke-function failure-flags_dotnet --payload "gremlins"

.NET

Payload:
{
  "errorType": "FailureFlagException",
  "errorMessage": "Exception injected by failure flag: The input could not be processed.",
  "stackTrace": [
    "at FailureFlags.Exception.ApplyBehavior(Experiment[] experiments)",
    "at FailureFlags.GremlinFailureFlags.Invoke(FailureFlag flag, IBehavior behavior)",
    "at dotnet.Function.FunctionHandler(String input, ILambdaContext context) in /home/gremlin/Development/failure-flags-examples/dotnet/src/dotnet/Function.cs:line 30",
    "at lambda_method1(Closure, Stream, ILambdaContext, Stream)",
    "at Amazon.Lambda.RuntimeSupport.HandlerWrapper.<>c__DisplayClass8_0.<GetHandlerWrapper>b__0(InvocationRequest invocation) in /src/Repo/Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/HandlerWrapper.cs:line 54",
    "at Amazon.Lambda.RuntimeSupport.LambdaBootstrap.InvokeOnceAsync(CancellationToken cancellationToken) in /src/Repo/Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/LambdaBootstrap.cs:line 237"
  ]
}

The Failure Flag threw an exception, so the rest of the function never ran and we didn’t get our uppercase response. We can mitigate this by wrapping our function’s logic in a try/catch/finally block. At the very least, we can log the stack trace and show the user a clearer error message.
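The try/catch mitigation could look something like the sketch below. It is a standalone console program so that it runs without the Lambda or Gremlin packages: SimulatedFailureFlag is a hypothetical stand-in for the gremlin.Invoke(...) call from Step 1, and SafeHandler and the error text are likewise illustrative names, not part of the tutorial's application.

```csharp
using System;
using System.Text.Json;

// Stand-in for the Failure Flag call: throws for the input the experiment targets.
// In the real function this would be the gremlin.Invoke(...) call from Step 1.
static void SimulatedFailureFlag(string input)
{
    if (input == "gremlins")
        throw new InvalidOperationException("The input could not be processed.");
}

// Catches the failure, logs it, and returns a readable error as JSON
// instead of letting the exception escape to the Lambda runtime.
static string SafeHandler(string input)
{
    try
    {
        SimulatedFailureFlag(input);
        return JsonSerializer.Serialize(new { upper = input.ToUpper() });
    }
    catch (Exception ex)
    {
        Console.Error.WriteLine(ex); // message and stack trace for later debugging
        return JsonSerializer.Serialize(new { error = "The request could not be processed. Please try again." });
    }
}

Console.WriteLine(SafeHandler("Hello world!")); // normal path
Console.WriteLine(SafeHandler("gremlins"));     // injected failure path
```

Whether to catch the exception inside the function or let it propagate is a design choice: catching it gives the caller a clean response, while letting it propagate lets you test how upstream services react to a failed invocation.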

When you’re finished making observations, click Halt this experiment in the Gremlin web app to stop the experiment.

Conclusion 

Now that you have Failure Flags set up, try running different experiments. Add network latency, impact only a percentage of requests or service instances, or combine effects to add latency and errors. You can even define your experiments or inject your own strings into your application at runtime.

We also have a sidecar for Kubernetes if you want to use Failure Flags outside of Lambda. Just deploy the sidecar, then define and run your experiment the same way you did in this tutorial. Remember that Failure Flags has no performance or availability impacts on your application when not in use, so don't be afraid to add it wherever reliability is a concern. We also have SDKs for Node.js, Java, Python, and Golang.
