Learn from failure. Build resilience.

Gremlin's Chaos Engineering tools let devs and SREs simulate real outages safely, securely, and easily, using an ever-growing library of attacks. Run game days with the only Failure-as-a-Service platform.

Listeners of the SE Daily podcast who run an attack with Gremlin get a free t-shirt and stickers.


Expedia, MailChimp, Medialaan, MoneyLion, Siemens, Twilio, Under Armour, Walmart

Beyond Free: Truly Comprehensive Fault Injection

Gremlin has a full suite of enterprise-grade failure testing modes so you can find out how resilient your production system is.

Build resilient infrastructure.

Recreate real-world technical and business failures, and prepare for the failure of dependencies, internal or external.

Resource exhaustion

Which resource is your bottleneck? CPU, Memory, IO, or Disk? Find out for certain.

Bad behavior

Processes die, time drifts, instances reboot. Are you ready?

Unreliable networks

You cannot rely on the network, nor on your dependencies being available. What happens when they slow down or disappear altogether?

Build resilient applications.

Recreate application outages to quickly resolve or prevent them in the first place.

Accurately & precisely simulate an outage

Confidently experiment with a tiny blast radius to quickly reduce your failure surface area.

Be resilient to the unknown

Maintain a consistent end user experience by preparing for dependency failure.

Serverless, including Lambda

Understand how your applications behave in the face of failure in a serverless environment.

Surgical precision.

Minimize business impact and maximize learning with precise, fine-grained experiments.

  • Experiment on a single user, device, or any other attribute to begin.

  • Request-level granularity, from 0.01% of traffic up to 100%.

  • Fail or delay any part of your application, from functions to endpoints.
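To illustrate what request-level targeting means, here is a minimal, generic sketch of percentage-based fault injection scoped to a single user. It is not Gremlin's API: `FaultRule`, `maybe_inject`, and `InjectedFault` are hypothetical names chosen for this example, and the sketch assumes the caller wraps each request handler or dependency call in `maybe_inject`.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised when the hypothetical injector deliberately fails a request."""


@dataclass
class FaultRule:
    # Hypothetical rule format for illustration only; not Gremlin's API.
    percent: float                 # share of matching requests to affect, e.g. 0.01 to 100.0
    user_id: Optional[str] = None  # limit the blast radius to one user (None = all users)
    delay_s: float = 0.0           # latency to add before the real call
    fail: bool = False             # raise instead of making the real call


def maybe_inject(
    rule: FaultRule,
    user_id: str,
    fn: Callable[[], T],
    rng: Callable[[], float] = random.random,
) -> T:
    """Call fn(), injecting delay and/or failure for the requests the rule targets."""
    # Requests outside the targeted user pass through untouched.
    if rule.user_id is not None and user_id != rule.user_id:
        return fn()
    # rng() is in [0, 1), so this selects roughly `percent`% of matching requests.
    if rng() * 100.0 >= rule.percent:
        return fn()
    if rule.delay_s > 0:
        time.sleep(rule.delay_s)
    if rule.fail:
        raise InjectedFault(f"injected failure for user {user_id}")
    return fn()
```

Passing `rng` explicitly makes the sampling deterministic in tests; in real use the default `random.random` spreads the fault across the chosen fraction of traffic.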

How it works.

Run disciplined chaos experiments to identify weak points in your system and fix them before they become a problem.

  • I. Plan an Experiment

    Create a hypothesis for what might go wrong.

  • II. Run the smallest version

    Execute a simple test to see how your system responds.

  • III. Scale or squash

    Scale the experiment until you identify a bug. Then squash it.
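The three steps above can be sketched as a loop that starts with the smallest blast radius and widens it only while the hypothesis holds. This is a generic illustration, not Gremlin's implementation: `scale_or_squash`, `run_at`, and the default step sizes are hypothetical, and the sketch assumes `run_at` runs the attack at a given percentage of traffic and returns an observed error rate.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class ExperimentResult:
    halted_at: Optional[float] = None  # blast radius (%) where the hypothesis failed, if any
    tried: List[float] = field(default_factory=list)


def scale_or_squash(
    run_at: Callable[[float], float],  # hypothetical: runs the attack at p% and returns an error rate
    max_error_rate: float,             # I. the hypothesis: error rate stays at or below this
    steps: Sequence[float] = (1.0, 5.0, 25.0, 100.0),
) -> ExperimentResult:
    result = ExperimentResult()
    # II. Start with the smallest version, then III. scale step by step.
    for percent in steps:
        result.tried.append(percent)
        observed = run_at(percent)
        if observed > max_error_rate:
            # Hypothesis disproved: stop scaling here; this is the weakness to squash.
            result.halted_at = percent
            return result
    return result
```

Stopping at the first violated step keeps the impact no larger than needed to reveal the weakness.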

Never accidentally cause an outage.

If the unexpected happens, Gremlin's failsafes automatically halt your experiment and fall back to steady state.

Gremlin is built to not only cause failure, but to handle it as well.

Secure from the ground up.

Security is a first-class citizen and is part of our DNA.

  • Least Permissions

    Gremlin runs on default Linux permissions and doesn’t require root access.

  • Audit Everything

    Every action taken on the platform creates an audit trail.

  • Ready for Production

    Multi-factor authentication, secure single sign-on (SSO), and role-based access control (RBAC): Gremlin is secured to allow experimentation in production.

  • Enterprise Grade

    The Gremlin client, daemon, API, and website undergo regular security auditing by an external auditor.

Staggering simplicity.

Get Gremlin up and running in moments with 3 lines of code.

Avoid downtime. Use Gremlin to turn failure into resilience.

Gremlin empowers you to proactively root out failure before it causes downtime. Use Gremlin for free and see how you can harness chaos to build resilient systems.
