Deploying Failure Flags on Kubernetes

This document will walk you through setting up Failure-Flags-Sidecar, a small per-process sidecar agent. Failure-Flags-Sidecar runs alongside your application and is responsible for managing Chaos Engineering experiments and reliability tests.

Note
The Failure Flags agents are not in the critical path for your application logic or network. They are never exposed to sensitive customer data (encrypted or otherwise), and they do not act as network proxies. They do periodically reach out to Gremlin to determine whether any experiments target the attached application, and they cache those results for a short time.

Adding Failure-Flags-Sidecar to your Pod or Deployment

Failure-Flags-Sidecar container images are available via DockerHub and support both AMD64/x86_64 and ARM64 architectures. These container images include a LICENSE file and a single binary program built for Linux. Alternatively, you can download archives directly: arm64, x86_64.

All versions are listed in a file at: https://assets.gremlin.com/packages/failure-flags-sidecar/VERSIONS.

Setting required environment variables

You can add Failure-Flags-Sidecar to any pod without impacting your application's availability or performance. However, Failure-Flags-Sidecar adds no value until you configure it. Configuration is supplied through environment variables and/or configuration files.

Get started quickly with environment variables only:

  1. <span class="code-class-custom">FAILURE_FLAGS_ENABLED</span> must be set to either <span class="code-class-custom">true</span> or <span class="code-class-custom">yes</span> or <span class="code-class-custom">1</span> to enable the Failure Flags SDK in your application.
  2. <span class="code-class-custom">GREMLIN_SIDECAR_ENABLED</span> must be set to either <span class="code-class-custom">true</span> or <span class="code-class-custom">yes</span> or <span class="code-class-custom">1</span> to enable Failure-Flags-Sidecar. If it is unset or set to any other value, Failure-Flags-Sidecar will operate in NOOP mode.
  3. <span class="code-class-custom">GREMLIN_TEAM_ID</span> must be set to your Gremlin Team ID. This and other credential material is available through the Gremlin UI.
  4. <span class="code-class-custom">GREMLIN_TEAM_CERTIFICATE</span> must be set to your Gremlin Team certificate. Newlines may be preserved using the <span class="code-class-custom">\n</span> escape characters or omitted entirely. This and other credential material is available through the Gremlin UI.
  5. <span class="code-class-custom">GREMLIN_TEAM_PRIVATE_KEY</span> must be set to your Gremlin Team private key. Newlines may be preserved using the <span class="code-class-custom">\n</span> escape characters or omitted entirely. This and other credential material is available through the Gremlin UI.
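
As a minimal sketch, the five variables above might appear in a pod spec as follows. The container names, the placeholder values, and the choice to inline credentials are assumptions made for illustration only; in practice you would more likely reference a Kubernetes Secret. The certificate and key use double-quoted YAML strings, in which YAML itself converts each `\n` into a newline.

```yaml
# Sketch only: container names and all values are placeholders.
containers:
  - name: demo-application              # hypothetical application container
    image: YOUR IMAGE HERE
    env:
      - name: FAILURE_FLAGS_ENABLED     # enables the SDK inside the application
        value: "true"
  - name: gremlin                       # the sidecar container
    image: gremlin/failure-flags-sidecar:latest
    env:
      - name: GREMLIN_SIDECAR_ENABLED   # any other value leaves the sidecar in NOOP mode
        value: "true"
      - name: GREMLIN_TEAM_ID
        value: "ffffffff-ffff-ffff-ffff-ffffffffffff"
      - name: GREMLIN_TEAM_CERTIFICATE  # placeholder certificate body
        value: "-----BEGIN CERTIFICATE-----\nExampleXXXX\n-----END CERTIFICATE-----\n"
      - name: GREMLIN_TEAM_PRIVATE_KEY  # placeholder key body
        value: "-----BEGIN EC PRIVATE KEY-----\nExampleXXXX\n-----END EC PRIVATE KEY-----\n"
```

Quoting the boolean-like values (`"true"`) matters because Kubernetes requires environment variable values to be strings.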

Setting Targeting Environment Variables

You will want to set custom targeting labels to uniquely identify deployments of your software. Setting custom labels is done through environment variables with a prefix <span class="code-class-custom">GREMLIN_LABEL_</span>. Any environment variable set on the sidecar with that prefix will be included as labels on the service. For example:

An environment variable `GREMLIN_LABEL_CUSTOM` with the value `custom value` will result in the label: "CUSTOM: custom value".
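
For instance, targeting labels might be set on the sidecar container as sketched here. The variable `GREMLIN_LABEL_TIER` and its value are hypothetical, chosen only to illustrate the prefix convention; by analogy with the example above, it would presumably produce the label "TIER: checkout".

```yaml
# Sketch only: an entry in the pod's containers list.
- name: gremlin
  image: gremlin/failure-flags-sidecar:latest
  env:
    - name: GREMLIN_LABEL_CUSTOM   # from the example above: label "CUSTOM: custom value"
      value: "custom value"
    - name: GREMLIN_LABEL_TIER     # hypothetical label name and value
      value: "checkout"
```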


Individual Configuration Values from Files or ARNs

You can configure individual values like <span class="code-class-custom">GREMLIN_TEAM_CERTIFICATE</span>, <span class="code-class-custom">GREMLIN_TEAM_PRIVATE_KEY</span>, and <span class="code-class-custom">GREMLIN_CUSTOM_ROOT_CERTIFICATE</span> to be read from files in the sidecar container or from AWS services using their ARNs. Instead of setting those environment variables directly, use their <span class="code-class-custom">_FILE</span> or <span class="code-class-custom">_ARN</span> counterparts. Files must be fully qualified paths from the filesystem root. This project currently supports <span class="code-class-custom">secretsmanager</span> secret and <span class="code-class-custom">ssm</span> parameter ARNs.
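
A sketch of the `_FILE` approach, assuming the credentials live in a Kubernetes Secret mounted into the sidecar container. The variable names follow the `_FILE` suffix rule described above. The mount path `/etc/gremlin/credentials`, the volume name, and the Secret name and keys are assumptions for illustration, not values required by Gremlin.

```yaml
# Sketch only: fragment of a pod template spec.
spec:
  containers:
    - name: gremlin
      image: gremlin/failure-flags-sidecar:latest
      env:
        - name: GREMLIN_SIDECAR_ENABLED
          value: "true"
        # Fully qualified paths to the files created by the Secret volume below.
        - name: GREMLIN_TEAM_CERTIFICATE_FILE
          value: /etc/gremlin/credentials/team_certificate
        - name: GREMLIN_TEAM_PRIVATE_KEY_FILE
          value: /etc/gremlin/credentials/team_private_key
      volumeMounts:
        - name: gremlin-credentials        # hypothetical volume name
          mountPath: /etc/gremlin/credentials
          readOnly: true
  volumes:
    - name: gremlin-credentials
      secret:
        secretName: example-gremlin-secret # each Secret key becomes a file of the same name
```

Mounting a Secret as a volume keeps multi-line PEM material out of environment variables, so no newline escaping is needed.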

When you add Failure-Flags-Sidecar to your pod spec and configure the environment variables correctly, your application will be able to consult the sidecar for Gremlin experiment configuration. After you launch your application with the sidecar configured and exercise the integration, you will be able to find your service in the Gremlin UI under Failure Flags > Services.

Once you've added Failure-Flags-Sidecar to your project, you can use the Failure Flags library (Node, Python, Java, Go) from your code.

Example Pod Spec with Failure Flags Sidecar

Adding the sidecar means including an additional container in the pod spec of any application where you want to use Failure Flags.

YAML
apiVersion: v1
kind: Secret
metadata:
  name: example-gremlin-secret
type: Opaque
data:
  ## Base64 Encoded Gremlin Team Id
  team_id: ZmZmZmZmZmYtZmZmZi1mZmZmLWZmZmYtZmZmZmZmZmZmZmZmCg==
  ## Base64 Encoded Gremlin Team Certificate
  team_certificate: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCkV4YW1wbGVYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWApYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFgKWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYClhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWApYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFgKWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYClhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWApYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFgKWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYClhYWFhYWFhYCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  ## Base64 Encoded Gremlin Team Private Key
  team_private_key: LS0tLS1CRUdJTiBFQyBQUklWQVRFIEtFWS0tLS0tCkV4YW1wbGVYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWApYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFgKWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWD09Ci0tLS0tRU5EIEVDIFBSSVZBVEUgS0VZLS0tLS0K
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sidecar-demo
  labels:
    app: sidecar-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sidecar-demo
  template:
    metadata:
      labels:
        app: sidecar-demo
    spec:
      containers:
       - name: demo-application
         image: YOUR IMAGE HERE
         env:
          ## FAILURE_FLAGS_ENABLED
          - name: FAILURE_FLAGS_ENABLED
            value: "true"

       ## THIS CONTAINER IS THE SIDECAR
       - name: gremlin
         image: gremlin/failure-flags-sidecar:latest
         imagePullPolicy: Always
         env:
          ## GREMLIN_SIDECAR_ENABLED
          - name: GREMLIN_SIDECAR_ENABLED
            value: "true"
          ## GREMLIN_API_ENDPOINT_URL
          - name: GREMLIN_API_ENDPOINT_URL
            value: "https://beta.gremlin.com/v1"
          ## GREMLIN_TEAM_ID
          - name: GREMLIN_TEAM_ID
            valueFrom:
              secretKeyRef:
                name: example-gremlin-secret
                key: team_id
          ## GREMLIN_TEAM_CERTIFICATE
          - name: GREMLIN_TEAM_CERTIFICATE
            valueFrom:
              secretKeyRef:
                name: example-gremlin-secret
                key: team_certificate
          ## GREMLIN_TEAM_PRIVATE_KEY
          - name: GREMLIN_TEAM_PRIVATE_KEY
            valueFrom:
              secretKeyRef:
                name: example-gremlin-secret
                key: team_private_key
          ## GREMLIN_DEBUG will enable debug logging to standard out of the sidecar
          - name: GREMLIN_DEBUG
            value: "true"
          ## SERVICE_NAME is the name of the application you're connecting to Gremlin
          - name: SERVICE_NAME
            value: "demo-application"
          ## REGION is the name of the region or data center you're deploying into (for targeting)
          - name: REGION
            value: "demo"
---
apiVersion: v1
kind: Service
metadata:
  name: demo-entrypoint
spec:
  type: NodePort
  selector:
    app: sidecar-demo
  ports:
   - port: 3000
     targetPort: 3000
     nodePort: 30001
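
If hand-encoding Base64 is inconvenient, Kubernetes also accepts plain-text values under a Secret's `stringData` field and encodes them when the object is written. The sketch below is an assumed alternative to the `data`-based Secret above, with placeholder credential values. It keeps the same Secret name and keys, so the `secretKeyRef` entries in the Deployment would not need to change.

```yaml
# Sketch only: credential values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: example-gremlin-secret
type: Opaque
stringData:
  team_id: ffffffff-ffff-ffff-ffff-ffffffffffff
  # Block scalars (|) keep the PEM line breaks exactly as written.
  team_certificate: |
    -----BEGIN CERTIFICATE-----
    ExampleXXXX
    -----END CERTIFICATE-----
  team_private_key: |
    -----BEGIN EC PRIVATE KEY-----
    ExampleXXXX
    -----END EC PRIVATE KEY-----
```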