Dashboard
Getting Started

Troubleshooting Gremlin on OpenShift

Gremlin network timeouts

This issue is most often seen with timeout errors in both Chao and Gremlin logs.

error sending request for url (https://api.gremlin.com/v1/daemon/poll?multiple=1): operation timed out

This usually stems from network rules preventing Gremlin's access to the internet. It's important to figure out what the intended network behavior should be for Gremlin on your infrastructure with some questions:

  • What other services connect to the internet within your cluster?
  • Do services within your cluster rely on an HTTP proxy when connecting to the internet?

OpenShift Egress network policies

If you've reviewed the proxy requirements and determined that Gremlin does not need an HTTP proxy, but you are still unable to connect Gremlin to the internet, it's likely one or more OpenShift projects are preventing internet access with an EgressNetworkPolicy.

You can list such policies in any project with the following

SHELL

oc -n $PROJECT get egressnetworkpolicies

NAME   AGE
test   20m

If you look at the details of such a policy, you can see if network access for api.gremlin.com is denied. Here's an example of a policy which denies api.gremlin.com, because it only allows specific IP address ranges and host names while denying everything else.

SHELL

oc -n $PROJECT get egressnetworkpolicy test -o yaml

YAML

apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
  name: test
  namespace: test
spec:
  egress:
  - to:
      cidrSelector: 1.2.3.0/24
    type: Allow
  - to:
      dnsName: www.foo.com
    type: Allow
  - to:
      cidrSelector: 0.0.0.0/0
    type: Deny

Adding <span class="code-class-custom">api.gremlin.com</span> to such a <span class="code-class-custom">EgressNetworkPolicy</span> will fix this problem.

YAML

apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
  name: test
  namespace: test
spec:
  egress:
  - to:
      cidrSelector: 1.2.3.0/24
    type: Allow
  - to:
      dnsName: www.foo.com
    type: Allow
  - to:
      dnsName: api.gremlin.com
    type: Allow
  - to:
      cidrSelector: 0.0.0.0/0
    type: Deny

Gremlin experiments cannot find target container (OpenShift 4.9)

This issue will generate a variation of the following error:

container details : time="2022-05-11T13:07:21Z" level=error msg="container \"2584cede1cf01e77d9d9ac8f864f99f1c155268ec1095af2bbde850e73d936a2\" does not exist"

See the Openshift 4.9 | Container does not exist article in the Gremlin knowledge base for a workaround.


No items found.
Previous
This is some text inside of a div block.
Compatibility
Installing the Gremlin Agent
Authenticating the Gremlin Agent
Configuring the Gremlin Agent
Managing the Gremlin Agent
User Management
Integrations
Health Checks
Notifications
Command Line Interface
Updating Gremlin
Quick Start Guide
Services and Dependencies
Detected Risks
Reliability Tests
Reliability Score
Targets
Experiments
Scenarios
GameDays
Overview
Deploying Failure Flags on AWS Lambda
Deploying Failure Flags on AWS ECS
Deploying Failure Flags on Kubernetes
Classes, methods, & attributes
API Keys
Examples
Container security
General
Linux
Windows
Chao
Helm
Glossary
Additional Configuration for Helm
Amazon CloudWatch Health Check
AppDynamics Health Check
Blackhole Experiment
CPU Experiment
Certificate Expiry
Custom Health Check
Custom Load Generator
DNS Experiment
Datadog Health Check
Disk Experiment
Dynatrace Health Check
Grafana Cloud Health Check
Grafana Cloud K6
IO Experiment
Install Gremlin on Kubernetes manually
Install Gremlin on OpenShift 4
Installing Gremlin on AWS - Configuring your VPC
Installing Gremlin on Kubernetes with Helm
Installing Gremlin on Windows
Installing Gremlin on a virtual machine
Installing the Failure Flags SDK
Jira
Latency Experiment
Memory Experiment
Network Tags
New Relic Health Check
Overview
Overview
Overview
Overview
Overview
Packet Loss Attack
PagerDuty Health Check
Preview: Gremlin in Kubernetes Restricted Networks
Private Network Integration Agent
Process Collection
Process Killer Experiment
Prometheus Health Check
Configuring Role Based Access Control (RBAC)
Running Failure Flags experiments
Scheduling Scenarios
Shared Scenarios
Shutdown Experiment
Slack
Managing Teams
Time Travel Experiment
Troubleshooting Gremlin on OpenShift
User Authentication via SAML and Okta
Managing Users
Webhooks
Integration Agent for Linux
Test Suites
Restricting Testing Times
Reports
Process Exhaustion Experiment
Enabling DNS collection
Authenticating Users with Microsoft Entra ID (Azure Active Directory) via SAML