This page contains troubleshooting instructions for errors you might encounter. If you can't find the answer to your question, check the Gremlin knowledge base for additional information.
Gremlin Agent
Unhealthy State
Check that you are only running attacks on active Gremlin Agents. It's possible to run an attack on a Gremlin Agent in an unhealthy state, but the attack may not complete. An unhealthy state indicates that there was an issue with the installation or configuration of the Gremlin Agent. If you see a Gremlin Agent in an unhealthy state or you are experiencing problems running attacks, such as receiving "Attack Interrupted" errors, refer to the Gremlin knowledge base for more information.
LostCommunication
There are several reasons a Gremlin Agent can lose communication with the Gremlin Control Plane. Common examples include:
- Running a network-based attack that affected the traffic. Ensure both api.gremlin.com and DNS are whitelisted.
- Running a CPU attack that starved Gremlin of the ability to compute API encryption. This is rare, but it does happen.
In the event of a LostCommunication error, the Gremlin Agent will trigger its dead-man switch and cease all attacks.
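To test the network-related cause from the host that runs the Gremlin Agent, you can confirm that DNS resolves api.gremlin.com and that an HTTPS connection succeeds. The following is a minimal sketch, assuming `getent` and `curl` are installed. The function name `check_gremlin_api` is illustrative and is not part of Gremlin.

```shell
# check_gremlin_api [HOST]
# Hypothetical helper: prints one line per check and returns non-zero
# if either the DNS lookup or the HTTPS connection fails.
check_gremlin_api() {
  host="${1:-api.gremlin.com}"
  result=0

  # DNS: can the host name be resolved?
  if getent hosts "$host" >/dev/null 2>&1; then
    echo "dns: ok"
  else
    echo "dns: FAILED"
    result=1
  fi

  # HTTPS: can a connection be opened within 10 seconds?
  # Any HTTP response counts as reachable; only transport errors fail.
  if curl --silent --output /dev/null --max-time 10 "https://$host"; then
    echo "https: ok"
  else
    echo "https: FAILED"
    result=1
  fi

  return "$result"
}
```

Call `check_gremlin_api` with no arguments to test api.gremlin.com. A failed DNS line points at name resolution, while a failed HTTPS line with working DNS points at a firewall or proxy rule.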
Tc Error: RTNETLINK answers: File exists
This can occur on a host when running a network attack if the Gremlin Agent was halted mid-attack during a previous network attack by the user, the system, or another tool, leaving Gremlin unable to run garbage collection.
To solve, run gremlin rollback.
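If you want to run the rollback only when the error has actually been logged, a small wrapper can search a log file first. This is a sketch under assumptions: the function name `rollback_if_tc_conflict` is hypothetical, and you must supply the path of the log file that contains the error.

```shell
# rollback_if_tc_conflict LOGFILE
# Hypothetical helper: runs `gremlin rollback` only when LOGFILE
# contains the tc error described above.
rollback_if_tc_conflict() {
  logfile="$1"
  if grep -q 'RTNETLINK answers: File exists' "$logfile"; then
    echo "stale tc rules detected, rolling back"
    gremlin rollback
  else
    echo "no tc conflict found in $logfile"
  fi
}
```

For example, `rollback_if_tc_conflict /var/log/gremlin/daemon.log`, where the file name is a guess and should be replaced with wherever your agent writes its logs.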
Failed to parse execution attribute ‘pid’ for execution < HASH_STRING >
There are two non-exclusive modes of failure that can occur with this error message:
- The running version of Gremlin is several versions out of date
  - Update the Gremlin Agent or Docker image
- /var/lib/gremlin/executions has become corrupt
  - Delete the file /var/lib/gremlin/executions
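Deleting the executions file can be sketched as below. Keeping a backup copy is an added precaution rather than part of the documented fix, and the function name `reset_executions` is hypothetical.

```shell
# reset_executions [FILE]
# Hypothetical helper: copies FILE to FILE.bak, then deletes FILE.
# Does nothing if FILE is absent.
reset_executions() {
  file="${1:-/var/lib/gremlin/executions}"
  if [ ! -e "$file" ]; then
    echo "not found: $file"
    return 0
  fi
  # Only delete when the backup copy succeeded.
  cp -p "$file" "$file.bak" && rm -f "$file" && echo "removed: $file"
}
```

Run it with no arguments to act on /var/lib/gremlin/executions. You will likely need sufficient privileges to write to that directory.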
Kubernetes
Run Chao in debug mode
Chao supports the GODEBUG environment variable, which can be used to enable debug features such as verbose logging of HTTP activity. You can enable verbose HTTP logs by adding the following variable to the environment section of the Chao deployment.
NOTE: Verbose logging prints sensitive information like HTTP request and response bodies. This configuration is intended to be a troubleshooting measure only, and should be removed when no longer needed.
YAML
- name: GODEBUG
  value: http2debug=2
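Instead of editing the manifest by hand, the variable can also be set and later removed with `kubectl set env`. The sketch below assumes the Deployment is named `chao` and lives in the `gremlin` namespace. Both are assumptions, so pass your own values if they differ. The function name `chao_debug` is hypothetical.

```shell
# chao_debug on|off [DEPLOYMENT] [NAMESPACE]
# Hypothetical helper: adds or removes GODEBUG on the Chao deployment.
chao_debug() {
  mode="$1"
  deployment="${2:-chao}"      # assumed deployment name
  namespace="${3:-gremlin}"    # assumed namespace
  case "$mode" in
    on)
      kubectl set env "deployment/$deployment" --namespace "$namespace" GODEBUG=http2debug=2
      ;;
    off)
      # A trailing dash tells kubectl to remove the variable.
      kubectl set env "deployment/$deployment" --namespace "$namespace" GODEBUG-
      ;;
    *)
      echo "usage: chao_debug on|off [deployment] [namespace]" >&2
      return 2
      ;;
  esac
}
```

Run `chao_debug on` while troubleshooting and `chao_debug off` when finished, so that request and response bodies stop appearing in the logs.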
Run Gremlin checks
You can run Gremlin's check subcommand on Kubernetes clusters to troubleshoot common configuration or compatibility issues with the environment. The following is an example Job that you can run to get gremlin check output.
YAML
apiVersion: batch/v1
kind: Job
metadata:
  name: gremlin-check
  namespace: gremlin
  labels:
    k8s-app: gremlin
    version: v1
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/name: gremlin-check
    spec:
      restartPolicy: Never
      containers:
        - name: gremlin
          image: gremlin/gremlin
          # You can also pass subcommands (like `proxy` to check only proxy information)
          args: [ "check" ]
          env:
            # Pass the same environment you would pass to the Gremlin DaemonSet, including secrets and proxy information
            - name: GREMLIN_TEAM_PRIVATE_KEY_OR_FILE
              value: file:///var/lib/gremlin/cert/gremlin.key
            - name: GREMLIN_TEAM_CERTIFICATE_OR_FILE
              value: file:///var/lib/gremlin/cert/gremlin.cert
            - name: GREMLIN_IDENTIFIER
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # Example proxy configuration
            # - name: https_proxy
            #   value: http://my-proxy:3128
            # - name: SSL_CERT_FILE
            #   value: /etc/gremlin/ssl/proxy-ca.pem
            # - name: GREMLIN_TEAM_ID
            #   value: my-team-id
          volumeMounts:
            - name: docker-sock
              mountPath: /var/run/docker.sock
            - name: gremlin-state
              mountPath: /var/lib/gremlin
            - name: gremlin-logs
              mountPath: /var/log/gremlin
            - name: gremlin-cert
              mountPath: /var/lib/gremlin/cert
              readOnly: true
            # Example proxy configuration
            # - name: proxy-ca
            #   mountPath: /etc/gremlin/ssl
      volumes:
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
        - name: gremlin-state
          hostPath:
            path: /var/lib/gremlin
        - name: gremlin-logs
          hostPath:
            path: /var/log/gremlin
        - name: gremlin-cert
          secret:
            secretName: gremlin-secret
        # Example proxy configuration
        # - name: proxy-ca
        #   configMap:
        #     name: proxy-ca
  backoffLimit: 4
Once deployed, you can get the output of gremlin check by pulling the logs of the Pod associated with the Job:
SHELL
kubectl logs --follow \
  --namespace gremlin \
  $(kubectl get pods --namespace gremlin --selector=job-name=gremlin-check --output=jsonpath='{.items[*].metadata.name}')

proxy
====================================================
https_proxy : http://proxy.local:3128
http_proxy : (unset)
SSL_CERT_FILE : /etc/gremlin/ssl/proxy-ca.pem
Service Ping : OK
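The whole cycle of deploying the Job, waiting for it, reading its output, and cleaning up can be sketched as one function. This assumes the Job manifest is saved to a local file and uses the `gremlin-check` name and `gremlin` namespace from the example. The function name `run_gremlin_check` and the 120-second timeout are illustrative choices.

```shell
# run_gremlin_check MANIFEST
# Hypothetical helper: applies the Job, waits for it to finish,
# prints its logs, and deletes it so it can be applied again later.
run_gremlin_check() {
  manifest="$1"
  kubectl apply --filename "$manifest" || return 1

  # Block until the Job reports completion, or give up after two minutes.
  waited=0
  kubectl wait --namespace gremlin --for=condition=complete \
    --timeout=120s job/gremlin-check || waited=$?

  # Print the logs even if the wait timed out; they usually explain why.
  kubectl logs --namespace gremlin job/gremlin-check
  kubectl delete --namespace gremlin job/gremlin-check
  return "$waited"
}
```

For example, `run_gremlin_check gremlin-check.yaml`, where the file name is whatever you saved the manifest as. Deleting the Job at the end matters because a finished Job's Pod template cannot be changed in place.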
Docker
Non-zero exit code (137)
Docker has killed the container via kill -9. This is often attributed to out-of-memory (OOM) issues and is most often seen when running a memory attack. Allocating more RAM to Docker usually solves the issue.
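To confirm that the kill was memory-related, you can read the container state that Docker records. The sketch below uses `docker inspect` with the `State.OOMKilled` and `State.ExitCode` fields. The function name `was_oom_killed` is hypothetical, and the container must still exist (that is, it was not started with `--rm`).

```shell
# was_oom_killed CONTAINER
# Hypothetical helper: prints the recorded exit code and OOM flag,
# and returns 0 only if Docker marked the container as OOM-killed.
was_oom_killed() {
  container="$1"
  oom="$(docker inspect --format '{{.State.OOMKilled}}' "$container")" || return 2
  code="$(docker inspect --format '{{.State.ExitCode}}' "$container")" || return 2
  echo "exit code: $code, OOMKilled: $oom"
  [ "$oom" = "true" ]
}
```

If the flag is true, raising the container's limit (for example with the `--memory` option of `docker run`) or the memory available to Docker itself is the likely fix.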
Non-zero exit code (1)
- Unable to find local credentials file: Gremlin is not configured to point to the correct credentials file, usually located in /var/lib/gremlin. Ensure the credentials file(s), either certificates or API keys, exist and Gremlin has read+write access.
- Permission denied (os error 13): The Gremlin container does not have proper filesystem permissions. Gremlin requires write access to /var/lib/gremlin, including the ability to create new files. Check permissions on the host, and ensure write access is being passed through via Docker when running the Gremlin container.
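A quick way to check the host side of this is to try creating a file in the directory. The following sketch does that and cleans up after itself. The function name `check_gremlin_dir` is hypothetical, and the result reflects the user running the script, which may differ from the user inside the container.

```shell
# check_gremlin_dir [DIR]
# Hypothetical helper: reports whether DIR exists and whether the
# current user can create a new file in it.
check_gremlin_dir() {
  dir="${1:-/var/lib/gremlin}"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  probe="$dir/.write-test.$$"
  # Try to create an empty file; the subshell hides redirect errors.
  if ( : > "$probe" ) 2>/dev/null; then
    rm -f "$probe"
    echo "writable: $dir"
  else
    echo "not writable: $dir"
    return 1
  fi
}
```

If the host directory is writable but the container still reports os error 13, check that the volume is mounted read-write, for example with `-v /var/lib/gremlin:/var/lib/gremlin` on `docker run`.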
OS Error 1
This is often observed in the context of Capabilities: Unable to inherit one or more required capabilities: cap_net_admin, cap_net_raw
Solution: Add the missing required Linux capabilities to that Docker container.
Example: docker run -it --cap-add=NET_ADMIN --cap-add=KILL --cap-add=SYS_TIME gremlin/gremlin syscheck