Services and Dependencies

A service is a discrete unit of functionality provided by one or more systems in your environment. For example, a web server deployed as a load balancer for your backend systems is a service. In Gremlin, services are the units used to test and measure the reliability of your system. This page will show you how to add, manage, and test your services using the Gremlin web app.

In order to use services, the Gremlin agent must be configured to collect process data. See Enabling Process Collection for more information.

Viewing your list of services

You can access your list of services using the Services menu item in the nav bar. This is the main view of any services that you or your teammates have added to Gremlin, along with their reliability score. This is also called the Service Catalog. If no services have been added yet, this list will appear empty.

To open a service, click on its entry in the list. You can search for a specific service by name using the search box, or sort the list by clicking the Name or Score column header.

Viewing a list of services in the Gremlin web app

Viewing your production services

Gremlin lets you flag services as being in a Production environment. When this flag is enabled for one or more services, those services will be highlighted, and an additional tab named Production will appear at the top of the list. Clicking on the Production tab shows only those services that are flagged as Production so that you can more easily identify key services.

Adding a service

You can add a new service either manually or by adding an annotation to your Kubernetes spec.

Adding a service manually

To add a new service manually, click the + Service button in the top-right corner of the services list. A short wizard walks you through the following steps:

  1. Give your service a name and define the type of service. Gremlin supports host-based, container-based, and Kubernetes-based services.
  2. Define your service's fingerprint. This is where you select the resources in your environment that comprise your service. The selection will change depending on the type of service selected in step 1. For example, selecting Kubernetes will show all of the Kubernetes resources detected by the Gremlin agent. You can select multiple resources, such as multiple Kubernetes Deployments, or a Deployment and a DaemonSet.
  3. Select the process you want to use for dependency discovery. Gremlin will use this process's network traffic data to detect dependencies and generate reliability tests for each one. If only one process is detected, it will be selected by default.
  4. Click Create Service.

Next, you will need to add a Health Check.

Adding a service by using an annotation

Kubernetes users can register a service with Gremlin by annotating Kubernetes objects with the gremlin.com/service-id annotation. Annotations are key-value pairs that provide additional metadata for objects. In this case, gremlin.com/service-id is the key, and the value is the name that you want the service to have in Gremlin.

Warning
Using the same service ID across multiple Gremlin teams is unsupported. If services in different teams would otherwise share a service ID, consider adding a team-specific identifier to the ID, such as the team name.

  1. Identify the Kubernetes object you want to annotate. It can be a Deployment, Pod, Service, or any other resource.
  2. Choose a name for the service. This is the name that the service will have in the Gremlin web app and REST API. This can be the same name as the Kubernetes object, or a unique name specifically for Gremlin.
  3. Add the annotation to the object definition. Here's an example of annotating a Deployment named my-deployment, which will appear in Gremlin as my-nginx-service:
YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: nginx
  annotations:
    gremlin.com/service-id: my-nginx-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
  4. Optional: If you want multiple objects to be part of the same service (e.g. two or more Deployments), use the same service ID for each of them and Gremlin will treat them as part of the same service.
  5. Optional: If you want to create this service for another team in your company, you can annotate the spec with gremlin.com/team-id. The value of this annotation should be the teamId of the team that you want to create the service for. Remember to share access to the namespace with that team first, otherwise the service will not be automatically generated.
YAML

annotations:
  gremlin.com/service-id: my-nginx-service
  gremlin.com/team-id: other-team-id
  6. Save and apply the updated manifest(s). Gremlin will detect the annotation and add the new service(s), which you can manage using the Service Catalog.
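
As an illustration of the annotation steps above, the following Python sketch adds the gremlin.com/service-id annotation (and, optionally, gremlin.com/team-id) to manifests that have already been loaded as dictionaries. The helper names annotate_manifest and team_scoped_service_id, the team name "platform", and the "team-name" prefix format are assumptions made for this example and are not part of Gremlin. Only the two annotation keys come from this page.

```python
import copy

# Annotation keys documented on this page.
SERVICE_ID_KEY = "gremlin.com/service-id"
TEAM_ID_KEY = "gremlin.com/team-id"


def team_scoped_service_id(team_name, service_name):
    """Build a service ID that is unique per team (hypothetical naming scheme)."""
    return f"{team_name}-{service_name}"


def annotate_manifest(manifest, service_id, team_id=None):
    """Return a copy of a Kubernetes manifest dict with the Gremlin annotations set."""
    annotated = copy.deepcopy(manifest)
    metadata = annotated.setdefault("metadata", {})
    # An empty "annotations:" key in YAML loads as None, so fall back to a dict.
    annotations = metadata.get("annotations") or {}
    annotations[SERVICE_ID_KEY] = service_id
    if team_id is not None:
        annotations[TEAM_ID_KEY] = team_id
    metadata["annotations"] = annotations
    return annotated


# Two Deployments that should appear in Gremlin as one service.
deployments = [
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "my-deployment"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "my-other-deployment"}},
]
service_id = team_scoped_service_id("platform", "my-nginx-service")
annotated = [annotate_manifest(d, service_id) for d in deployments]
for manifest in annotated:
    print(manifest["metadata"]["name"], manifest["metadata"]["annotations"])
```

Because both manifests carry the same service ID, they would be grouped into a single service once the manifests are written back out and applied.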

Viewing service details

The service details page is your dashboard for managing and testing each service. You can perform tasks such as viewing the service's reliability score, running reliability tests, adding Health Checks, adding other integrations, deleting the service, and viewing the service's selection criteria (i.e. the systems in your environment that comprise the service). You can also view, manage, and run tests on the service's dependencies.

A detailed overview of a service in the Gremlin web app

Adding and removing Health Checks

The Health Check feature automatically checks external metrics or REST API endpoints while a reliability test is running. Health Checks are usually monitors configured in an observability tool like Datadog, New Relic, or Prometheus. They can also include custom monitoring tools and URLs.

Before you can run a reliability test on a service, you'll need to assign at least one Health Check to the service. While a reliability test is running, your Health Check(s) will poll your observability tool every 10 seconds. If the monitor/endpoint reports back as failed, unhealthy, or unavailable, the Health Check will halt the ongoing test, revert the impact, and mark it as failed.
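
The polling behavior described above can be pictured with a short Python sketch. This is a simplified model under stated assumptions, not Gremlin's implementation: the function name run_with_health_checks, the "healthy" status string, and the halt_test callback are all hypothetical. Only the 10-second interval and the halt-and-fail behavior come from this page.

```python
import time

HEALTHY = "healthy"  # hypothetical status value used by this sketch


def run_with_health_checks(duration_s, checks, halt_test,
                           poll_interval_s=10, sleep=time.sleep):
    """Poll every check on each interval; halt and fail on the first bad result.

    checks    -- callables that return a status string
    halt_test -- callable that stops the test and reverts its impact
    """
    elapsed = 0
    while elapsed < duration_s:
        for check in checks:
            # Anything other than "healthy" (failed, unhealthy, unavailable)
            # halts the test and marks it as failed.
            if check() != HEALTHY:
                halt_test()
                return "failed"
        sleep(poll_interval_s)
        elapsed += poll_interval_s
    return "passed"


# Example: the monitor turns unhealthy on the third poll.
statuses = iter(["healthy", "healthy", "unhealthy"])
halted = []
result = run_with_health_checks(
    duration_s=60,
    checks=[lambda: next(statuses)],
    halt_test=lambda: halted.append(True),
    sleep=lambda _seconds: None,  # skip real waiting in this example
)
print(result, halted)
```

Passing the sleep function as a parameter keeps the sketch easy to exercise without waiting for real time to pass.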

Adding a Health Check to a Service

Once you've created a Health Check, you can add it to a service in one of two ways:

Adding a Health Check from the service details page

    1. In the Gremlin web app, open the details page for the service you want to add the Health Check to.
    2. Click Settings at the top of the page next to the service name, then select Health Checks.
    3. Click on the Health Checks drop-down and select the Health Check you want to add. You can also type text to search for a specific Health Check.
    4. Click + Add to add the Health Check to the service.

    Adding a Health Check from the Service Catalog

      1. In the Gremlin web app, open the service catalog.
      2. Click the check box next to the service you want to add the Health Check to. Note that you can select multiple services to add Health Checks in bulk.
      3. Click on the Health Checks drop-down at the top of the list and select the Health Check you want to add. You can also type text to search for a specific Health Check.
      4. Click + Add to add the Health Check to the selected service(s).

      Editing a Health Check

      You can edit an existing Health Check in one of two ways:

      Editing a Health Check from the service details page

      1. In the Gremlin web app, open the details page for the service whose Health Check you want to edit.
      2. Click Settings at the top of the page next to the service name, then select Health Checks.
      3. Find the Health Check you want to edit, then click Edit.
      4. Make the desired changes to the Health Check.
      5. Click Test Connection, then Test Evaluation to verify the new settings.
      6. Click Save Health Check to save the changes.

      Editing a Health Check from the Health Checks page

      1. In the Gremlin web app, open the Health Checks page.
      2. Find the Health Check you want to edit, then click Edit.
      3. Make the desired changes to the Health Check.
      4. Click Test Connection, then Test Evaluation to verify the new settings.
      5. Click Save Health Check to save the changes.
      Edit Datadog Health Check example

      Removing a Health Check

      To remove a Health Check from a service, open the service in the Gremlin web app, click Settings, and then click the Health Checks tab. Find the Health Check you want to remove, then remove it from the service. This removes this specific Health Check, but it will not delete the authentication settings for the observability tool. In other words, you can continue using this observability tool for other Health Checks.

      Note
      Gremlin will not let you delete Health Checks that are actively in use by at least one service. You'll need to make sure a Health Check is not in use before deleting it.

      Viewing the reliability score

      Each service has a reliability score ranging from 0 to 100. This score is a calculated value that represents how reliable the service is. Running a reliability test will increase your score. To learn how the score is calculated, see Reliability Score.

      Editing service settings

      You can modify a service by clicking the Settings button at the top of the service's page. From this page you can manage the service's Health Checks and tags, or delete the service.

      Note on service deletion
      Deleting a service will also delete its score, test history, and dependencies. This is irreversible!

      Flagging a service as Production

      If a service is running in production, you may want to avoid running tests on it without confirmation. Gremlin lets you do this by tagging the service with metadata that identifies it as running in a production environment. The service will be highlighted in the service list and will show a confirmation warning when you try to run tests.

      To flag a service as "Production":

      1. Open the service settings and select the Tags tab.
      2. In the Tag Name box, enter environment, and in the Tag Value box, enter production.
      3. Click Add Tag. The new tag will appear in the box below, and "production" will be highlighted in orange.

      Marking a service as Production in the Gremlin web app

      If you want to flag every service as a Production service, navigate to Team Settings, select the Environments tab, and select the "Everything in this Team is in a Production environment" checkbox.

      Tagging a service with custom metadata

      In addition to auto-detected tags (region, zone, etc.), you can also add your own custom tags to services. This lets you add metadata to help with searching, grouping, and filtering services.

      To add a tag to a service, open the service's settings page and select the Tags tab. Give the tag a name in the Tag Name box, and enter its value in the Tag Values box. You can store multiple values in a single tag by entering them as a comma-separated list. When you're ready to add the tag, click Add Tag, then click Save.

      The tag table shows all tags associated with this service. You can remove a tag from the service by clicking the Delete button. Note that deleting or editing a tag here won't change other services' tags, even if they share the same tag name.
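
To make the tag model concrete, here is a minimal Python sketch that treats a service's tags as a mapping from tag name to a list of values. The function names parse_tag_values and is_production are hypothetical, and the whitespace trimming and exact-match comparison are assumptions of this sketch rather than documented Gremlin behavior. The environment / production pair is the tag described in the Production section above.

```python
def parse_tag_values(raw):
    """Split a comma-separated tag value string into a list of values."""
    return [value.strip() for value in raw.split(",") if value.strip()]


def is_production(tags):
    """Return True if the service carries the environment=production tag."""
    return "production" in tags.get("environment", [])


# Tags are stored per service, so editing one service's tags leaves others alone.
checkout_tags = {
    "region": parse_tag_values("us-east-1, us-west-2"),
    "environment": parse_tag_values("production"),
}
staging_tags = {"environment": parse_tag_values("staging")}

print(checkout_tags["region"])
print(is_production(checkout_tags), is_production(staging_tags))
```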

      Managing dependencies

      In addition to testing a service, Gremlin can also test each service's dependencies. Gremlin will try to auto-detect all relevant dependencies using the service's network traffic, and will automatically attach them to a service.

      Examples of dependencies Gremlin can discover:

      • Common technologies using well-known ports (e.g. Oracle over port 1521)
      • AWS, Azure, and Datadog Cloud services (e.g. DynamoDB)

      If Gremlin missed a dependency, you can define it yourself using the suggested or manual dependency workflows below.

      Adding dependencies

      To add a dependency, open your service page in the Gremlin web app, scroll down to Dependencies, and click the Add Dependency button. You can choose whether to add a Suggested dependency or a Manual dependency:

      • Suggested dependencies are dependencies that Gremlin detected, but wasn't confident enough to include automatically. This can be due to low traffic or few connections between the service and this dependency.
      • Manual dependencies are dependencies that you define yourself.
      Note
      The web app will display a small banner in the Dependencies section if you have suggested dependencies.

      Adding suggested dependencies

      To add a suggested dependency, click on the Suggested tab and follow these steps:

      1. Select whether to list dependencies by IP address or port number by using the Type dropdown list.
      2. Select a dependency from the table. Selecting a dependency shows its corresponding IP address or port number, as well as the number of connections between the service and the dependency. You can click on the column headings to sort, filter, show, or hide individual columns.
      3. Enter a name for the dependency.
      4. Optionally, edit the dependency identifier or port number.

      Click Save Dependency to add the dependency.

      Adding manual dependencies

      To add a manual dependency, click on the Manual tab and follow these steps:

      1. Enter a name for the dependency.
      2. Enter the dependency's network identifier. This can be a hostname, IP address, CIDR subnet, URL, or cloud service.
      3. Optionally, enter the port(s) to target. You can enter a single port number, a port range, or a comma-separated (CSV) string of multiple ports and/or port ranges. Leaving this blank will target all ports.

      Click Save Dependency to add the dependency.
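
The port field accepts a single port, a range, or a comma-separated mix of both. The Python sketch below shows one way such a string could be interpreted. It is an illustration only: the function name parse_ports is hypothetical, the hyphen as range separator (for example 8080-8090) is an assumption, and representing "all ports" as 1-65535 is a choice made for this example.

```python
ALL_PORTS = [(1, 65535)]  # assumed meaning of a blank port field in this sketch


def parse_ports(spec):
    """Parse a port string into a list of inclusive (start, end) ranges.

    Accepts "443", "8080-8090", or "80, 443, 8080-8090".
    A blank string means every port.
    """
    if not spec.strip():
        return list(ALL_PORTS)
    ranges = []
    for part in spec.split(","):
        start_text, separator, end_text = part.strip().partition("-")
        start = int(start_text)  # raises ValueError on non-numeric input
        end = int(end_text) if separator else start
        if not 1 <= start <= end <= 65535:
            raise ValueError(f"invalid port or range: {part.strip()!r}")
        ranges.append((start, end))
    return ranges


print(parse_ports("443"))
print(parse_ports("80, 443, 8080-8090"))
print(parse_ports(""))
```

Validating the string before saving catches mistakes such as a reversed range (90-80) early, instead of producing a dependency that targets nothing.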

      Editing dependencies

      To edit a dependency, click the pencil icon under the Actions column. After making your edits, click Save to save your changes.

      Removing dependencies

      To remove a dependency, click the delete icon under the Actions column. A confirmation modal window will appear. Click Delete again to confirm the deletion.

      Marking a dependency as a Single Point of Failure (SPOF)

      Some dependencies are known to be a risk. These dependencies may be critical parts of your infrastructure, where a failure of the dependency will result in an outage. These are known as single points of failure (SPOF).

      Flagging a dependency as a single point of failure excludes it from reliability tests that are started when a user clicks the Run All Tests button or sets up Auto Scheduling. You can still run tests on the dependency manually, and the results will still contribute to the service's reliability score. This feature is meant to prevent users from accidentally running automated tests on the dependency when it's already known to be a risk. Once the risk is addressed, you can uncheck this option to include the dependency in automatic testing once again.
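
The effect of the SPOF flag on automated runs can be sketched in a few lines of Python. The Dependency class and the select_for_automated_run function are hypothetical names used only for this illustration and do not reflect Gremlin's data model.

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    name: str
    single_point_of_failure: bool = False


def select_for_automated_run(dependencies):
    """Return the dependencies that an automated run would include.

    Dependencies flagged as a single point of failure are skipped here,
    but they can still be tested manually.
    """
    return [d for d in dependencies if not d.single_point_of_failure]


dependencies = [
    Dependency("payments-db", single_point_of_failure=True),
    Dependency("cache"),
    Dependency("search"),
]
print([d.name for d in select_for_automated_run(dependencies)])
```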

      Viewing a dependency in the Company Report with a single SPOF
      Hovering over the SPOF count in a Company Report, showing an explanation of what an SPOF is

      Marking an existing dependency as a single point of failure

      To mark an existing dependency as a single point of failure:

      1. Open the Services list in the Gremlin web app, then click on the service containing the dependency you want to flag.
      2. Scroll down to the Dependencies section of the service's overview page, then click Dependencies to view its dependencies.
      3. Click on the gear icon next to the name of the dependency you want to flag, then click Edit. This opens the dependency's settings.
      4. In the Edit Dependency pane, check the option Mark this Dependency as a Single Point of Failure, then click Save.
      Screenshot of the SPOF checkbox in the Edit Dependency screen

      Marking a new dependency as a single point of failure

      To mark a new dependency as a single point of failure:

      1. Follow the instructions in Adding dependencies, but don't click Save Dependency yet.
      2. Check the option Mark this Dependency as a Single Point of Failure.
      3. Click Save Dependency.
      Screenshot of adding a new suggested dependency using IP address and port number

      FAQ

      Q: How often are services discovered?

      A: Gremlin currently discovers services once every hour.

      Q: How often are characteristics of an existing service discovered and/or modified?

      A: Gremlin currently discovers and/or modifies the characteristics of existing services once every hour.

      Q: How often are targets resolved to an existing service?

      A: Gremlin resolves targets instantly, as soon as they change on a service. If a new pod is registered with the control plane, it’s immediately registered as a target to a service.

      Q: How often does Gremlin associate pods, containers and hosts with existing services?

      A: Every 30 seconds.

