Services and Dependencies

A service is a discrete unit of functionality provided by one or more systems in your environment. For example, a web server deployed as a load balancer for your backend systems is a service. In Gremlin, services are the units used to test and measure the reliability of your system. This page will show you how to add, manage, and test your services using the Gremlin web app.

Viewing your list of services

You can access your list of services using the Services menu item in the nav bar. This is the main view of any services that you or your teammates have added to Gremlin, along with their reliability score. If no services have been added yet, this list will appear empty.

To open a service, simply click on its entry in the list. You can search for a specific service by name using the search box, or use the "Sort by" box to sort the list by name, reliability score, or last modified date.

Viewing a list of services in the Gremlin web app

Viewing your production services

Gremlin lets you flag services as being in a Production environment. When this flag is enabled for one or more services, those services will be highlighted, and an additional tab named Production will appear at the top of the list. Clicking on the Production tab shows only those services that are flagged as Production so that you can more easily identify key services.
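The list behavior described above (search by name, "Sort by", and the Production tab) can be pictured with a small sketch. This is only an illustration of the behavior, not Gremlin's implementation; the `Service` class, its field names, and the sort directions are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Service:
    # Hypothetical record for one row of the services list.
    name: str
    score: int                 # reliability score, 0 to 100
    last_modified: datetime
    production: bool = False   # the "Production" flag


def search(services: List[Service], text: str) -> List[Service]:
    """Case-insensitive match on the service name."""
    needle = text.lower()
    return [s for s in services if needle in s.name.lower()]


def sort_by(services: List[Service], key: str) -> List[Service]:
    """Sort by name, score, or last modified date (directions are assumed)."""
    if key == "name":
        return sorted(services, key=lambda s: s.name.lower())
    if key == "score":
        return sorted(services, key=lambda s: s.score, reverse=True)
    if key == "last_modified":
        return sorted(services, key=lambda s: s.last_modified, reverse=True)
    raise ValueError(f"unknown sort key: {key}")


def production_tab(services: List[Service]) -> List[Service]:
    """Only the services flagged as Production."""
    return [s for s in services if s.production]
```

For example, `production_tab(sort_by(services, "score"))` would list Production services with the highest score first.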

Adding a service

To add a new service, click the + Service button in the top-right corner of the services list. This will walk you through a short wizard with the following steps:

  • Give your service a name and define the type of service. Gremlin supports host-based, container-based, and Kubernetes-based services.
  • Define your service's fingerprint. This is where you select the resources in your environment that comprise your service. The selection will change depending on the type of service selected in step 1. For example, selecting Kubernetes will show all of the Kubernetes resources detected by the Gremlin agent.
    • Note that you can select multiple resources. For example, you can select multiple Kubernetes Deployments, a Deployment and a DaemonSet, etc.
  • Select the process you want to use for dependency discovery. Gremlin will use this process's network traffic data to detect dependencies and generate reliability tests for each one.
    • Note that if only one process is detected, it will be selected by default.
  • Click Create Service.

Next, you will need to add a Health Check to the service.
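As a way to summarize the information the wizard collects, here is a minimal sketch of a service definition. It is not Gremlin's data model: the `ServiceDraft` class, its field names, and the resource strings are hypothetical. It only mirrors the rules stated above: three service types, one or more selected resources, and a default process when only one is detected.

```python
from dataclasses import dataclass
from typing import List, Optional

# Service types named in the wizard's first step.
SERVICE_TYPES = {"host", "container", "kubernetes"}


@dataclass
class ServiceDraft:
    # All field names here are illustrative assumptions.
    name: str
    service_type: str
    fingerprint: List[str]             # selected resources, one or more
    detected_processes: List[str]      # processes found on those resources
    selected_process: Optional[str] = None

    def validate(self) -> "ServiceDraft":
        if not self.name.strip():
            raise ValueError("a service needs a name")
        if self.service_type not in SERVICE_TYPES:
            raise ValueError(f"unsupported service type: {self.service_type}")
        if not self.fingerprint:
            raise ValueError("select at least one resource")
        # If only one process is detected, it is selected by default.
        if self.selected_process is None and len(self.detected_processes) == 1:
            self.selected_process = self.detected_processes[0]
        if self.selected_process not in self.detected_processes:
            raise ValueError("select a process for dependency discovery")
        return self
```

A draft with two Kubernetes resources and a single detected process validates without an explicit process selection; with several detected processes, one must be chosen.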

Viewing service details

The service details page is your dashboard for managing and testing each service. You can perform tasks such as viewing the service's reliability score, running reliability tests, adding Health Checks, adding other integrations, deleting the service, and viewing the service's selection criteria (i.e. the systems in your environment that comprise the service). You can also view, manage, and run tests on the service's dependencies.

A detailed overview of a service in the Gremlin web app

Adding and removing Health Checks

The Health Check feature automatically checks external metrics or REST API endpoints while a reliability test is running. These are usually monitors configured in an observability tool like Datadog, New Relic, or Prometheus. They can also include custom monitoring tools and URLs.

Before you can run a reliability test on a service, you'll need to assign at least one Health Check to the service. While a reliability test is running, your Health Check(s) will poll your observability tool every 10 seconds. If the monitor/endpoint reports back as failed, unhealthy, or unavailable, the Health Check will halt the ongoing test, revert the impact, and mark it as failed.
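The polling behavior described above can be sketched as a simple loop. This is an illustration under stated assumptions, not Gremlin's implementation: the function and parameter names are hypothetical, each Health Check is modeled as a callable that returns `True` when healthy, and the `"passed"` result for a test that finishes with no failed check is an assumption. Only the 10-second interval and the halt, revert, and fail behavior come from the text.

```python
import time
from typing import Callable, Iterable

POLL_INTERVAL_SECONDS = 10  # polling interval stated in the text


def run_with_health_checks(
    test_duration_seconds: int,
    health_checks: Iterable[Callable[[], bool]],
    halt_and_revert: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll every Health Check while the test runs; stop on the first failure."""
    checks = list(health_checks)
    elapsed = 0
    while elapsed < test_duration_seconds:
        for check in checks:
            try:
                healthy = check()
            except Exception:
                # An unreachable monitor or endpoint counts as unavailable.
                healthy = False
            if not healthy:
                halt_and_revert()   # halt the test and revert the impact
                return "failed"     # the test is marked as failed
        sleep(POLL_INTERVAL_SECONDS)
        elapsed += POLL_INTERVAL_SECONDS
    return "passed"  # assumed label for a test that completes
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without waiting in real time.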

Adding a Health Check to a Service

Once you've created a Health Check, you can add it to a service by following these steps:

  • In the Gremlin web app, open the service that you want to add the Health Check to.
  • Click Settings at the top of the service overview page, then click Health Checks.
  • Select the Health Check you want to add from the drop-down list, then click + Add.
  • If necessary, fill out the required fields, then click Save.

Editing a Health Check

To edit a Health Check that you've already added to a Service, open the Service in the Gremlin web app, click Settings, and then click the Health Checks tab. Find the Health Check you want to edit, then click Edit.

Edit Datadog Health Check example

Alternatively, you can click on Health Checks in the left-hand navigation menu, find the Health Check you want to edit, then click Edit.

Removing a Health Check

To remove a Health Check from a Service, open the Service in the Gremlin web app, click Settings, and then click the Health Checks tab. Find the Health Check you want to remove, then choose the option to remove it. This will delete this specific Health Check, but it will not delete the authentication settings for the observability tool. In other words, any other Health Checks using the same tool will continue functioning.

Viewing the reliability score

Each service has a reliability score ranging from 0 to 100. This score is a calculated value that represents how reliable the service is. Running reliability tests can increase your score. To learn how the score is calculated, see Reliability Score.

Editing service settings

You can modify a service by clicking the Settings button at the top of the service's page. From there you can, for example, manage the service's Health Checks and flag the service as Production.

Flagging a service as Production

If a service is running in a production environment, you can flag it as such. Any services flagged as "Production" are highlighted in the service list and show a unique warning when you go to run tests. To flag a service as "Production", open the service's settings, select the Environment tab, then check the Production check box.

Marking a service as Production in the Gremlin web app

If you want to flag every service as a Production service, you can do so by navigating to Team Settings, selecting the Environments tab, and clicking the Everything in this Team is in a Production environment checkbox.

Managing dependencies

In addition to testing a service, Gremlin can also test each service's dependencies. Gremlin will try to auto-detect all relevant dependencies using the service's network traffic, and will automatically attach them to a service.

Examples of dependencies Gremlin can discover:

  • Common technologies using well-known ports (e.g. Oracle over 1521)
  • AWS, Azure, and Datadog Cloud services (e.g. DynamoDB)
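To illustrate the idea of recognizing a dependency by a well-known port, here is a minimal sketch. It does not describe how Gremlin performs detection; the `label_connections` function and its input shape are hypothetical. Only the Oracle entry (port 1521) comes from the list above. The other entries are the conventional default ports for those technologies, added for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Conventional default ports. Only Oracle/1521 appears in the text above.
WELL_KNOWN_PORTS = {
    1521: "Oracle",
    3306: "MySQL",
    5432: "PostgreSQL",
    6379: "Redis",
    27017: "MongoDB",
}


def label_connections(connections: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Count outbound connections per recognized technology.

    `connections` is an iterable of (remote_ip, remote_port) pairs, a
    hypothetical stand-in for observed network traffic.
    """
    counts: Counter = Counter()
    for _remote_ip, remote_port in connections:
        label = WELL_KNOWN_PORTS.get(remote_port, f"unknown:{remote_port}")
        counts[label] += 1
    return dict(counts)
```

A port-only lookup like this can mislabel traffic when a technology runs on a non-default port, which is one reason a manual dependency definition can still be necessary.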

If Gremlin missed a dependency, you can manually define the dependency using the suggested or manual dependencies workflows below.

Adding dependencies

To add a dependency, open your service page in the Gremlin web app, scroll down to Dependencies, and click the Add Dependency button. You can choose whether to add a Suggested dependency or a Manual dependency:

  • Suggested dependencies are dependencies that Gremlin detected, but wasn't confident enough to include automatically. This can be due to low traffic or few connections between the service and this dependency.
  • Manual dependencies are dependencies that you define yourself.

Adding suggested dependencies

To add a suggested dependency, click on the Suggested tab and follow these steps:

  • Select whether to list dependencies by IP address or port number by using the Type dropdown list.
  • Select a dependency from the table. Selecting a dependency shows its corresponding IP address or port number, as well as the number of connections between the service and the dependency.
    • You can click on the column headings to sort, filter, show, or hide individual columns.
  • Enter a name for the dependency.
  • Optionally, make any edits to the dependency identifier or port number if needed.

Click Save Dependency to add the dependency.

Adding manual dependencies

To add a manual dependency, click on the Manual tab and follow these steps:

  • Enter a name for the dependency.
  • Enter the dependency's network identifier. This can be a hostname, IP address, CIDR subnet, URL, or cloud service.
  • Optionally, enter the port(s) to target. You can enter a single port number, a port range, or a comma-separated (CSV) string of multiple ports and/or port ranges. Leaving this blank will target all ports.

Click Save Dependency to add the dependency.
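The port field described above accepts a single port, a port range, or a comma-separated mix of both. The following sketch shows one way such a string could be interpreted. It is not Gremlin's parser: the `parse_ports` function is hypothetical, the hyphen as range separator (for example `8000-8010`) is an assumption, and so is the choice to represent "all ports" as 1 to 65535.

```python
from typing import List, Tuple

ALL_PORTS = [(1, 65535)]  # assumed representation of "target all ports"


def parse_ports(spec: str) -> List[Tuple[int, int]]:
    """Turn a port string into a list of inclusive (low, high) ranges."""
    spec = spec.strip()
    if not spec:
        # Leaving the field blank targets all ports.
        return list(ALL_PORTS)
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        low_text, separator, high_text = part.partition("-")
        low = int(low_text)  # raises ValueError on empty or non-numeric input
        high = int(high_text) if separator else low
        if not (1 <= low <= high <= 65535):
            raise ValueError(f"invalid port or range: {part!r}")
        ranges.append((low, high))
    return ranges
```

For instance, `parse_ports("80, 8000-8010")` yields one single-port range and one eleven-port range, while malformed input such as `"90-80"` raises `ValueError`.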

Editing dependencies

To edit a dependency, click the pencil icon under the Actions column. After making your edits, click Save to save your changes.

Removing dependencies

To remove a dependency, click the delete icon under the Actions column. A confirmation modal window will appear. Click Delete again to confirm the deletion.

FAQ

Q: How often are services discovered?

A: Gremlin currently discovers services once every hour.

Q: How often are characteristics of an existing service discovered and/or modified?

A: Gremlin currently discovers and/or modifies the characteristics of existing services once every hour.

Q: How often are targets resolved to an existing service?

A: Gremlin resolves targets instantly, as soon as they change on a service. If a new pod is registered with the control plane, it's immediately registered as a target to a service.

Q: How often does Gremlin associate pods, containers and hosts with existing services?

A: Every 30 seconds.