Modern applications are rarely created entirely from scratch. Instead, they rely on a framework of pre-existing applications and services, each adding specific features and functionality. These dependencies empower teams to build and deploy applications more efficiently, but they bring their own set of challenges. Tracking, managing, and updating these dependencies is difficult, especially in large, complex applications where dependencies are likely managed by different teams.

Dependencies don’t just add overhead, though. They also create reliability risks. If your application requires a dependency to work, and that dependency fails, then your application will also fail. Applications become reliability “minefields,” where a minor incident with an obscure dependency can have cascading effects on the entire application. But before we can find and disarm these mines, we first need to know where they are.

In this blog post, we present a way for teams to identify, track, and map hidden dependencies. We'll discuss the challenges of detecting dependencies, different approaches to dependency management, and how Gremlin automates dependency tracking for you.

Why is dependency discovery and management so difficult?

A dependency is a software component that provides features or functionality to another component. Dependencies reduce the amount of work that developers and operators need to invest in building applications and make it easier for teams to modularize their applications, but tracking these dependencies can become difficult.

The term “dependency” is often used in the context of software libraries, but it also applies to services and applications.

In larger projects—particularly enterprise applications—there can be dozens of dependencies that get added and removed over time. Engineers may eventually lose track of which dependencies are being used, which service(s) they’re supporting, and which teams are managing them. This might not be catastrophic at first, but it will become a problem over time. Imagine if a critical database suddenly failed and took other services offline. If engineers don’t know about this dependency or which services it serves, then troubleshooting and fixing it becomes a complicated process of reading application logs, tracking down the dependency in your environment, and then trying to find and fix the root cause. Now, imagine if this is just one of dozens of other databases, caches, load balancers, serverless providers, and other resources providing critical functionality to your application.

A common approach is to document dependencies in a central knowledge base, but this has its own problems. Engineers are often too busy with other tasks to update documentation; documentation might be stored in different places or formats, so there's no single source of truth; and engineers may leave the company, taking tribal knowledge with them. Other tools have tried to solve this problem, particularly service catalogs and Configuration Management Databases (CMDBs). These tools combine manual service definitions with automated scanning to help engineers map out their environments, but they still have limitations.

At Gremlin, we took a similar approach by using our own Gremlin agent to automatically scan for, collate, and track dependencies in your environment.

How Gremlin discovers and tracks dependencies

Gremlin automatically detects dependencies in your environment, scans them for potential reliability risks, and lets you run pre-built reliability tests on them with a single click. To do this, we needed a different way of discovering dependencies.

First, our dependency discovery feature is designed around services. A service is any software running on a host, container, or Kubernetes resource, and can be anything from a website frontend to a database to a logging service. When you add your service to Gremlin, we use its network traffic to find endpoints that it communicates with frequently. We then collate this data into distinct endpoints, each represented by an IP address and port number, and add these to the service as its dependencies.
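To make the collation step concrete, here is a minimal sketch of the general idea, not Gremlin's implementation. It assumes a hypothetical `Connection` record for each observed outbound connection and a hypothetical `min_count` threshold standing in for "communicates with frequently"; it groups connections by (IP address, port) and keeps only the endpoints seen often enough.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One observed outbound connection from a service (hypothetical record format)."""
    dst_ip: str
    dst_port: int


def discover_endpoints(connections, min_count=10):
    """Collate connections into (IP, port) endpoints and keep the frequent ones.

    min_count is a hypothetical threshold for "communicates frequently";
    a real agent would likely also weigh time windows and traffic volume.
    """
    counts = Counter((c.dst_ip, c.dst_port) for c in connections)
    # Sort so the output is deterministic regardless of observation order.
    return sorted(endpoint for endpoint, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    observed = [Connection("10.0.3.12", 3306)] * 25 + [Connection("10.0.7.4", 6379)] * 3
    print(discover_endpoints(observed))  # [('10.0.3.12', 3306)]
```

The threshold filters out one-off connections (for example, a single health check to an unrelated host) so that only endpoints the service relies on regularly become candidate dependencies.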

For example, if a service sends a lot of traffic to and from an IP address over port 3306, Gremlin assumes that this is a dependency of the service. If a host name is available via reverse DNS, Gremlin lists it, and for common services like web servers and databases, it may also show the type of service. You can also customize the name shown for each dependency.
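The naming behavior described above can be sketched as follows. This is an illustrative assumption, not Gremlin's code: `WELL_KNOWN_PORTS` is a small hypothetical lookup table, `custom_names` stands in for user-defined names, and the reverse lookup uses Python's standard `socket.gethostbyaddr`. The resolver is a parameter so it can be swapped out (for example, to avoid slow DNS lookups in tests).

```python
import socket
from typing import Callable, Dict, Optional, Tuple

# Hypothetical, deliberately small table of well-known ports; a real tool would use a larger one.
WELL_KNOWN_PORTS = {
    80: "HTTP",
    443: "HTTPS",
    3306: "MySQL",
    5432: "PostgreSQL",
    6379: "Redis",
    27017: "MongoDB",
}


def reverse_dns(ip: str) -> Optional[str]:
    """Return the reverse-DNS (PTR) host name for ip, or None if there isn't one."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None


def label_dependency(
    ip: str,
    port: int,
    custom_names: Optional[Dict[Tuple[str, int], str]] = None,
    resolver: Callable[[str], Optional[str]] = reverse_dns,
) -> str:
    """Build a human-readable label for an (IP, port) dependency."""
    custom_names = custom_names or {}
    # A user-supplied name always wins over anything inferred.
    if (ip, port) in custom_names:
        return custom_names[(ip, port)]
    host = resolver(ip) or ip  # fall back to the raw IP if reverse DNS fails
    service = WELL_KNOWN_PORTS.get(port)
    return f"{host}:{port} ({service})" if service else f"{host}:{port}"
```

For instance, `label_dependency("10.0.3.12", 3306, resolver=lambda ip: "db.internal")` yields `"db.internal:3306 (MySQL)"`, while passing `custom_names={("10.0.3.12", 3306): "orders-db"}` yields `"orders-db"` without any DNS lookup.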

Here, we have dependencies discovered for a Kubernetes Deployment. Both of these dependencies are also Kubernetes Deployments and have dependency-specific tests automatically created and ready to run:

A list of dependencies for a Kubernetes service. Each dependency has three tests attached to it in various stages of pass/fail/not run.

All dependencies are listed on the service’s overview page, so you can see what dependencies are available for testing, what the results of their tests are, and how they contribute to the service’s overall reliability:

A dependency shown in a table in the Gremlin web app. Three buttons corresponding to different tests are shown next to its name. One test has failed.

Gremlin discovers dependencies by inspecting network traffic between your services and their dependencies. By examining which IP addresses your service talks to, and which DNS entries resolve to those addresses, Gremlin can add an entry for that dependency to your service. This works even in cases where the IP address changes, such as a load-balanced service. The result is an accurate, up-to-date, and human-readable representation of dependencies that are directly connected to the services your engineers care about.
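Why DNS makes this robust to changing IP addresses can be shown with a small sketch. It is a hypothetical illustration of the technique, not Gremlin's implementation: the `DependencyTracker` class and its method names are invented for this example. It remembers which name each IP address was last resolved from, then keys dependencies by (host name, port) instead of (IP, port), so a load balancer that hands out a new address still maps to the same dependency.

```python
from collections import defaultdict


class DependencyTracker:
    """Group connections by host name and port using observed DNS answers (hypothetical sketch)."""

    def __init__(self):
        self._ip_to_name = {}          # most recent DNS name seen resolving to each IP
        self._deps = defaultdict(set)  # (name, port) -> set of IPs seen behind that name

    def observe_dns_answer(self, name, ips):
        """Record that a DNS query for name returned the given IP addresses."""
        for ip in ips:
            self._ip_to_name[ip] = name

    def observe_connection(self, ip, port):
        """Attribute a connection to a dependency, falling back to the raw IP if no DNS answer was seen."""
        name = self._ip_to_name.get(ip, ip)
        self._deps[(name, port)].add(ip)

    def dependencies(self):
        """Return each dependency with the sorted list of IPs observed behind it."""
        return {key: sorted(ips) for key, ips in self._deps.items()}


tracker = DependencyTracker()
tracker.observe_dns_answer("api.example.com", ["203.0.113.10"])
tracker.observe_connection("203.0.113.10", 443)
# The load balancer later returns a different address for the same name.
tracker.observe_dns_answer("api.example.com", ["203.0.113.11"])
tracker.observe_connection("203.0.113.11", 443)
print(tracker.dependencies())
```

Both connections collapse into a single `("api.example.com", 443)` dependency. One simplification worth noting: when several names share an IP (as on a CDN), this last-write-wins mapping would attribute traffic to whichever name was resolved most recently.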

Additionally, Gremlin’s comprehensive reliability testing tools make it simple to measure the reliability of each dependency and check for risks that could lead to outages. The reliability of your dependencies is reflected in the service’s reliability score, which tells you how well your service can withstand various failure scenarios.

The overview for a Gremlin service showing a reliability score of 70%, one detected risk and 6 dependencies.

Security

Implementing strong security measures is a core pillar of our work at Gremlin. This is especially true when handling process and network data. For dependency discovery to work, the Gremlin agents running in your environment poll for DNS queries, then send them to our Control Plane. We protect this information in several ways, including:

  • Using multiple layers of access control.
  • Encrypting data both in transit and at rest.
  • Complying with ISO 27001 and ISO 27017, PCI DSS Level 1, and SOC 1, SOC 2, and SOC 3.
  • Auditing our systems at least quarterly, and in some cases, daily.

A full breakdown of our security practices can be found at gremlin.com/security. We also document all of the permissions and capabilities that our agent requires on our security documentation page.

Conclusion

Dependency management is an ongoing challenge in the tech industry. Our goal is to make dependency tracking as automatic and seamless as possible so that you can focus entirely on testing and improving the reliability of your services. This includes uncovering dependencies you never knew existed, tracking dependencies across your organization, and, with features like Detected Risks, alerting you when a dependency poses a reliability risk long before it can impact customers.

If you want to learn more about dependency discovery, or if you have questions you'd like to ask our reliability experts, we'll be hosting a live Office Hours webinar on February 22, 2024. Click here to register for free or watch the on-demand recording.

To see how Gremlin can help you track your dependencies and their reliability risks, sign up for a free trial.

Andre Newman
Sr. Reliability Specialist