Create a service catalog that grows with you

When your incident response process is centered around a service catalog, responders are able to more quickly pinpoint the service or functionality that’s down, bring in the team or experts, and then get to solving the problem faster. Saving even a few minutes can have a big impact on decreasing the costs around incidents and outages, so having up-to-date service details at your fingertips can make all the difference.

In fact, the Incident Benchmark Report, an analysis of more than 50,000 incidents resolved on the FireHydrant platform, found that incidents with services attached to them had a 36% decrease in MTTR (mean time to resolve) compared to those with no services attached.

Of course, you don’t have to conquer all things “service catalog” at once. There are incremental steps you can take to implement and then mature your approach to service-based incident response. In this blog post, we’ll talk about three approaches you can take based on your program’s maturity level.

Get started with a service catalog

At its most basic, a service catalog is simply a list of internal and external technical services (enterprise applications, task-specific tools, microservices, APIs, and so on) used by your organization, and relevant details like owner, code location, and operational dashboards. By documenting this information, you help knock down knowledge silos and ensure everyone has the information they need to respond to incidents confidently — a big deal when you’ve just been paged at 1 a.m.

Even if you haven’t written anything down yet, you probably have a framework of service dependencies living in the heads of your team. Moving this service graph into your service catalog will further help you determine who should be in the room when things go down.

If you’re building a service catalog from scratch, though, start simple. List all of your services with their owning and responding teams, contact details, repositories, documentation, and monitoring dashboards. If you’re managing a monolith instead of microservices, you can still use a service catalog: break the monolith down by module, component, or product surface area. Each product area should have an engineering team associated with it, and those teams should be trained on your incident response process.
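As a rough illustration of the starting point described above, here is a minimal sketch of what one catalog entry could look like in Python. The field names, team names, and example.com URLs are all assumptions made for this example, not a required schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Service:
    """One catalog entry. Every field name here is illustrative."""
    name: str
    owning_team: str
    responding_team: str
    contact: str          # e.g. a chat channel or on-call alias
    repository: str
    docs_url: str
    dashboard_url: str
    depends_on: list[str] = field(default_factory=list)


# Hypothetical entries, including one module carved out of a monolith.
catalog = {
    s.name: s
    for s in [
        Service("auth-api", "identity", "identity-oncall", "#identity",
                "git.example.com/auth-api", "wiki.example.com/auth-api",
                "dash.example.com/auth-api"),
        Service("monolith/billing", "payments", "payments-oncall", "#payments",
                "git.example.com/monolith", "wiki.example.com/billing",
                "dash.example.com/billing", depends_on=["auth-api"]),
    ]
}


def missing_fields(service: Service) -> list[str]:
    """Return the names of string fields left empty, to spot incomplete entries."""
    return [k for k, v in vars(service).items() if isinstance(v, str) and not v]
```

A check like `missing_fields` is one simple way to find entries that still need an owner or a dashboard link before you rely on them during an incident.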

Once you’ve got the simple service details and dependencies out of everyone’s head, you can start to add valuable layers to your catalog. Add functionalities, like login or checkout, on top of the services that power them. Why by functionality? Because that’s how your customers think. They’re not concerned with which service is broken; they’re concerned that they can’t log in. This has the added benefit that more people in your organization can be involved in incidents without knowing the technical details of your system.
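To make the functionality layer concrete, here is a small sketch that maps customer-facing functionalities to the services behind them. The functionality and service names are hypothetical, chosen only to mirror the login and checkout examples above.

```python
# Hypothetical mapping from customer-facing functionality to backing services.
functionalities = {
    "login": ["auth-api", "session-store"],
    "checkout": ["cart-service", "payments-api", "auth-api"],
}


def affected_functionalities(broken_service):
    """List the customer-facing functionalities that rely on a broken service."""
    return sorted(
        name
        for name, services in functionalities.items()
        if broken_service in services
    )


def services_behind(functionality):
    """List the services to check when a functionality is reported down."""
    return functionalities.get(functionality, [])
```

With a mapping like this, someone who only knows that "login is broken" can still find the services to investigate, and an engineer who knows which service failed can tell support what customers will notice.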

You don’t necessarily need a dedicated tool if you’re just starting out — you can store your service catalog in a company wiki or a shared document that you can reference. However, if you’re new to documenting a formal incident response process altogether, taking the time at the beginning to center it around your services will deliver better results when you start to mature your program. So consider setting up your process so that when an incident is declared and you find out what’s broken, it’s clear which team member needs to be alerted (and maybe that’s even done automatically).
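As a sketch of what "it's clear which team member needs to be alerted" could mean in practice, the lookup below turns a list of impacted services into the teams to notify. The ownership table, the team and channel names, and the fallback contact are assumptions for illustration; a real setup would hand the result to whatever paging or chat tool you use.

```python
# Hypothetical ownership table: service -> (responding team, contact channel).
owners = {
    "auth-api": ("identity-oncall", "#identity"),
    "session-store": ("identity-oncall", "#identity"),
    "payments-api": ("payments-oncall", "#payments"),
}

# Assumed catch-all for services that are not in the catalog yet.
FALLBACK = ("incident-commander", "#incidents")


def who_to_alert(impacted_services):
    """Return the deduplicated (team, contact) pairs to notify, in first-seen order."""
    targets = []
    for service in impacted_services:
        target = owners.get(service, FALLBACK)
        if target not in targets:
            targets.append(target)
    return targets
```

The fallback matters early on: while the catalog is still incomplete, an unknown service should still reach a person rather than nobody.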

Take your service catalog to the next level

At their most valuable, service catalogs are treated as more than a directory; they are tools for aggregating institutional knowledge. The ultimate goal is a fully fleshed-out service catalog that includes dependencies, owners, and links to operational documentation. Once you have the basics in place, there are several steps you can take to level up.

Create space for documentation: Users should be able to quickly find out what a service does, who created it, what recently changed, and all other information associated with it. Dedicate a space in your service catalog for documentation of past incidents and events.
Map services to runbooks: Seriously speed things up by connecting your services to runbooks. For example, Avalara mapped a runbook to each service the company monitors, which helps the team get the right people in the right place at the right time faster, as well as document service-specific nuances and processes for the many applications monitored. When a service goes down, the corresponding runbook is triggered, and everyone jumps into action.
Consolidate information: Update your service catalog as your organization grows (there are tools that make this easier) to ensure a streamlined, focused incident response process. Information consolidation ensures that new hires and unfamiliar teams can access the same valuable historical data and context as your most experienced engineers.
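The runbook mapping described above can be sketched as a simple lookup from service to runbook steps. This is not how any particular company or tool implements it; the service name, the steps, and the generic fallback are hypothetical.

```python
# Hypothetical runbooks: each service maps to an ordered list of steps.
runbooks = {
    "auth-api": [
        "Page identity-oncall",
        "Check recent deploys to auth-api",
        "Review the auth-api dashboard for error spikes",
    ],
}

# Assumed generic runbook for services without a dedicated one.
GENERIC_RUNBOOK = ["Page the incident commander", "Open an incident channel"]


def trigger_runbook(service):
    """Return the numbered steps to follow when the given service goes down."""
    steps = runbooks.get(service, GENERIC_RUNBOOK)
    return [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
```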

Get the most out of your service catalog

Cataloging and tracking changes to all of the services within your system can become increasingly complex as more users and processes are added. Tools like ours or Backstage (which integrates with ours) can make maintaining your service catalog significantly easier. And ultimately, having all of that information interacting with your incident response process is where the big payoff on time savings happens.

Using a dedicated tool can also help you improve your incident management process by:

Automating incident kickoff. When an incident is declared, your tool can use the service catalog to automatically pull all relevant parties into a shared Slack channel.
Automating record keeping. Automatically add incident reports and historical data to the service catalog, so it’s ready for the next time it’s needed.
Creating production readiness checklists. Evaluate and maintain the production readiness of the services your users rely on every day: spot risks in your service dependencies before they cause incidents, and respond quickly if they do.
Generating service dependency graphs. This is a visual way to quickly surface dependencies, understand the relationships between services, and determine the scope or impact of an incident.
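To show how a dependency graph helps determine the scope of an incident, here is a minimal sketch that walks the graph to find everything affected by one failed service. The graph itself is made up for the example, and a dedicated tool would typically build and visualize it for you.

```python
from collections import deque

# Hypothetical graph: each service maps to the services it depends on.
depends_on = {
    "web-frontend": ["checkout-api", "auth-api"],
    "checkout-api": ["payments-api", "auth-api"],
    "payments-api": ["postgres"],
    "auth-api": ["postgres"],
    "postgres": [],
}


def impacted_by(failed):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first search outward from the failed service.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted
```

In this made-up graph, a failure in `postgres` reaches every other service, while a failure in `web-frontend` affects nothing upstream of it, which is the kind of scope question responders need answered quickly.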

Give it a go

A robust service catalog is an essential tool in the overall incident management ecosystem and can significantly enhance your team’s productivity. It’s not surprising that our Benchmark Report found a 1640% increase in services created throughout 2022. Learn more about what teams are doing to improve their incident management practices in our on-demand webinar, Proving ROI: How to evaluate and improve how you manage incidents.