Incident resolution is the final step in the incident lifecycle. In this phase, not only do customers no longer feel the effects of an incident, but engineers have implemented a long-term, sustainable solution.

What is the difference between resolution and mitigation? 

At FireHydrant, we say an incident is mitigated when customers no longer feel its impact, and resolved when a durable fix is in place. Sometimes, the action taken to mitigate an incident is durable enough for it to also be considered resolved.

Tracking the time it takes to move from mitigation to resolution can provide valuable insight into tech debt lurking in your systems, and ideally helps you prioritize that debt in future work. That’s part of why scheduling a retrospective following the incident is so important.
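As a rough illustration of that tracking, the sketch below computes the gap between mitigation and resolution for a few incidents and lists the ones that took longest to resolve. The `Incident` record, its field names, the `slowest_to_resolve` helper, and the sample timestamps are all assumptions made for this example. They are not FireHydrant's data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Incident:
    # Hypothetical record; the field names are illustrative only.
    name: str
    mitigated_at: datetime
    resolved_at: datetime

    @property
    def mitigation_to_resolution(self) -> timedelta:
        # Time spent between "customers no longer feel it" and "durable fix in place".
        return self.resolved_at - self.mitigated_at


def slowest_to_resolve(incidents, threshold):
    """Return incidents whose gap exceeds `threshold`, longest gap first."""
    slow = [i for i in incidents if i.mitigation_to_resolution > threshold]
    return sorted(slow, key=lambda i: i.mitigation_to_resolution, reverse=True)


# Made-up sample data for the sketch.
incidents = [
    # Mitigation was itself durable, so the gap is zero.
    Incident("cache-outage", datetime(2024, 1, 8, 2, 0), datetime(2024, 1, 8, 2, 0)),
    Incident("pod-restarts", datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 17, 14, 0)),
    Incident("manual-dns-fix", datetime(2024, 1, 12, 9, 0), datetime(2024, 1, 14, 9, 0)),
]

median_gap = median(i.mitigation_to_resolution for i in incidents)
long_running = slowest_to_resolve(incidents, threshold=timedelta(days=1))
```

Incidents that show up in `long_running` again and again are reasonable candidates to raise in the retrospective.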

How do I reach incident resolution?

If the actions taken to mitigate the incident are already durable, the incident can be considered resolved. If not, work through the following steps until you reach resolution:

  • Audit the solution for durability. Potentially non-durable mitigation strategies include:

    • Cronjobs to restart pods every 15 minutes

    • Turning off caching because the caching layer is failing

    • Manual changes made to infrastructure that are not in the infrastructure-as-code repository

    • Code hotfixes that did not have test coverage added

  • Create tickets and describe what work needs to be done to resolve the incident.

  • Decide upon the implementation timeline, which can be especially important if you’re dealing with an after-hours incident. Consider whether:

    • The mitigation is durable enough to persist until the next working day.

    • You’ll be easily notified by automated alerting if the mitigation somehow breaks. 

    • Implementing a durable solution immediately creates unnecessary risks that could trigger another incident.

  • Mark the incident as resolved when all tickets to implement a durable solution are complete.
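The timeline questions and the final resolution check can be sketched as two small functions. This is only one possible reading of the steps above: the `Mitigation` and `Ticket` types, the function names, and in particular the way the three questions are weighed against each other are assumptions for illustration, not a FireHydrant feature or a prescribed policy.

```python
from dataclasses import dataclass


@dataclass
class Mitigation:
    # Hypothetical answers to the three timeline questions.
    holds_until_next_working_day: bool
    breakage_triggers_alert: bool
    immediate_fix_is_risky: bool


@dataclass
class Ticket:
    # Hypothetical follow-up ticket for the durable fix.
    title: str
    done: bool = False


def implementation_timeline(m: Mitigation) -> str:
    """Suggest when to implement the durable fix (assumed weighting)."""
    safe_to_wait = m.holds_until_next_working_day and m.breakage_triggers_alert
    if safe_to_wait:
        return "next working day"
    if m.immediate_fix_is_risky:
        # Neither waiting nor fixing right away is clearly safe.
        return "escalate"
    return "now"


def is_resolved(tickets) -> bool:
    """Resolved once every durable-fix ticket is complete.

    An empty list counts as resolved, which in this sketch stands for the
    case where the mitigation itself was judged durable.
    """
    return all(t.done for t in tickets)


after_hours = Mitigation(
    holds_until_next_working_day=True,
    breakage_triggers_alert=True,
    immediate_fix_is_risky=False,
)
tickets = [Ticket("Add test coverage for hotfix", done=True),
           Ticket("Move manual change into IaC repo")]

decision = implementation_timeline(after_hours)
resolved = is_resolved(tickets)
```

With this sample data the suggestion is to wait for the next working day, and the incident stays open until the second ticket is completed.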
