Mitigation

Incident mitigation means taking temporary corrective measures to minimize the impact of an issue as the responders continue to work on a more permanent fix.

What is incident mitigation?

Mitigation means temporarily fixing an affected system so that it is usable while responders continue to monitor the situation or work on a more sustainable fix behind the scenes. By contrast, remediation means the responders have fully resolved the issue, and the system is working normally again.

Why is mitigation important?

When an incident occurs, the first critical goal is mitigation. The team should focus on decreasing customer impact — not long-term resolution or root-cause analysis. Once systems are running, responders can do a deeper investigation with less stress.

Mitigation means a system can go back online before the responders fully implement a final solution. It minimizes the time it takes for a system to return to a usable state for customers, leading to a better user experience and less noticeable downtime.

Best practices for incident mitigation

Incident mitigation is a high-stress process. After all, mitigating an issue is comparable to putting a bandage on an actively bleeding wound. The response team likely feels immense pressure from customers and internal stakeholders and must act quickly.

To mitigate an incident as efficiently as possible, responders need to keep stress low and stay productive. Here are a few best practices for lowering stress during incident mitigation:

Create an incident management plan ahead of time

If you put a plan in place during a non-stressful time, mitigating an incident will be much easier when your team is feeling the pressure. This plan should include:

Assigned roles: Who is responsible for leading the incident and filling other crucial roles? It’s essential to assign an incident commander ahead of time.
Communication guidelines: Who should receive information and updates within the team and throughout the company?
Documentation: Where can team members find the documentation they need?
Resources: Who can team members turn to if they lack the technical expertise to resolve the incident or need to escalate issues?
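
One way to keep such a plan checkable is to store it as structured data. The sketch below is only an illustration: the IncidentPlan class, its field names, and the sample people and channels are all hypothetical, not part of any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IncidentPlan:
    """Hypothetical container for the four plan elements listed above."""

    roles: dict[str, str]                # role name -> person assigned
    update_recipients: list[str]         # who receives information and updates
    documentation_links: list[str]       # where responders can find docs
    escalation_contacts: dict[str, str]  # area of expertise -> contact

    def missing_elements(self) -> list[str]:
        """Return the plan elements that have not been filled in yet."""
        missing = []
        if "incident_commander" not in self.roles:
            missing.append("incident commander")
        if not self.update_recipients:
            missing.append("communication guidelines")
        if not self.documentation_links:
            missing.append("documentation")
        if not self.escalation_contacts:
            missing.append("resources")
        return missing


# Sample data is invented for illustration only.
plan = IncidentPlan(
    roles={"incident_commander": "alex"},
    update_recipients=["#incidents", "support-leads"],
    documentation_links=[],
    escalation_contacts={"database": "dana"},
)
print(plan.missing_elements())  # prints ['documentation']
```

Running a check like this during a calm period surfaces gaps, such as the empty documentation list here, before an incident exposes them.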

Automate the predictable steps

There are a few predictable steps that must happen during every mitigation process. By automating these steps, you put your team in a better place to solve more complex problems and develop a mitigation strategy faster. These steps include:

Incident assembly: making a place for people to communicate (e.g., a Slack channel or Zoom room) and bringing the right people to that place.
Incident communication: proactively communicating to stakeholders, inside and outside the organization. These communication efforts might occur in a specific channel or on a status page.
Incident documentation: using an automation tool as your “scribe” to log what happened throughout the response process. Then, you won’t have to worry about recording the details yourself.
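
The three steps above can be sketched as a small automation skeleton. This is a minimal sketch under stated assumptions: the Incident class and the assemble and communicate functions are hypothetical, and the channel creation and message delivery are stand-ins (a string and an in-memory list) rather than calls to a real chat or status-page API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Incident:
    """Hypothetical incident record; the timeline plays the 'scribe' role."""

    title: str
    channel: str = ""
    responders: list[str] = field(default_factory=list)
    timeline: list[str] = field(default_factory=list)

    def log(self, event: str) -> None:
        """Incident documentation: record each event with a UTC timestamp."""
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.timeline.append(f"{stamp} {event}")


def assemble(incident: Incident, on_call: list[str]) -> None:
    """Incident assembly: name a channel and bring in the responders.

    Assumption: a real version would create the channel through the chat
    tool's API; here the channel is only a derived name.
    """
    incident.channel = "inc-" + incident.title.lower().replace(" ", "-")
    incident.responders = list(on_call)
    incident.log(f"opened {incident.channel} with {', '.join(on_call)}")


def communicate(incident: Incident, message: str, outbox: list[str]) -> None:
    """Incident communication: queue a stakeholder update and log it.

    Assumption: 'outbox' stands in for a status page or updates channel.
    """
    outbox.append(f"[{incident.title}] {message}")
    incident.log(f"sent update: {message}")


# Sample names and messages are invented for illustration only.
outbox: list[str] = []
incident = Incident("Checkout errors")
assemble(incident, ["alex", "dana"])
communicate(incident, "Investigating elevated error rates", outbox)
for entry in incident.timeline:
    print(entry)
```

Because every helper writes to the timeline itself, the record of what happened builds up as a side effect of responding, and nobody has to take notes by hand.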

Shield the responders from external pressure

The incident manager or commander should focus on shielding the team from pressure coming from other parts of the company. The team should focus solely on mitigation, while the manager absorbs commentary and feedback from stakeholders and obtains any external resources or help the team needs. By handling these external stressors, the incident commander helps the responders stay focused on problem-solving rather than getting caught up in others’ thoughts and opinions.