3 tips for reducing stress during incident response efforts
Panic takes time and energy away from swift incident response, leading to second-guessing, a higher likelihood of mistakes, and analysis paralysis. Here are three tips to minimize it.
By Malcolm Preston on 1/18/2023
When a fire breaks out in your city, the fire department is ready to move into action. They don’t hesitate; they know fires are inevitable, and they’ve already prepared for such an emergency. There is no time for panic in their workflow.
Similarly, when an incident inevitably occurs, panic or stress is counterproductive. Panic takes time and energy away from swift incident response, leading to second-guessing, a higher likelihood of mistakes, and analysis paralysis. Worse, stress during response efforts doesn’t just affect individuals; it can spread from person to person. One person freaking out can unsettle an entire team.
Of course, “don’t panic” is easier said than done. This is why preparation ahead of time is so important: it puts incident responders in a better headspace and reduces panic responses, which improves speed and efficiency during the incident.
Here are three things you can do to help you and your responding teams keep a cool head even during a major incident.
1. Have a plan
For starters, responding team members will feel more prepared if they know there is a concrete incident response plan. Defining roles and responsibilities ahead of time helps your team focus on the incident rather than on peripheral details. Knowing there’s a plan also gives teams the psychological safety to accept that incidents are inevitable, not worst-case scenarios.
Though each team’s incident response plan may look different, there are some aspects that should be part of every plan:
Assigned roles: Who is responsible for leading the incident and filling any other crucial roles?
Communication guidelines: Who should receive information and updates within the team and throughout the company?
Documentation: Where can team members find the documentation they need?
Resources: Who can team members turn to if they lack the technical expertise to resolve the incident or need to escalate an issue?
Ensure each of these components is well documented so responders aren’t scrambling to find what they need during an incident. Some teams use Google Docs or an internal wiki for this; others use an incident response tool as their source of truth.
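If your plan lives in a tool or repository rather than a free-form doc, one option is to keep it as structured data and check it for gaps before an incident exposes them. The sketch below is a hypothetical illustration, not a FireHydrant or CircleCI feature: the `IncidentPlan` class, its field names, the `incident_commander` role, and the `missing_components` helper are all assumptions that simply mirror the four components listed above.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentPlan:
    """Hypothetical plan structure mirroring the four components above."""
    # Assigned roles: role name -> person or rotation responsible
    roles: dict[str, str] = field(default_factory=dict)
    # Communication guidelines: audience -> channel or contact for updates
    communication: dict[str, str] = field(default_factory=dict)
    # Documentation: links or paths responders can consult
    documentation: list[str] = field(default_factory=list)
    # Resources: escalation contacts for expertise the team lacks
    escalation_contacts: list[str] = field(default_factory=list)


# Assumption: every plan must name someone to lead the incident.
REQUIRED_ROLES = ("incident_commander",)


def missing_components(plan: IncidentPlan) -> list[str]:
    """Return the names of plan components that are still empty."""
    gaps = []
    for role in REQUIRED_ROLES:
        if not plan.roles.get(role):
            gaps.append(f"roles.{role}")
    if not plan.communication:
        gaps.append("communication")
    if not plan.documentation:
        gaps.append("documentation")
    if not plan.escalation_contacts:
        gaps.append("escalation_contacts")
    return gaps


# Usage: a half-finished plan that still lacks docs and escalation contacts.
plan = IncidentPlan(
    roles={"incident_commander": "on-call lead"},
    communication={"company": "#incidents channel"},
)
print(missing_components(plan))  # ['documentation', 'escalation_contacts']
```

Running a check like this on a schedule, or whenever the plan changes, is one way to catch an empty section while things are calm rather than mid-incident.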
2. Practice ahead of time
One of the biggest stressors for engineers is not fully understanding the systems they’re working with and the touch points within them. Once you have your response process defined, build your team’s confidence by boosting their operational knowledge and giving them space to make mistakes in a safe environment. One effective way to do that is to run exercises or drills well before incidents happen. For example, practice breaking things in a non-production environment to give your team the chance to better understand how to fix them. This builds their confidence and familiarity with your incident response process, so they are ready when something breaks for real.
For example, CircleCI uses a tutorial runbook that maps to production (without the paging), and new engineers run “game days” as part of their onboarding. Having a good practice framework is a huge benefit for an engineer who is only three months in and who may feel some anxiety about declaring an incident, they told us. Practice drills can also shift your team’s attitude toward perceived failure. People often feel ashamed to make mistakes, but breaking things is a key part of understanding how they work. Shift the paradigm so people feel empowered to break and fix things in development environments.
3. Focus on mitigation
You have a plan. You’ve practiced. What’s next? Mitigation.
During an incident, the most important goal is mitigation: the team should focus on reducing customer impact, not on long-term resolution or root-cause analysis. Once systems are stable again, you can investigate more deeply with less stress.
Managers can help by shielding their team from external pressure coming from other parts of the company. The team stays focused solely on mitigation, while the manager absorbs commentary and feedback from stakeholders and obtains any external resources or help the team needs.
Side note: When you do get to analyzing what went wrong, remember that any investigation should center on decreasing the risk of future incidents, not assigning blame. People will always be stressed if they’re worried about job security, and with many complex, overlapping systems, it’s often impossible to say an incident is a single person’s fault. Assume good intent and trust your team.
Incidents are going to happen — be prepared
The fire department can’t spend all its time preventing every fire. Sure, smoke detectors and fire extinguishers are important, but it’s equally important to be ready when a major fire inevitably breaks out.
Preparing through planning, practice, and making sure your team knows how to approach an incident in the heat of the moment will lead to more efficient incident response and a much calmer, happier team. Still not convinced? Here’s what it looks like when these systems aren’t put in place.