Hot Take: Don't provide incident resolution estimates
Providing incident resolution times to customers adds unneeded stress for responders, with very little gain.
By Robert Ross on 9/24/2024
During an incident, your customers want information as soon as possible. They want to know what's happening, why, and what they should do, even if the answer is "sit tight." But today, standing on the platform waiting for the subway to take me into Manhattan, the MTA reminded me of a belief I've had for quite some time: never give time estimates for incidents.
For the uninitiated who have yet to experience an NYC morning commute: descending into your train station only to find what seems like all of Brooklyn standing there is considered normal. Frustrating, sure, but an everyday occurrence. Today was no different. I didn't huff or puff, I didn't seek alternatives, and I didn't stress over making it to the gym on time.
As (definitely) all of Brooklyn stands there, an unscripted voice comes over the speakers, informing us that the next available train will arrive in five minutes. That's plenty of time for me to make it to the gym.
10 minutes later…
A train rolls into the station, but something is off. It's empty. It's not stopping. The entire platform simultaneously realizes this train is not for us, and the moans are enough to make Bedford Avenue shake like a 5.5 on the Richter scale. Logic kicks in: we assume this is the train they need to decommission, and that the next one will be ours.
10 minutes later…
Our train slithers its way into Bedford station to a (very vocally) aggrieved population of Brooklynites.
I didn't make it to the gym. Instead, I used my newfound time to write this blog post.
Incident Communication
People accept that things break, including software. When software does break, a best practice is for the service to notify affected users of the incident with a status page update, email notifications, or direct communication from an account owner (like a customer success manager).
That's exactly what the MTA did: it notified us that something was broken and that it was being addressed. Its critical error was telling us when to expect the next train. So what did everyone do? We waited… and then waited some more.
Giving a time estimate to customers means you've created two problems for yourself:
Your customers now expect that you will meet that estimate.
Your team now has to fulfill that expectation, which adds cognitive overhead.
Let’s dig into these.
Problem #1: Customer expectations
Customers will wait for an estimated time (within reason). However, the second you go past that estimate, you've drilled a hole into the bucket that holds your customers' trust. If you post "We're reverting a bad deploy that caused the outage; ETA to fix is five minutes" to your status page, you should assume your customers just set their watches for five minutes.
Giving concrete time estimates means customers may not seek an alternative, even when doing so would benefit them. Back to my morning commute: I didn't leave the station and bike to the gym, because five minutes was enough time to catch the train. When five minutes became 20, the backup bike ride was no longer an option, and I was stuck feeling resentful of my wasted time.
Problem #2: Incident responder stress
People responding to the incident now have a new problem in addition to the incident itself: more cognitive overhead. No incident responder wants to disappoint customers, so by giving a customer a time estimate for a fix, the responder has now set their own watch, too.
Incident responders should have as much headspace as possible to respond to and mitigate incidents. Anything extra, like keeping track of the estimated fix time given to customers, will only prolong the fix or make the situation worse.
Alternatives
It's natural to want to give customers an estimate, and they almost always ask for one. At that point, the best thing you can do is be empathetic and transparent. As for the L train debacle this morning, I would have preferred a message explaining that the doors were broken on the incoming train, that they had to fix it, and that they didn't know how long it would take. A brief message saying, "We know you ride the L train to get to work, and we're doing our best to get you on your way," would have gone a long way.
Notice I'm not saying that you shouldn't estimate at all: my stance is that you should avoid exposing those estimates to customers. Knowing how long something will take, and discussing that estimate internally, means your team can quickly plan a new course of action if an estimate is missed (e.g., rolling back a deployment instead of rolling forward). But the risk/reward of telling customers an estimated time to fix is almost all pain and no gain. If you hit it, that's expected. If you miss it, it's massively detrimental.
So please, do your customers (and incident responders) a favor and never tell customers when you expect an incident to be mitigated. As for me, I'm just going to leave the station next time.