When to hire an Incident Commander
Do you need a dedicated incident commander? We will explore what an incident commander is, what forms the role can take, and when you should consider the addition of a dedicated incident commander to your organization.
By Ryan McDonald on 3/11/2022
What comes to mind when you hear the term 'incident commander'? You are not alone if you think about fancy, tri-cornered hats, well-polished shoes, and a uniform weighed down by medals. The roles of incident commander, incident manager, or technical escalation manager have been typical in large organizations but are gaining popularity in smaller companies. For the purposes of this article, we will use the term 'incident commander,' but any of the above titles could work.
So, the question is, do you need a dedicated incident commander? Do you need a team of incident commanders?! Can you get by with a volunteer incident commander?
In this post, we will explore what an incident commander is, what forms the role can take, and when you should consider the addition of a dedicated incident commander to your organization.
What is an incident commander?
At its core, an incident commander helps facilitate communication and the timely resolution of a software incident while balancing and minimizing risk to responding teams, the business, and customers. Imagine an authoritative project manager in charge of an impromptu, emergent, and maybe chaotic project with business-ending ramifications if it fails. If you are nodding right now: welcome; you've been involved in a large-scale incident before. If you were successful, someone probably stepped into this role, whether they held the title of Incident Commander or not.
Incident commanders, especially of the dedicated variety, can also take on a range of other supporting tasks outside of incidents, including but not limited to:
- Documentation and recording of incidents
- Post-incident review facilitation and incident analysis
- Collecting and aggregating data on uptime and other relevant incident metrics
- Sharing learning from incidents
- Incident process improvement across all responder organizations
The incident commander role, formalized or otherwise, is the linchpin of your response processes. Before you run out and hire a team of incident commanders, it's worth considering what organizational and functional conditions benefit most from having staff dedicated to the role.
Does your organization need an incident commander?
When might an organization consider minting an incident commander role? Below is a list of questions for self-assessing whether your team is indeed ready for this unique role.
Caveat: These are generalizations to help organizations understand if there is space or need for an incident command role, not absolutes. The world of incidents, by its nature, is rife with exceptions.
Software and organizational complexity
Complexity in software can drive a greater need for coordination during an incident. Complexity in organizational structures and technical service dependencies quickly increases friction during incident response. Below are some common indicators of complexity that could warrant greater coordination during incidents.
Are your engineering teams split by technical domain? Typically, domain-centric teams can require greater coordination if the problem exists between services or layers. A group of people who do not frequently work with one another, particularly when brought together during a crisis, can often benefit from external facilitation and coordination.
How much of your stack was not written by the team maintaining it? For our purposes, we will take 'legacy software' to mean 'software your teams have limited knowledge of.' This type of arrangement can radically increase the difficulty of triage efforts. Leveraging an incident commander to help locate domain experts can ensure responders are focused on triage, not lighting up Slack.
How many external dependencies does your team have? Specifically, does your organization leverage vendors for critical functionality, support multiple products, or have an internal 'platform' team(s) supporting in-house tools? These factors can add levels of complexity to the incident response beyond the technical triage efforts. Multiple products being impacted by an issue, or failures in internal or external tooling that many teams depend upon, can quickly ratchet up the level of internal communication required.
Is your organization rapidly growing or iterating on its incident management processes? Hypergrowth and scaling processes can lead to incidents with new teams in unfamiliar process territory or older teams exposed to new processes for the first time. Having an expert to help guide the response increases consistency and efficiency. It also reduces time spent figuring out the 'right way' to manage the incident or, worse yet, missing expectations from customers or internal stakeholders.
Is your organization working towards compliance goals or catering to enterprise customers? A well-documented and auditable incident process is a fantastic complement to the reams of documentation required to pass regulatory and compliance audits. For B2B businesses, it can also help in clearing large organizations' procurement departments.
What forms can an incident commander take?
Incident commanders can add substantial value to an organization, but sometimes the idea of dedicating an entire headcount may not make sense given an organization's scale, maturity, or complexity. Thankfully, incident command can scale down to nearly any organizational size. The spectrum of possible incident commander models, bucketed broadly from voluntary to dedicated roles, is outlined below.
A critical factor in a volunteer incident commander model is to ensure volunteers are sufficiently recognized, valued, and potentially even incentivized. Without support, this model will be short-lived.
Like many informal roles in growing organizations, your org may already have a de facto incident commander hiding in the ranks. This person will often live in support, operations, or an engineering role, equipped with greater-than-average customer empathy. If they have been around for long enough, they may also have technical instincts honed by repeated exposure to your organization's software on its worst days.
- Pro: The quality of the output from this ambitious volunteer can rival that of a full-time incident commander when well resourced.
- Pro: Once identified, this person is a great candidate to help scale out a more intentionally developed group of volunteers.
- Con: This person can be at risk of burnout and is inevitably a single point of failure.
- Con: In a large enough organization, it is unlikely that a single person can drive significant change to the incident process.
A group of volunteers with structured norms similar to a guild informally manages the 'incident program,' often with the help and guidance of an executive sponsor. The day jobs of these individuals usually matter less than one might think, as long as they have the flexibility to drop out of their everyday work to help manage an incident.
- Pro: Similar to above, volunteers are often passionate and can provide a high-quality commander experience for an organization.
- Pro: A more distributed group can help shoulder the load and spread better practices more broadly than an individual can.
- Pro: Greater resilience to attrition and to the other single points of failure that come with relying on an individual.
- Con: Without executive sponsorship or other incentives, it can be challenging to maintain the momentum of this group.
- Con: It can be challenging to drive process change if the group doesn't contain enough leadership representation or folks from across different job functions.
"All engineering managers|Directors|Support Duty Managers are responsible for managing incidents" - In an instant, most likely after a severe incident was poorly handled, your C*O has deputized an entire subgroup of your organization to serve as incident commanders.
- Pro: Leveraging folks already on staff can save on costs.
- Pro: If incentives are aligned to their performance in this capacity, this group has the potential to deliver a quality experience.
- Con: Under-incentivized, 'volun-told' incident commanders can be a mixed bag in skills, aptitude, and motivation. The consistency and predictability of processes can suffer without a higher degree of attention from a responsible party.
Targeted assigned group
An experienced group of folks who excel at incident management picks up the baton when asked.
- Pro: Leveraging folks already on staff can save on costs.
- Pro: These folks will often deliver a fantastic incident command experience.
- Con: Due to the interrupt-driven nature of incidents, these individuals are more liable to burn out while shouldering the load of another day job.
- Con: These high performers are hard to come by, and spending their energy on incident response can take away from other valuable initiatives.
Full-time incident commander
Incident command as a full-time role.
- Pro: A dedicated focus on incidents can ensure efficient responses.
- Pro: Non-technical follow-on tasks can be handled by these individuals, minimizing the impact on other responder groups.
- Pro: Non-incident-engaged hours can be spent improving the program, gathering metrics for leadership/teams, sharing learning from incidents, and training responders.
- Con: Requires a dedicated headcount that could be put into another business-critical area.
- Con: Might be more attention than is required for smaller organizations or organizations with fewer incidents.
A dedicated incident commander can play an invaluable role in organizations operating sufficiently complex systems where reliability interruptions cause immense pain. Less dedicated models can work well in cases where complexity or incident frequency is lower.
Regardless of what model you use, (hopefully) someone will step up to manage software incidents in your organization, and FireHydrant is an incident commander's best friend. FireHydrant automates, facilitates, and removes toil from the incident management process, allowing whatever human or group you'd prefer to focus on helping drive incidents to resolution.