What does it mean to be on call?

Being on call means an employee is designated to be available to work whenever their employer calls on them. In the DevOps world, on-call refers to the practice of designating specific engineers to be available in the case of an outage or major incident. Teams usually take turns being the on-call staff member so that someone is always available 24/7.
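To make “taking turns” concrete, here is a minimal sketch of a round-robin rotation that answers the question “who is on call right now?” It assumes fixed-length shifts that cycle through a roster in order and ignores overrides, swaps, and holidays. The `on_call_at` helper, the engineer names, and the weekly shift length are hypothetical illustrations, not any particular tool's API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: a simple round-robin rotation, not a real scheduler's API.
def on_call_at(roster: list[str], rotation_start: datetime,
               shift_length: timedelta, when: datetime) -> str:
    """Return who is on call at `when` in a simple round-robin rotation."""
    if not roster:
        raise ValueError("roster must contain at least one engineer")
    if when < rotation_start:
        raise ValueError("`when` is earlier than the start of the rotation")
    # Count the complete shifts since the rotation began (timedelta // timedelta -> int).
    shifts_elapsed = (when - rotation_start) // shift_length
    # Wrap around the roster so that every moment is covered by exactly one person.
    return roster[shifts_elapsed % len(roster)]

# Hypothetical roster with weekly shifts starting Monday 1 January 2024, 09:00 UTC.
roster = ["alice", "bob", "carol"]
start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
week = timedelta(weeks=1)

# A 2 a.m. page on Wednesday 10 January falls in the second weekly shift.
print(on_call_at(roster, start, week, datetime(2024, 1, 10, 2, 0, tzinfo=timezone.utc)))  # → bob
```

Because the index wraps around the roster, there is never a moment without an assigned responder, and every engineer added to the roster lengthens the time between each person's shifts.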

When people think of being “on-call,” they usually think of a 2 a.m. emergency page. However, our 2022 Incident Benchmark Report revealed that most incidents occur on Tuesdays, Wednesdays, and Thursdays between 11 a.m. and 2 p.m. ET. This means that, for many engineers, incident mitigation happens during the standard workday.

On-call engineers may get pulled away from their routine, day-to-day tasks to resolve an incident as soon as possible. The more impactful an incident, the more engineers may need to be brought in. 

Why are on-call engineers important?

Designating on-call engineers ensures that there is always someone to step up and take charge as soon as an incident is declared. Whether an incident is declared during business hours or after hours, an on-call rotation guarantees coverage. Because a responder is ready immediately, incidents are resolved faster.

What are the best practices for on-call engineers? 

Life as an on-call engineer can be stressful, and the unpredictability of incident response can lead to burnout.

To prevent burnout and alert fatigue, teams can:

  • Distribute the on-call workload. A single-person “on-call rotation” is a critical vulnerability, so make sure multiple people in your organization can triage incoming incidents, and create a rotation schedule that gives people significant time off.

  • Create a robust incident response plan. On-call engineers feel more prepared when they know they can operate from a concrete incident response plan. Although incident response looks different at every company, a strong plan will assign roles, set communication guidelines, create a space for resources, and define a documentation process. This reduces the cognitive load on engineers and makes the on-call experience less onerous.

  • Practice incident response ahead of time. Navigating the unknown adds stress to incident response. Run regular training so engineers are familiar with the tools they will use during an incident. Scheduled training also gives on-call engineers a chance to understand your product better, making it easier for them to know where to look when something malfunctions.

  • Focus on mitigation. During an incident, engineers should focus on reducing customer impact rather than searching for a long-term fix or completing a root-cause analysis. Once systems have returned to normal, you can carry out a deeper investigation with less stress.
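The advice on distributing the workload can be sketched as a rotation that pairs each primary responder with a secondary backup, then measures how long each engineer spends fully off call between shifts. This is a minimal illustration under stated assumptions, not FireHydrant's scheduling logic: the roster names, fixed-length shifts, and the helpers `build_schedule` and `shortest_break` are all hypothetical.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical helpers for illustration; not FireHydrant's or any scheduler's API.
def build_schedule(roster: list[str], num_shifts: int) -> list[tuple[str, Optional[str]]]:
    """Assign a primary and a secondary (backup) responder to each shift."""
    if not roster:
        raise ValueError("roster must contain at least one engineer")
    schedule = []
    for i in range(num_shifts):
        primary = roster[i % len(roster)]
        # The next engineer in the roster backs up the primary. A one-person
        # roster has nobody to escalate to, which is the vulnerability to avoid.
        secondary = roster[(i + 1) % len(roster)] if len(roster) > 1 else None
        schedule.append((primary, secondary))
    return schedule

def shortest_break(schedule: list[tuple[str, Optional[str]]],
                   shift_length: timedelta) -> dict[str, timedelta]:
    """For each engineer, the shortest stretch fully off call (neither primary
    nor secondary) between two on-call shifts within this schedule window.
    A result of zero means no break was observed in the window."""
    shifts_by_person: dict[str, list[int]] = {}
    for i, (primary, secondary) in enumerate(schedule):
        for person in (primary, secondary):
            if person is not None:
                shifts_by_person.setdefault(person, []).append(i)
    breaks = {}
    for person, shifts in shifts_by_person.items():
        # Back-to-back shifts (a gap of 0) form one continuous on-call block.
        gaps = [later - earlier - 1 for earlier, later in zip(shifts, shifts[1:])]
        real_breaks = [g for g in gaps if g > 0]
        breaks[person] = min(real_breaks, default=0) * shift_length
    return breaks

week = timedelta(weeks=1)
# Four engineers on weekly shifts: at least two full weeks off between blocks.
print(shortest_break(build_schedule(["alice", "bob", "carol", "dave"], 12), week)["alice"])  # → 14 days, 0:00:00
# One engineer covering every shift never gets a break.
print(shortest_break(build_schedule(["alice"], 12), week)["alice"])  # → 0:00:00
```

Under these assumptions, a two-person primary/secondary rotation also yields zero time off, because each engineer is either primary or secondary on every shift; the time off only grows as more people are added to the roster.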
