
Chapter 2: Defining incident management

What is an incident?

Incidents are disruptive to your customers and to the teams that have to manage them. They range from the routine, like a broken feature, to the catastrophic, like a total system failure. They can keep engineering teams up at night, burned out, and on edge, or they can be managed and resolved as quickly as possible to avoid the associated financial losses and keep customer trust intact. Incidents are inevitable within technical organizations, but great incident management practices not only mitigate the damage they cause but also drive process efficiencies across your team and the entire organization.

No matter how strong the technology organization is, incidents will happen. The goal is to reduce the frequency and severity of those incidents and minimize their impact. Businesses can do that by making it easier for teams to access needed expertise and by providing clear, ongoing communication.

An incident is resolved when the disrupted service is returned to its normal standard of operation: for example, when your broken feature is working again, your database is up and running, or your users can once again access their accounts as expected. Successful incident management requires effective monitoring, alerting, severity declaration, system analysis, correction, communication, and a retrospective.

Learn more about Incident Response Management in our Reliability Guide.
