Incident metrics help you understand the health and effectiveness of your services, environments, functionalities, and incident response teams. They show how quickly you are responding to incidents and, in turn, how much trust you are building with your users. If you are not paying attention to the relevant metrics, you can lose valuable time by investing in the wrong projects and procedures. FireHydrant can provide the information you need to make informed business decisions about reliability.
The following definitions include common incident milestones, which are defined in this article.
MTTD : Mean Time to Detection
time of detection - time of incident start
MTTA : Mean Time to Acknowledged
time of acknowledgment - time of incident start
MTTM : Mean Time to Mitigation
time of mitigation - time of incident start
MTTR : Mean Time to Resolution
time of resolution - time of incident start
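The four definitions above share one shape: subtract the incident start time from a milestone time, then average the result across incidents. The following is a minimal sketch of that calculation, not FireHydrant's implementation. The record layout, the milestone key names (`started`, `detected`, `acknowledged`, `mitigated`, `resolved`), and the `mean_time_to` helper are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident records: each maps an assumed milestone name to a timestamp.
incidents = [
    {
        "started": datetime(2024, 1, 1, 12, 0),
        "detected": datetime(2024, 1, 1, 12, 5),
        "acknowledged": datetime(2024, 1, 1, 12, 10),
        "mitigated": datetime(2024, 1, 1, 13, 0),
        "resolved": datetime(2024, 1, 1, 14, 0),
    },
    {
        "started": datetime(2024, 1, 2, 9, 0),
        "detected": datetime(2024, 1, 2, 9, 15),
        "acknowledged": datetime(2024, 1, 2, 9, 20),
        "mitigated": datetime(2024, 1, 2, 9, 30),
        "resolved": datetime(2024, 1, 2, 10, 0),
    },
]


def mean_time_to(milestone, records):
    """Average of (milestone time - incident start time) across incidents."""
    seconds = [(r[milestone] - r["started"]).total_seconds() for r in records]
    return timedelta(seconds=mean(seconds))


mttd = mean_time_to("detected", incidents)      # Mean Time to Detection
mtta = mean_time_to("acknowledged", incidents)  # Mean Time to Acknowledged
mttm = mean_time_to("mitigated", incidents)     # Mean Time to Mitigation
mttr = mean_time_to("resolved", incidents)      # Mean Time to Resolution

print(mttd, mtta, mttm, mttr)
```

Every metric is measured from the incident start, so a later milestone always produces a value at least as large as an earlier one for the same set of incidents.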
- Healthiness : (MTTM * incidents) / time window
As an example, if an incident for a given service started at noon, was mitigated at 1 PM, and was resolved at 2 PM, healthiness for that infrastructure would be 50% for the window of noon to 2 PM: one incident with one hour to mitigation, divided by a two-hour window.
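The arithmetic in this example can be checked with a short sketch that applies the formula (MTTM * incidents) / time window exactly as written. The function name `mttm_ratio` and the specific date are hypothetical; only the noon, 1 PM, and 2 PM times come from the example.

```python
from datetime import datetime, timedelta


def mttm_ratio(mttm, incident_count, window):
    """Apply (MTTM * incidents) / time window; returns a fraction between 0 and 1
    when total mitigation time does not exceed the window."""
    return (mttm * incident_count) / window


# The date is arbitrary; the times of day match the example above.
started = datetime(2024, 1, 1, 12, 0)    # noon
mitigated = datetime(2024, 1, 1, 13, 0)  # 1 PM
resolved = datetime(2024, 1, 1, 14, 0)   # 2 PM

mttm = mitigated - started   # one incident, so the mean equals this single value
window = resolved - started  # noon to 2 PM

ratio = mttm_ratio(mttm, 1, window)
print(f"{ratio:.0%}")
```

With one incident, a one-hour time to mitigation, and a two-hour window, the ratio is 0.5, which matches the 50% figure in the example.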
- Impact : Within a given date range, the durations of multiple incidents are added up to calculate the total time a service, functionality, or environment was degraded.
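As a rough sketch of how the Impact total could be computed, the snippet below sums degraded time across incidents inside a date range. Several details are assumptions rather than documented behavior: the `total_impact` helper is hypothetical, each incident is represented as a (start, end) pair of degraded time, intervals are clipped to the date range, and overlapping incidents are simply added without deduplication.

```python
from datetime import datetime, timedelta


def total_impact(degraded_intervals, range_start, range_end):
    """Sum the degraded time of each incident that falls inside the date range."""
    total = timedelta(0)
    for start, end in degraded_intervals:
        # Assumption: only the part of an incident inside the range is counted.
        clipped_start = max(start, range_start)
        clipped_end = min(end, range_end)
        if clipped_end > clipped_start:
            total += clipped_end - clipped_start
    return total


# Hypothetical degraded intervals for one service.
intervals = [
    (datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 14, 0)),    # 2 hours
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 30)),     # 30 minutes
    (datetime(2024, 1, 10, 8, 0), datetime(2024, 1, 10, 9, 0)),    # outside the range
]

impact = total_impact(intervals, datetime(2024, 1, 1), datetime(2024, 1, 7))
print(impact)
```

Here the first two incidents fall inside the range and contribute two and a half hours in total, while the third is ignored because it starts after the range ends.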