Creating Runbooks with the starter template
Runbooks are FireHydrant's tool for enabling your team to automate your pre-defined incident response process. If you're testing your incident response for the first time and don't have a defined process, you can learn more about Runbooks with the starter template. The template also helps you understand your configuration options when building Runbooks.
Creating a new Runbook
Start by creating a new Runbook from the starter template.
- In the FireHydrant left nav, click Runbooks.
- On the next page, click Create new (in the upper right corner).
- On the Create new Runbook page, click Use this template.
The Runbook edit page opens.
In the first section of the page, you can provide a name and optional summary for your Runbook. You can also include any additional information about the Runbook; the field for additional information is markdown-enabled. Information you include here can help with discoverability when users are searching for a Runbook for a specific scenario.
Defining when Runbooks will execute
At the bottom of the edit page is an option called Execution Rules.
If you don't specify a choice from the pull-down, this Runbook defaults to Always attach, and will run on every incident. This is a best practice when you're testing out your incident response process. However, Runbooks don't have to be attached to incidents indiscriminately. As your incident response process evolves, it helps to define responsibilities for team members and to automate specific steps based on the different conditions of an incident.
For example, you can configure Runbooks to be attached to incidents of different severity levels. You probably don't want to kick off your SEV 1 process each time an incident occurs; using Runbooks, you can configure different automation steps to run on a SEV 1 incident as opposed to a SEV 4 incident. These conditions can power many different cases for your team.
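To make the attachment logic concrete, here is a minimal Python sketch of how severity-based execution rules could be modeled. The `Incident` and `Runbook` classes, the `attach_severities` field, and the Runbook names are hypothetical illustrations, not FireHydrant's data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Incident:
    name: str
    severity: str  # e.g. "SEV1" or "SEV4" (hypothetical labels)


@dataclass
class Runbook:
    name: str
    # None models the default "Always attach" behavior (assumption).
    attach_severities: Optional[Set[str]] = None

    def attaches_to(self, incident: Incident) -> bool:
        if self.attach_severities is None:
            return True
        return incident.severity in self.attach_severities


def runbooks_for(incident: Incident, runbooks: List[Runbook]) -> List[str]:
    """Return the names of the Runbooks that would attach to this incident."""
    return [rb.name for rb in runbooks if rb.attaches_to(incident)]


RUNBOOKS = [
    Runbook("Starter runbook"),                        # always attaches
    Runbook("Major incident process", {"SEV1"}),       # SEV 1 only
    Runbook("Low-severity triage", {"SEV3", "SEV4"}),  # SEV 3 and SEV 4
]

sev1_runbooks = runbooks_for(Incident("Database outage", "SEV1"), RUNBOOKS)
sev4_runbooks = runbooks_for(Incident("Typo on status page", "SEV4"), RUNBOOKS)
```

In this sketch the SEV 1 incident picks up the starter Runbook and the major incident process, while the SEV 4 incident picks up the starter Runbook and the low-severity triage.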
Configuring execution steps
Further down the page is a list of available steps on the left-hand side. FireHydrant offers steps to modify the state of your incident and also to communicate with your integrations. Integrations that you have already configured appear first, with non-configured integrations below.
You can still add steps to your Runbook for integrations that you haven't configured yet, but be aware that FireHydrant will skip these steps when the Runbook is executed. After you configure these integrations, the execution steps related to them will run.
Note: Runbooks are designed so that each step executes as quickly as possible. As a result, Runbook steps won't always execute in the order in which they appear in the FireHydrant UI. If a step must not run until another specific step has finished, add that constraint by using Runbook conditions.
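The two behaviors described above (skipping steps for integrations that aren't configured, and holding a step until another step has finished) can be sketched in Python as follows. This is an illustration under assumptions: the `Step` class, its `integration` and `after` fields, and the `runnable_steps` function are hypothetical and do not reflect how FireHydrant implements Runbooks.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Step:
    name: str
    integration: Optional[str] = None  # None = built-in step (assumption)
    after: Optional[str] = None        # step that must complete first


def runnable_steps(steps: List[Step], configured: Set[str],
                   completed: Set[str]) -> List[str]:
    """Return names of steps that could run now, in no guaranteed order."""
    ready = []
    for step in steps:
        if step.name in completed:
            continue  # already done
        if step.integration is not None and step.integration not in configured:
            continue  # integration not configured: the step is skipped
        if step.after is not None and step.after not in completed:
            continue  # condition not met yet: wait for the other step
        ready.append(step.name)
    return ready


STEPS = [
    Step("Create Slack channel", integration="slack"),
    Step("Notify incident channel", integration="slack",
         after="Create Slack channel"),
    Step("Create a Zoom meeting", integration="zoom"),
]

# Only Slack is configured, so the Zoom step is skipped in both rounds.
first_round = runnable_steps(STEPS, {"slack"}, completed=set())
second_round = runnable_steps(STEPS, {"slack"},
                              completed={"Create Slack channel"})
```

Here the notification step only becomes eligible in the second round, after the channel-creation step it depends on has completed.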
On the right side, you'll see a list of the steps that will be run as part of our best practice Runbook. Let's go through what each of those steps entails.
Step 1: Create Slack channel
Create a dedicated Slack channel to capture all of your incident communication. You can fully interact with the FireHydrant platform inside of your dedicated Slack room with slash commands like /firehydrant status. This step will create a Slack channel named incident-{number}. Learn more about using Liquid templates in Runbooks.
Step 2: Notify Incident Channel with custom message
By adding an initial message to your dedicated incident channel in Slack, you can provide responders with reminders about how to interact with the Slackbot or details that are specific to your team's incident response process.
Step 3: Notify Slack channel
By notifying other channels when an incident is opened, you can keep other stakeholders or interested parties up to date without an engineer having to step away from mitigation to let people know that something has happened. This step will notify a channel called fh-alerts that a new incident has been opened.
Step 4: Create incident ticket
By creating an incident ticket in Jira, you can associate all follow-up items from your retrospective to this incident. If your team tracks incident metrics inside of Jira, the ticket to track that information can be automatically created. This creates a ticket in Jira with the same name as your incident using Liquid templating, with the variable #{{ incident.name }}.
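To show what variable substitution of this kind does, here is a small Python stand-in that fills `{{ dotted.path }}` placeholders from a dictionary. It is not a Liquid implementation (Liquid also supports filters, tags, and control flow), and the `render` function and the sample incident data are hypothetical.

```python
import re

# Matches placeholders such as "{{ incident.name }}".
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each {{ a.b }} placeholder with context["a"]["b"]."""
    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError for unknown variables
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


# Hypothetical incident data, used only for illustration.
context = {"incident": {"name": "Checkout API latency"}}
ticket_title = render("{{ incident.name }}", context)
```

With this sample data, `ticket_title` is the incident's name, which is the behavior the Jira step relies on to give the ticket the same name as the incident.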
Step 5: Assign team
By assigning team members to your incident, you can get the right responders mitigating your incident as fast as possible. If you have configured an integration with an alerting provider such as PagerDuty, these roles can be dynamically assigned to whoever is currently on call. You will need to select which team to assign using this Runbook step. When you create your first account, FireHydrant creates a team called 'First Responders'.
Step 6: Create PD incident
FireHydrant can also make sure that the people assigned to an incident are notified through a workflow they already know. It can kick off an incident in PagerDuty so that the correct escalation policy is notified during an incident. After configuring PagerDuty, you will need to add some additional configuration to this step to ensure that you are paging the correct service and escalation policy.
Step 7: Create a Zoom meeting
FireHydrant can automatically spin up a Zoom room for you and associate it with your incident. By doing this, the link for the Zoom room is included anytime the incident details are posted in Slack.
Step 8: Send email notification
While Slack is great for notifying people who are online, many companies have an SLA to notify executives of incidents by email when they are opened. FireHydrant will automatically send a templated email to whoever you would like, and can repeatedly send that email over the course of mitigation.
Step 9: Notify incident channel with custom message
Throughout the course of an incident, it is hard to keep the correct people updated when your team is focused on mitigating. A message that repeats every 30 minutes can remind your team to update stakeholders and external status pages.
This is the first time that we are using a repeating step. On the step, click the Conditions and scheduling text to switch panels.
This panel allows you to add rules for when this step will automatically complete (more on this later), how often to repeat this step, and whether to execute this step automatically. For now, let's just talk about a repeating step.
When a step has a repeat time attached to it, FireHydrant waits for the repeat duration to pass after the step completes, then creates a copy of that step and attempts to complete it automatically. For example, this step will run right away at the beginning of your incident; 30 minutes later, a copy of this step will be added to the Runbook and will run again. Repeating steps are canceled once your incident is resolved. You can also cancel them manually by going to the Incident Command Center page, navigating to that Runbook step, and clicking the stop button.
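The repeat behavior can be sketched as a small Python timeline calculation. This is a simplified model under stated assumptions: every copy of the step is treated as completing instantly, and the `repeat_run_times` function and the sample timestamps are hypothetical rather than part of FireHydrant.

```python
from datetime import datetime, timedelta
from typing import List


def repeat_run_times(first_completed_at: datetime,
                     repeat_every: timedelta,
                     resolved_at: datetime) -> List[datetime]:
    """Times at which the step and its copies run before resolution.

    Assumes each copy completes instantly, so the next copy is due exactly
    one repeat duration after the previous run.
    """
    runs = [first_completed_at]
    next_run = first_completed_at + repeat_every
    while next_run < resolved_at:  # repeats are canceled once resolved
        runs.append(next_run)
        next_run += repeat_every
    return runs


opened = datetime(2024, 1, 1, 10, 0)     # sample incident start
resolved = datetime(2024, 1, 1, 11, 15)  # sample resolution time
reminders = repeat_run_times(opened, timedelta(minutes=30), resolved)
```

For an incident opened at 10:00 and resolved at 11:15, the reminder runs at 10:00, 10:30, and 11:00; the copy that would have run at 11:30 is never created because the incident has already been resolved.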
Step 10: Archive incident channel
After you have finished your retrospective in FireHydrant, there is no need to keep the Slack channel around. This step will automatically archive the dedicated incident channel for you. (Note that all of the chat content is still saved in the FireHydrant UI.)
This step also works as an introduction to the Rules section of Runbook steps. Similar to the conditions for the Runbook attachment section at the top, Runbook rules are conditions of the incident state that must be matched for the step to run.
In this case, the incident must have a completed Retrospective before the incident channel can be archived. Like Runbook attachments, you can link together multiple rules to build out very specific circumstances when a step will run.
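As an illustration of linking rules together, the Python sketch below treats each rule as a predicate over the incident state and combines predicates with `all_of`. The `IncidentState` fields, the rule functions, and the "resolved" milestone value are assumptions made for the example, not FireHydrant's rule syntax.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class IncidentState:
    milestone: str                 # e.g. "started", "resolved" (assumed values)
    retrospective_completed: bool


Rule = Callable[[IncidentState], bool]


def all_of(*rules: Rule) -> Rule:
    """Combine rules so the step runs only when every rule matches."""
    return lambda state: all(rule(state) for rule in rules)


def retrospective_done(state: IncidentState) -> bool:
    return state.retrospective_completed


def is_resolved(state: IncidentState) -> bool:
    return state.milestone == "resolved"


# The archive step requires a completed retrospective; the extra
# "resolved" rule is a hypothetical second condition linked to it.
can_archive_channel = all_of(retrospective_done, is_resolved)

during_incident = IncidentState("started", retrospective_completed=False)
after_retro = IncidentState("resolved", retrospective_completed=True)
```

Evaluating `can_archive_channel(during_incident)` gives `False`, while `can_archive_channel(after_retro)` gives `True`, so the channel would only be archived once both conditions hold.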
After you've created your Runbook, we recommend taking your time installing your integrations so you can explore all of the configurable options and full capabilities of Runbooks.