Hot Take: Don't provide incident resolution estimates
Providing incident resolution times to customers is an unneeded stress for responders with very little gain.
By Robert Ross
9/24/2024
Robert believes that better incident management is integral to a world where all software is reliable. He founded FireHydrant in 2018 as the tool he wished he’d had when managing incidents on call at companies like Namely and DigitalOcean.
The addition of Blameless' enterprise capabilities combined with FireHydrant's platform creates the most comprehensive enterprise incident management solution in the market.
By Robert Ross
8/21/2024
Learn how to implement semantic search in Ruby on Rails using the Neighbor gem, Anthropic's Claude API for summarization, and OpenAI for text embeddings. Enhance your app's search capabilities with meaning-based results.
By Robert Ross
7/30/2024
Now, no matter what the smoke signal looks like or where it comes from, you can alert your on-call teams instantly.
By Robert Ross
7/23/2024
When the internet breaks, who fixes it? Dive into the high-stakes world of tech crisis teams. From virtual war rooms to digital firefighting, uncover how software teams battle major outages.
By Robert Ross
7/19/2024
Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates. And that’s just the beginning.
By Robert Ross
3/18/2024
It's not good enough for a tool to just do one thing anymore. We need Swiss Army knife tools that help us demonstrate our value, enhance our oversight of software systems, and continually expand our skills.
By Robert Ross
3/6/2024
Introducing alerting and on-call scheduling for modern engineering teams. Get fair pricing, team-based controls, less noise, and flexible scheduling from the team that brought you powerful, end-to-end incident management.
By Robert Ross
2/29/2024
It's time to address the elephant in the room: alerting tools have become nothing more than over-priced pagers, drowning you in a sea of notifications that may or may not be urgent — and it's taking a toll.
By Robert Ross
1/18/2024
Signals is now available in beta for teams eager to implement a cost-effective alerting tool designed specifically for how modern DevOps teams work.
By Robert Ross
12/8/2023
TL;DR: Schedules are simple — until they're not. Let's talk about how we architected an on-call scheduling system that works for modern businesses.
By Robert Ross
12/5/2023
TL;DR: We wanted to make evaluating incoming signals powerful, intuitive, and fast. Using CEL (Common Expression Language) is our answer.
By Robert Ross
11/20/2023
TL;DR: Signals must be resilient, and we're excited about the pattern we've implemented to make it so.
By Robert Ross
11/8/2023
It's high time we acknowledge that the way we’re running alerting is stuck in time. To move forward, we must lift the curtain on areas that have to evolve and embrace the new principles of incident alerting.
By Robert Ross
10/2/2023
When every incident is chaos, the people problem gets overwhelming, and there’s no culture of improvement — engineers get burned out and seek greener pastures. That’s the cultural drain caused by incidents.
By Robert Ross
9/12/2023
The total cost of incidents goes beyond the time spent resolving them. It also includes the cost of time that otherwise would’ve been focused on developing the next big thing. That’s opportunity cost.
By Robert Ross
8/24/2023
Downtime is just the beginning when it comes to the cost of poorly handled incidents. In this blog post, we explore how the first few minutes of an incident can set you up for success or sabotage — and how to ensure the former.
By Robert Ross
8/16/2023
In this blog post, we’ll talk about two incident management structure models — distributed and centralized, including the pros and cons of each, and examples of what each structure looks like in our community.
By Robert Ross
8/8/2023
Analyst firm Enterprise Strategy Group conducted an in-depth evaluation of the costs associated with poor incident management practices and found that FireHydrant reduces the cost of incidents significantly.
By Robert Ross
7/26/2023
Organizing everyone involved in an incident can be more complicated than mitigating the incident itself. But there are processes you can put into place that will draw boundaries around who does what during an incident and set your stakeholders at ease.
By Robert Ross
6/20/2023
Although we can’t control how long it might take to mitigate an incident, we can exercise a great deal of control over how quickly and prepared we get to the scene of the problem. We call that phase of the incident lifecycle “assembly time.”
By Robert Ross
5/4/2023
By Robert Ross
3/24/2023
We’re all looking to maximize every dollar spent, every hire made, every hour logged. But there’s one cost center you might not be thinking about — incident management. This post explores the explicit and implicit costs associated with incidents.
By Robert Ross
2/14/2023
By automating some rote parts of incident response, you reduce decision fatigue and help responders get to solving the problem faster with less stress. In this post, we talk about three areas of the incident response process that are prime for automation.
By Robert Ross
1/3/2023
FireHydrant received three G2 Winter 2023 awards — High Performer, a High Performer in the Enterprise category, and a High Performer in the United Kingdom. We are honored to be recognized by G2 because these awards are based on customer reviews.
By Robert Ross
12/21/2022
Using anonymized data from 50,000 incidents, the Incident Benchmark Report reveals insights into the when, what, who, and how behind incidents and highlights behaviors that correlate to faster response times.
By Robert Ross
12/15/2022
This post explores how we built FireHydrant in a way that allows us to rapidly build and deploy integrations to help our product fit into responders’ workflows and not vice versa.
By Robert Ross
11/28/2022
FireHydrant was recently featured in two industry reports, proving that strategic investments in incident management pay off for companies of all sizes.
By Robert Ross
10/6/2022
Let’s look at three mistakes I’ve made during those stressful moments during the beginning of an incident — and discuss how you can avoid making them.
By Robert Ross
6/29/2022
The first step in understanding how to shift from incident response to incident management is to define what those terms mean.
By Robert Ross
6/24/2022
You can see big gains from small investments when it comes to incident management, and the fundamentals can be put in place without purchasing tools or hiring new staff. Here are three steps you can take to better incident management today.
By Robert Ross
6/9/2022
The industry and markets are volatile right now. More than ever, you should be focused on shipping great products, retaining engineers, and building trust with customers. The right incident management strategy can help you make strides in all three.
By Robert Ross
5/18/2022
How to create effective SLOs and connect them with SLAs and SLIs.
By Robert Ross
5/10/2022
A thoughtful incident management plan can help you avoid future security incidents and cut down your incident response time drastically.
By Robert Ross
4/5/2022
We envision a world where all software is reliable, and today we’re making that vision more of a reality for small teams. Today, FireHydrant is pleased to announce our new Free Tier for small teams!
By Robert Ross
3/14/2022
In this post, we'll explain the differences between incident severity and incident priority and lay out practical levels and summaries for both.
By Robert Ross
2/24/2022
A code freeze is intentionally halting changes to your codebase and environments in an effort to reduce the risk of an outage. On the surface, pausing deployments feels like a logical solution to preventing incidents. Unfortunately, this isn't the case.
By Robert Ross
11/11/2021
Reliability is not a metric that engineering alone controls; everyone in the business has a substantial stake in the reliability your customers feel.
By Robert Ross
9/30/2021
Chaos engineering is an essential part of creating an effective incident management system and implementing processes that can help keep you in control when real chaos threatens your code.
By Robert Ross
8/24/2021
This is a quick primer to get started in Site Reliability Engineering if you're interested in becoming a Site Reliability Engineer (SRE).
By Robert Ross
8/16/2021
We envision a world where all software is reliable, and we’re on a mission to help every company that builds or operates software get closer to 100% reliability. Today, we’re thrilled to announce that we’ve raised $23 million to help us further our goal.
By Robert Ross
8/10/2021
Lessons learned from the front line that you can immediately use in your incident management process.
By Robert Ross
7/15/2021
We're over MTTR(esolution), but have you thought about MTTR(etro)?
By Robert Ross
6/10/2021
Four things to consider when evaluating incident management platforms--from whether you have the culture and process to support a potential tool, to understanding your pain points, to knowing which key stakeholders to involve.
By Robert Ross
5/26/2021
Alert fatigue can not only cause more errors and financially impact your business but can also be detrimental to your health. This post goes over how alert fatigue manifests and some ideas on how to combat it.
By Robert Ross
3/9/2021
In any other job, conducting a postmortem means someone perished. Let's switch to a phrase that lessens the gruesomeness of software incidents. I wanted to provide some ideas that your organization could possibly run with as a replacement to “Postmortem.”
By Robert Ross
2/10/2021
Incidents are inevitable, and the reality is some of them are inevitably going to repeat themselves. Common incident types were slightly burdensome for our customers, so we're announcing an easy way to declare incidents using templates.
By Robert Ross
1/27/2021
We all know it: You expect your software tools to work every time, all the time. Let's do better this year - there’s no better time than now to dedicate effort to fireproofing your software.
By Robert Ross
1/19/2021
We made it to our final episode! Thank you to everyone that tuned in and watched Bobby get a Terraform provider up and running. We hope you enjoyed watching me through the good, bad, and ugly these past 20 or so hours.
By Robert Ross
12/4/2020
In episode 9 of Throughput Thursdays, we work to configure a Runbook and get it to work! Watch part 1 of our two-part finale below to see what happens.
By Robert Ross
11/20/2020
In episode 8 of Throughput Thursdays, we break down all the logically grouped pieces into their own interfaces and create an interface on our client that can return.
By Robert Ross
11/13/2020
In episode 7, we create resources for managing teams and severities through the Terraform provider, which means we can now manage more of users’ FireHydrant configurations with code.
By Robert Ross
10/30/2020
In Episode 6, we update our Terraform resource for FireHydrant functionalities and create a data source for FireHydrant services. This allows us to pull services from a list and link them to functionalities. Linking resources like this lets us do a lot of cool things with Terraform.
By Robert Ross
10/23/2020
In this episode of Throughput Thursdays, we test our Terraform resources. If you missed it, you can watch it here.
By Robert Ross
10/9/2020
In episode 4, we were able to achieve creating two full-blown Terraform resources for FireHydrant environments and functionalities. While simple resources, they unlock a lot of power that did not exist previously for teams that want to document their infrastructure using Terraform.
By Robert Ross
10/2/2020
In episode 3, we built a flexible API client for our Terraform provider that implements a really simple interface. We also wrote some simple but effective tests and replaced the original cruft in the provider code with our new API client.
By Robert Ross
9/25/2020
Going API first will save you headaches in the long run. This post shares why choosing to go API first from Day 1 will be a game-changer for your business, and the decisions we made at FireHydrant to do this.
By Robert Ross
9/21/2020
In Episode 2, Bobby is live in Cape Cod, sitting on a dock about 4 inches from the edge of a lake. Last week we built a skeleton of a Terraform provider. Now we’ll get the provider to create and delete resources, like services in FireHydrant.
By Robert Ross
9/18/2020
In Episode 1, we started out the Terraform provider with a simple data resource against the FireHydrant API. We were able to successfully retrieve information about a single service and display its name in our terminal!
By Robert Ross
9/11/2020
Fire hydrants usually have a firehose hooked up, and do we have a firehose of updates this July. We’ve been focused on making FireHydrant simpler to use and more deeply integrated with existing workflows to make managing your complex systems easier.
By Robert Ross
7/17/2020
We like to have fun when we build our product - read about how Rebecca Black's "Friday" snuck its way into our codebase.
By Robert Ross
6/24/2020
For the past year we've seen over 50,000 hours of incidents, 20,000 runbook actions automated, and 10 million deploy events, and we're happy to announce our $8M Series A led by Menlo Ventures.
By Robert Ross
5/20/2020
Bobby shares his new hobby: making craft cocktails. In this post we’re going to make a classic: The Old Fashioned.
By Robert Ross
4/21/2020
How to get past the nonsense and look at problems differently.
By Robert Ross
11/12/2019
Announcing our most powerful feature yet: FireHydrant Runbooks is a better way to automate your incidents.
By Robert Ross
10/17/2019
You like living on the edge. Life is fun on the edge until the edge is a macOS major update. Then you use vibrantly colorful words, some that your dead ancestors heard, all because your development environment now fails in spectacular fashion.
By Robert Ross
10/8/2019
A story about open source.
By Robert Ross
9/22/2019
How we updated our Kubernetes integration at FireHydrant.
By Robert Ross
8/28/2019
Announcing our latest integration with Statuspage.io.
By Robert Ross
8/22/2019
Defensive programming is great for codifying how a bug could be introduced, and raising an error right before it would happen, or choosing an alternative path. Here are some simple ideas to defend yourself against mistakes.
By Robert Ross
7/29/2019
The FireHydrant team is predominantly from San Diego (3 of our 4-person team, actually). We’re here to enjoy the awesome community that Go has been creating and to meet new faces. But we also wanted to give back a little with a small guide on food and drinks in Downtown San Diego.
By Robert Ross
7/24/2019
Our latest updates - let anyone open an incident in Slack, the launch of Severity Matrix, and a new integration with Jira.
By Robert Ross
7/12/2019
We’re launching a new feature today that allows anyone in your organization to kick off your incident response process with an appropriate severity level attached from Slack.
By Robert Ross
6/28/2019
So you’ve signed up to give a tech talk, awesome! You’re a subject matter expert in something and want to share your knowledge, that’s what helps make a community awesome. You’re going to be speaking in front of a room of people that you don’t know in a place you’ve likely never been, talking about something you confidently know. Sounds easy, right?
By Robert Ross
6/12/2019
Read more about our latest releases - we've launched webhooks and now you can keep your frequent searches for later.
By Robert Ross
6/3/2019
We recently removed webpacker from our Rails 5 application. This is a summary of the steps you can take to use vanilla webpack in your Rails application.
By Robert Ross
5/28/2019
We’re on a mission to make responding to incidents a bit less chaotic. One of the best features we offer (we’re definitely not biased, no way) is a simple way to define how a severity gets determined when you open an incident. We call it the severity matrix, and today it has a new look.
By Robert Ross
5/28/2019
If you’re running a production application, you need metrics. In the Rails community, this is commonly achieved with NewRelic and Skylight; but we achieve visibility using Prometheus and Grafana. Check out this guide on how to use Rails with Prometheus.
By Robert Ross
5/5/2019
Istio is a hot technology right now. Giants such as Google and IBM have devoted entire teams of engineers to the project to push it to production readiness. Check out this post on getting to know Istio Ingress.
By Robert Ross
5/2/2019
Learn how to set up a new Rails app in Docker compose.
By Robert Ross
5/1/2019
Learn how to structure a Go application with Docker Compose as your development environment.
By Robert Ross
5/1/2019
FireHydrant has a changelog feature with a Kubernetes integration - read how our changelog works with Kubernetes.
By Robert Ross
5/1/2019
Rails and ActiveRecord provide a simple interface for retrieving information from a database. With a few characters, I can retrieve all of my users with User.all. This simplicity is great, but it breaks down when you start doing more advanced queries.
By Robert Ross
5/1/2019
Our latest product releases: SSO, Post Mortem Generator. Today we're happy to announce our single sign-on support and updates to our postmortem functionality.
By Robert Ross
5/1/2019
How FireHydrant is built to support creating data in our integration-ready platform.
By Robert Ross
4/23/2019
Today we're happy to release our incident status page feature! If you operate within an organization that has stakeholders that need the gist of what's going on, how to respond to customers, and give a general feeling of "we're on it," this feature was built for you.
By Robert Ross
4/11/2019
Today we're publicly launching FireHydrant, a tool to manage incidents. Read more about our journey.
By Robert Ross
4/2/2019