Thoughts
FireHydrant's POV and thoughts
We can’t all be Shaq: why it’s time for the SRE hero to pass the ball and how to get there
2022-05-25
By taking some first steps away from being the hero, we can help our companies shift toward better incident management and improve things for our customers, for our teammates, and for ourselves.
The not-so-obvious positive outcomes of great incident management
2022-05-18
The industry and markets are volatile right now. More than ever, you should be focused on shipping great products, retaining engineers, and building trust with customers. The right incident management strategy can help you make strides in all three.
Understanding Service Level Objectives
2022-05-10
How to create effective SLOs and connect them with SLAs and SLIs.
Avoid frostbite: Stop doing code freezes
2021-11-11
A code freeze is intentionally halting changes to your codebase and environments in an effort to reduce the risk of an outage. On the surface, pausing deployments feels like a logical way to prevent incidents. Unfortunately, this isn't the case.
Reliability is not an engineering metric
2021-09-30
Reliability is not a metric that engineering alone controls; everyone in the business has a substantial stake in the reliability your customers feel.
We’ve raised a $23M Series B to help us get to a world where all software is reliable
2021-08-10
We envision a world where all software is reliable, and we’re on a mission to help every company that builds or operates software get closer to 100% reliability. Today, we’re thrilled to announce that we’ve raised $23 million to help us further our goal.
Pragmatic Incident Response: 3 Lessons Learned from Failures
2021-07-15
Lessons learned from the front line that you can actually use immediately in your incident management process.
Alert Fatigue and Your Health
2021-03-09
Alert fatigue can not only cause more errors and financially impact your business but can also be detrimental to your health. This post goes over how alert fatigue manifests and some ideas on how to combat it.
It's Time We Throw Out the Usage of 'Postmortem'
2021-02-10
Why are we using the term 'postmortem' when no one died? In any other job, conducting a postmortem means someone perished, so we need to switch to another phrase to lessen the gruesomeness of software incidents. I wanted to provide some ideas that your organization could possibly run with as a replacement for “Postmortem.”
2021 is the Year of Reliability
2021-01-19
We all know it: You expect your software tools to work every time, all the time. Let's do better this year - there’s no better time than now to dedicate effort to fireproofing your software.
Build Your API First
2020-09-21
Going API first will save you headaches in the long run. This post shares why choosing to go API first from Day 1 will be a game-changer for your business, and the decisions we made at FireHydrant to do this.
The Culture of the Codebase
2020-06-24
We like to have fun when we build our product - read about how Rebecca Black's "Friday" snuck its way into our codebase.
Avoid Institutionalized Incident Nonsense
2019-11-12
How to get past the nonsense and look at problems differently.
A Single Person On-Call “Rotation” is a Critical Vulnerability
2019-10-09
Why distributing your on-call workload is critical.
Open Source can be a Silver Bullet, but your Application Might be a Werewolf
2019-09-22
A story about open source.