Posts straight from FireHydrant's Engineering Team

A single source of truth: how CircleCI got 200 engineers in lockstep when it comes to incident management


By bringing in FireHydrant to help improve their incident management practices, CircleCI has created a single source of truth that has helped them onboard engineers more easily and get them comfortable declaring and managing incidents faster. 

FireHydrant Incident retrospective: June 24, 2022


Between 2022-06-23 20:25 and 2022-06-24 21:39, FireHydrant experienced an incident resulting in customers being unable to authorize the FireHydrant Slack app. This is the incident retrospective.

The not-so-obvious positive outcomes of great incident management


The industry and markets are volatile right now. More than ever, you should be focused on shipping great products, retaining engineers, and building trust with customers. The right incident management strategy can help you make strides in all three.

Best practices for building an incident management plan and process


A thoughtful incident management plan can help you avoid future security incidents and cut down your incident response time drastically.

FireHydrant hack week spring 2022 has shipped


One of our core values at FireHydrant is continuous improvement. Our engineering team runs bi-annual hack weeks to create space for experimentation, optimization, and building things that we’re passionate about.

Avoid frostbite: Stop doing code freezes


A code freeze is intentionally halting changes to your codebase and environments in an effort to reduce the risk of an outage. On the surface, pausing deployments feels like a logical solution to preventing incidents. Unfortunately, this isn't the case.

How Service Catalog Increases Productivity


Productivity revolves around quality. A Service Catalog helps promote this quality. So as your company strives to move faster, make sure quality moves with you.

Reliability is not an engineering metric


Reliability is not a metric that engineering alone controls; everyone in the business has a substantial stake in the reliability your customers feel.

A Developer's Perspective: Lessons from Open Source with FireHydrant and Backstage


Our engineer Christine Yi shares her perspective on contributing to an open source project, the Backstage and FireHydrant plugin, and the three key values she learned.

What is a Service Catalog?


Learn some of the basics around building a service catalog and our philosophy around this growing space.

Working Together (but Separately) with MirageJS


We're using MirageJS to enable front-end and back-end teams to develop features asynchronously, without obstacles.

The MTTR that matters


We're over MTTR(esolution), but have you thought about MTTR(etro)?

WTF is Incident Management? Post-Panel Wrap-Up


Our panel discussion, "WTF is Incident Management," generated some great insight from a group of very experienced industry professionals.

Testing Shell Commands with the Crystal CLI


Using the Crystal programming language, you can share developer tools quickly and easily. FireHydrant's Backend Engineer extraordinaire, Jon Anderson, walks us through the steps of testing shell commands with the CLI.

February 4th, 2021 Incident Retrospective


Between 2021-02-05 00:20 and 2021-02-05 02:44, FireHydrant experienced an incident resulting in delayed runbook execution steps (Slack channel creation, etc.) and intermittent availability issues on This is our incident retrospective.

What is SRE?


Site Reliability Engineering (SRE) is a practice for managing the reliability of systems. Google originally developed SRE in the early 2000s when Ben Treynor Sloss started the first SRE team, coined the name, and set the tone for the industry.

It's Time We Throw Out the Usage of 'Postmortem'


In any other job, conducting a postmortem means someone perished. Let's switch to a phrase that lessens the gruesomeness of software incidents. I wanted to provide some ideas that your organization could possibly run with as a replacement for “Postmortem.”

The Final Episode - Episode 10 of Throughput Thursdays


We made it to our final episode! Thank you to everyone who tuned in and watched Bobby get a Terraform provider up and running. We hope you enjoyed watching me through the good, bad, and ugly these past 20 or so hours.

Configuring a Runbook - Episode 9 of Throughput Thursdays


In episode 9 of Throughput Thursdays, we work to configure a Runbook and get it to work! Watch part 1 of our two-part finale below to see what happens.

Breaking down the interface - Episode 8 of Throughput Thursdays


In episode 8 of Throughput Thursdays, we break down all the logically grouped pieces into their own interfaces and create an interface on our client that can return.

More New Terraform Resources - Episode 7 of Throughput Thursdays


In episode 7, we create resources for managing teams and severities through the Terraform provider, which means we can now manage more of users’ FireHydrant configurations with code.

Creating a Data Source - Episode 6 of Throughput Thursdays


In Episode 6, we update our Terraform resource for FireHydrant functionalities and create a data source for FireHydrant services. This allows us to pull services from a list and link them to functionalities. Linking resources like this lets us do a lot of cool things with Terraform.

Moving from Redux Thunk to Redux-Saga: A walk-through


At FireHydrant, we recently began to replace our usage of thunks with Sagas to handle our data fetching. Read how we moved from Redux Thunk to Redux-Saga.

Incident Ready: How to Chaos Engineer Your Incident Response Process


We’re pretty sure using a real incident to test a new response process is not the best idea. So, how do you test your process ahead of time? Learn how to use chaos engineering principles to stress test your incident management process.

Testing Our Terraform Resources - Episode 5 of Throughput Thursdays


In this episode of Throughput Thursdays, we test our Terraform resources. If you missed it, you can watch it here.

How to: Automatically Archive Incident Slack Channels using conditions in FireHydrant Runbooks


FireHydrant’s Slack integration is a great way to speed up your incident response, especially if FireHydrant Runbooks is automatically creating channels in your Slack workspace for each incident.

Adding Two Terraform Resources - Episode 4 of Throughput Thursdays


In episode 4, we created two full-blown Terraform resources for FireHydrant environments and functionalities. While simple resources, they unlock a lot of power that did not exist previously for teams that want to document their infrastructure using Terraform.

Are You Going to Chaos Conf?


Things are gearing up in our preparations for Chaos Conf by Gremlin. We're sponsoring the conference -- will we see you there?

Fixing Some Code Sins - Episode 3 of Throughput Thursdays


In episode 3, we built a flexible API client for our Terraform provider that implements a really simple interface. We also wrote some simple but effective tests and replaced the original cruft in the provider code with our new API client.

Build Your API First


Going API first will save you headaches in the long run. This post shares why choosing to go API first from Day 1 will be a game-changer for your business, and the decisions we made at FireHydrant to do this.

Live from Cape Cod - Episode 2 of Throughput Thursdays


In Episode 2, Bobby is live in Cape Cod, sitting on a dock about 4 inches from the edge of a lake. Last week we built a skeleton of a Terraform provider. Now we’ll get the provider to create and delete resources, like services in FireHydrant.

7 Ways to Get Acquainted With a New Codebase


Tori Crawford, one of our engineers, walks through some ways that you can get immersed in unfamiliar code. She gathered input and insights from the rest of the FireHydrant team to create this quick playbook on best practices that will make tackling any new codebase easier.

We’re Building a Terraform Provider! - Episode 1 of Throughput Thursdays


In Episode 1, we started out the Terraform provider with a simple data resource against the FireHydrant API. We were able to successfully retrieve information about a single service and display its name in our terminal!

How FireHydrant's CI/CD Infrastructure Fixes Bugs Faster


Almost everyone knows that working with third-party APIs can be challenging. Sometimes the errors happen unexpectedly. Sometimes the error information that you receive is inaccurate. While most people feel these pains acutely, I’d like to share how we answer these challenges at FireHydrant and how it’s helped us avoid headaches and stress.

The Culture of the Codebase


We like to have fun when we build our product - read about how Rebecca Black's "Friday" snuck its way into our codebase.

Sticking to Your SLAs with FireHydrant Runbooks


Due to the complexity of systems, it’s no longer a matter of “if” our systems will fail but “when”. To manage expectations for when our systems do fail, we can look no further than our Service Level Agreement.

Grow your Blame-Free Culture with These Postmortem Best Practices


Here are 3 postmortem practices that embrace a blame-free culture.

Avoid Institutionalized Incident Nonsense


How to get past the nonsense and look at problems differently.

A Single Person On-Call “Rotation” is a Critical Vulnerability


Why distributing your on-call workload is critical.

NFS with Docker on macOS Catalina


You like living on the edge. Life is fun on the edge until the edge is a macOS major update. Then you use vibrantly colorful words, some that your dead ancestors heard, all because your development environment now fails in spectacular fashion.

Graceful Error Handling with Redux


Redux powers our global state at FireHydrant. One of the things we use most heavily is the ability to let Redux store our API errors to handle failure states on the UI. See how we're using Redux to power our global state at FireHydrant.

Dynamic Kubernetes Informers


How we updated our Kubernetes integration at FireHydrant.

3 Defensive Programming Techniques for Rails


Defensive programming is great for codifying how a bug could be introduced, and raising an error right before it would happen, or choosing an alternative path. Here are some simple ideas to defend yourself against mistakes.

Rails without Webpacker


We recently removed Webpacker from our Rails 5 application. This is a summary of the steps you can take to use vanilla webpack in your Rails application.

Instrumenting Ruby on Rails with Prometheus


If you’re running a production application, you need metrics. In the Rails community, this is commonly achieved with New Relic and Skylight, but we achieve visibility using Prometheus and Grafana. Check out this guide on how to use Rails with Prometheus.

Understanding Istio Ingress


Istio is a hot technology right now. Giants such as Google and IBM have devoted entire teams of engineers to the project to push it to production readiness. Check out this post on getting to know Istio Ingress.

Stay Informed with Kubernetes Informers


FireHydrant has a changelog feature with a Kubernetes integration - read how our changelog works with Kubernetes.

Developing a Ruby on Rails app with Docker Compose


Learn how to set up a new Rails app in Docker Compose.

Develop a Go app with Docker Compose


Learn how to structure a Go application with Docker Compose as your development environment.

Flexible Ruby on Rails Reader Objects


Rails and ActiveRecord provide a simple interface for retrieving information from a database. With a few characters, I can retrieve all of my users with User.all. This simplicity is great, but it breaks down when you start doing more advanced queries.

How FireHydrant Creates Data in Rails


How FireHydrant is built to support creating data in our integration-ready platform.

Using React Select with Redux Form


At FireHydrant, we use Redux Form for all of our forms. It is extremely easy to build complex form logic with all sorts of added bonuses that make using it in our React/Redux front end a no-brainer. Learn how FireHydrant uses Redux Form.