Align platform and product engineering teams over incidents
This post explores how to align platform and product engineering teams by implementing business value proxy metrics and using incidents to inform them.
By Gonzalo Maldonado on 7/20/2023
I firmly believe in never letting a good incident go to waste. Incidents expose weak spots and create opportunities for medium- and long-term investments. By analyzing incidents and understanding their root causes, organizations can identify areas that require additional resources or enhancements.
Using incidents to align your platform and product engineering teams opens up opportunities to enhance the performance and security of your product. You also uncover valuable insights into your product by viewing it from a user's perspective, which builds empathy and an understanding of how your features are used.
In this blog post, which is based on a talk I gave at Conf42's SRE 2023 conference, I'll talk about how to align platform and product engineering teams by implementing business value proxy metrics and using incidents to inform them.
WATCH: Leveraging Incidents to Align Platform and Product Engineering by Gonzalo Maldonado was originally presented at Conf42 SRE 2023
Metrics to assess system performance and mission criticality
Business value proxy metrics are measurable indicators that serve as guides for the overall value delivered by a business. Monitoring proxy metrics allows organizations to better understand the value they are creating and make the data-driven adjustments needed to optimize operations.
These metrics provide insight into the effectiveness and success of business processes or initiatives that can help your team make more informed decisions and drive improvements. With clear proxy metrics, organizations can track and assess their progress toward achieving business objectives without having to spend lots of time building experimentation frameworks like A/B testing programs.
How incidents inform business value proxy metrics
Consider the following incident: an e-commerce website experiences a significant increase in shopping cart abandonment during checkout. The team works to resolve the issue and investigates its root causes. The aim is to reduce shopping cart abandonment rates by addressing and resolving those causes.
In a scenario like this, the business value proxy metric could be the conversion rate: specifically, the percentage of successful purchases made compared to the total number of shopping carts created. By monitoring this metric, the business can evaluate the impact that improvements driven by incident resolution have on overall business value and customer satisfaction.
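To make the arithmetic concrete, here is a minimal Python sketch of the conversion rate described above. The function name and the before/after numbers are hypothetical and only illustrate the calculation; they are not taken from a real incident.

```python
def conversion_rate(purchases: int, carts_created: int) -> float:
    """Successful purchases as a percentage of shopping carts created."""
    if carts_created <= 0:
        # No carts means there is nothing to convert; report 0 instead of dividing by zero.
        return 0.0
    return 100.0 * purchases / carts_created


# Hypothetical numbers for the hour during the incident and an hour after the fix.
during_incident = conversion_rate(purchases=30, carts_created=120)
after_fix = conversion_rate(purchases=54, carts_created=120)

print(f"During incident: {during_incident:.1f}%")
print(f"After fix:       {after_fix:.1f}%")
```

Comparing the metric across the two windows gives a rough view of whether the fix moved the number you care about, without building a full experimentation framework.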
Positive business value proxy metrics signal that a feature is creating the outcomes you want and reveal what is mission-critical.
Examples of business proxy metrics
Incidents can inform other product management and platform metrics like:
Retention
Activation
Revenue
Referral
And the corresponding business value proxy metrics would look like this:
Retention → Logins per hour
Activation → Signups per hour
Revenue → Subscriptions per day
Referral → Marketing page views
Creating alerts based on these proxy metrics then allows you to resolve incidents quickly and confidently while sustaining successful user experiences.
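As a rough illustration of a proxy metric-backed alert, the Python sketch below flags a metric when it drops too far below an expected baseline. Everything here is an assumption made for the example: the `ProxyAlert` class, the metric names, the baselines, and the 30% drop threshold are hypothetical, and a real setup would read these values from your monitoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyAlert:
    category: str        # product metric the proxy stands in for, e.g. "Retention"
    metric: str          # proxy metric name (hypothetical naming)
    baseline: float      # expected value in normal operation (hypothetical)
    max_drop_pct: float  # fire when the metric falls more than this % below baseline

    def is_firing(self, observed: float) -> bool:
        floor = self.baseline * (1 - self.max_drop_pct / 100)
        return observed < floor


# Hypothetical baselines for the four proxy metrics listed above.
ALERTS = [
    ProxyAlert("Retention", "logins_per_hour", 1200, 30),
    ProxyAlert("Activation", "signups_per_hour", 100, 30),
    ProxyAlert("Revenue", "subscriptions_per_day", 40, 30),
    ProxyAlert("Referral", "marketing_page_views", 5000, 30),
]


def firing_alerts(observed: dict[str, float], alerts=ALERTS) -> list[str]:
    """Return the names of proxy metrics that dropped below their alert floor."""
    return [
        a.metric
        for a in alerts
        if a.metric in observed and a.is_firing(observed[a.metric])
    ]


print(firing_alerts({"logins_per_hour": 700, "signups_per_hour": 95}))
```

A fixed percentage below a static baseline is the simplest possible rule. Traffic with strong daily or weekly cycles would likely need a baseline that varies by time of day.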
Next steps
Using incidents to align platform and product engineering improves performance, security, and user experience in the long run. Here’s how to get started.
Use an incident management solution to capture your learnings (we happen to know of a great one!).
Perform retrospectives for every incident (whether you have an incident management solution or not), regardless of the incident's severity level.
Use the learnings to define key performance indicators and create business value proxy metric-backed alerts.
Aligning platform and product engineering through incident management is crucial for driving operational excellence, maintaining customer satisfaction, and protecting your bottom line.