
Captain's Log: How we are leveraging CEL for Signals

TL;DR: We wanted to make evaluating incoming signals powerful, intuitive, and fast. CEL (Common Expression Language) is our answer.


By Robert Ross on 11/20/2023


As engineers, we didn't want Signals to be just a replacement for what the incumbents do today. We've had our own gripes for years about the information architecture many older companies still force you to implement. You should be able to send us any signal from any data source and create an alert based on conditions you define.

We're no strangers to building features that include conditional logic, but we upped the ante when it came to Signals. When routing Signals to the appropriate teams, we're leveraging the powerful CEL (Common Expression Language) library.

signal.summary.startsWith("[Triggered]") && signal.annotations["functionality"] in ["logging-in", "checkout"]

CEL at a glance

CEL is a simple expression language that allows you to evaluate and return values from a simple text snippet. For Signals, we leverage CEL (mostly) to return a boolean, but it can be used to produce any type as long as it's registered in the CEL environment.

There are a few components when leveraging CEL that you need to understand:

  • CEL Environment: The CEL environment sets the stage for expression evaluation, defining available variables, functions, and custom types.

  • CEL AST (Abstract Syntax Tree): The AST is a tree-like representation of an expression, breaking it down into components like operators and values for efficient evaluation.

  • CEL Program: A program is the evaluable form of an expression: the AST compiled within the environment, ready to run against your data, such as the boolean conditions in Signals.

In other words, a CEL Environment mixed with an AST is a CEL Program, and that Program can be told to evaluate and return a value.

How we use CEL in Signals

We wanted to build Signals so that a generic type (a Signal) could be evaluated against a team's rule expressions in FireHydrant. When we tested CEL against incoming Signal payloads, it was lightning fast: we could evaluate thousands of Signals per second on a very minimal application setup.

When an incoming signal is processed by Siren, the service that handles all alerting functionality, we immediately look up all the Signal Rules created by the organization. We then iterate through the rules and evaluate each one against the incoming Signal in parallel. Here's what it looks like distilled down:

rules := &types.OrganizationRules{
  Rules: []*types.SignalRule{
    {
      Expression: "signal.summary.startsWith(\"[Triggered]\")",
      Target:     &types.Target{}, // omitted
    },
  },
}

signal := &types.Signal{
  Summary: "[Triggered] Request failure rate is breaking SLOs",
}

var matches []*types.SignalRule
for _, rule := range rules.Rules {
  e, err := NewRuleEvaluator(ctx, rule.Expression)
  if err != nil {
    logger.Warn("skipping rule due to invalid evaluator", "Error", err)
    continue
  }

  matched, err := e.Matches(ctx, signal)
  if err != nil {
    logger.Warn("skipping rule due to error while matching", "Error", err)
    continue
  }

  if matched {
    matches = append(matches, rule)
  }
}

This snippet is a simplified version of how we process an incoming Signal (the real thing is far more complicated). Let's look more closely at the NewRuleEvaluator function and how it evaluates our Signals.

CEL Evaluation in Go

As mentioned in the previous captain's log, we're heavily utilizing protocol buffers for all our messages in Signals. As a reminder, here is what our Signal message looks like:

syntax = "proto3";
package firehydrant.signals;

message Signal {
  string id = 1;
  string organization_id = 2;
  Level level = 3;
  string summary = 4;

  // other bits removed
}

The cel-go package can leverage protocol buffers out of the box, making creating our Signal Rule evaluator a piece of cake. Here's the general idea of how it works in Siren:

package ingest

import (
  "context"
  "fmt"

  "github.com/firehydrant/firestarter-go/telemetry"
  "github.com/firehydrant/siren/types"
  "github.com/google/cel-go/cel"
)

type RuleEvaluator struct {
  env     *cel.Env
  ast     *cel.Ast
  program cel.Program
  expr    string
}

func NewRuleEvaluator(ctx context.Context, expr string) (*RuleEvaluator, error) {
  ctx, span := telemetry.StartSpan(ctx, telemetry.DefaultSpanName()) //nolint:all // ineffectual assignment of ctx is preferred
  defer span.End()

  env, err := cel.NewEnv(
    cel.Types(new(types.Signal)),
    cel.Variable("signal", cel.ObjectType("firehydrant.signals.Signal")),
  )

  if err != nil {
    return nil, fmt.Errorf("could not create env: %w", err)
  }

  ast, iss := env.Compile(expr)
  if iss.Err() != nil {
    return nil, fmt.Errorf("%w: %s", ErrInvalidExpression, iss.Err())
  }

  program, err := env.Program(ast, cel.EvalOptions(cel.OptTrackCost))
  if err != nil {
    return nil, fmt.Errorf("could not generate CEL program: %w", err)
  }

  return &RuleEvaluator{
    env:     env,
    ast:     ast,
    program: program,
    expr:    expr,
  }, nil
}

func (e *RuleEvaluator) Matches(ctx context.Context, signal *types.Signal) (bool, error) {
  ctx, span := telemetry.StartSpan(ctx, telemetry.DefaultSpanName())
  defer span.End()

  out, details, err := e.program.Eval(map[string]any{
    "signal": signal,
  })
  if err != nil {
    return false, fmt.Errorf("%w: %s", ErrInvalidExpression, err)
  }

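  attrs := map[string]string{"siren.cel.expression": e.expr}
  // ActualCost returns a *uint64, which is nil when cost tracking is disabled.
  if cost := details.ActualCost(); cost != nil {
    attrs["siren.cel.eval_cost"] = fmt.Sprintf("%d", *cost)
  }
  telemetry.SetTopLevelSpanAttributes(ctx, attrs)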

  v := out.Value()
  res, ok := v.(bool)
  if !ok {
    return false, fmt.Errorf("%w: got %T", ErrUnsupportedReturnType, v)
  }

  return res, nil
}

This allows us to receive signals from any source and rip through our evaluations of them for customers, all to eventually reach their target destination to alert the end user.

Drawbacks of leveraging CEL

CEL isn't exactly well documented. Even though it's developed by Google and leveraged in Kubernetes, Protocol Buffers, and GKE for IAM policies, we still found ourselves constantly code spelunking to figure out how to add custom functions, variables, etc. As a part of our Signals build, we'll release a comprehensive CEL expression guide when the time is right.

CEL is for developers

The adoption of CEL will only continue to accelerate as popular tools like Kubernetes adopt it, and we want to ensure we make a developer-first tool that gives people the raw power they need when alerting teams quickly and accurately.

Also, it looks terrific in our Terraform provider update:

resource "firehydrant_signal_rule" "datadog_source" {
  team_id = firehydrant_team.primary.id
  name = "Datadog Source"
  expression = "signal.summary.contains(\"[Triggered]\")"
  target_type = "EscalationPolicy"
  target_id = firehydrant_escalation_policy.default_policy.id
}
