Increasing Trace Cardinality with Activity Tags and Baggage

One of the first "oh no" moments for folks new to distributed tracing is the needle-in-a-haystack problem. Someone reports an error, you go look for traces in your tool (Azure Monitor or whatever), and because there are thousands of traces, you can't easily figure out which one is the trace you want. Fundamentally, this is a challenge of cardinality. We want data with high enough cardinality that a search can effectively pick out our individual trace.

OpenTelemetry and the Activity API give us two main ways to add additional information to spans/traces:

  • Attributes
  • Baggage

In the Activity API, attributes are "Tags" and Baggage is still "Baggage", since these names predate OpenTelemetry.

The key difference between the two is propagation. Both are key-value pairs of information, but only Baggage propagates to subsequent spans. This means that interprocess communication needs a means to propagate it, and that's exactly what the W3C Baggage standard describes.

We do have to be careful about baggage, however, as it will accumulate. Anything added to baggage will show up in all child spans.
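
To make the propagation format concrete, here is a minimal sketch of how a set of baggage items could be serialized into the single `baggage` header that the W3C Baggage specification defines: comma-separated `key=value` members with values percent-encoded. The `BaggageHeader.Format` helper and the `user.tier` item are hypothetical and only illustrate the wire shape; real propagators handle this for you and also deal with details such as size limits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BaggageHeader
{
    // Hypothetical helper: joins baggage items into a W3C-style header value.
    public static string Format(IEnumerable<KeyValuePair<string, string>> items) =>
        string.Join(",", items.Select(kv =>
            $"{Uri.EscapeDataString(kv.Key)}={Uri.EscapeDataString(kv.Value)}"));
}

public static class Program
{
    public static void Main()
    {
        var items = new[]
        {
            new KeyValuePair<string, string>("cart.operation.id", "1234"),
            new KeyValuePair<string, string>("user.tier", "gold member"),
        };

        // prints: baggage: cart.operation.id=1234,user.tier=gold%20member
        Console.WriteLine($"baggage: {BaggageHeader.Format(items)}");
    }
}
```

Because every item rides along in that header on each outgoing call, the accumulation concern is easy to see: the header only ever grows as more code adds baggage.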

Tags and Baggage in Activities

With the Activity API, it's quite straightforward to add tags and baggage to the current activity. Suppose we want to include some operation ID as part of our trace with a tag:

[HttpGet]
public async Task<ActionResult<Guid>> Get(string message)
{
    var command = new SaySomething
    {
        Message = message,
        Id = Guid.NewGuid()
    };

    // Tag only the current span with the operation ID
    Activity.Current?.AddTag("cart.operation.id", command.Id.ToString());

    // ... rest of the action omitted
}

The AddTag and AddBaggage methods let us add these key/value pairs of data. If we're using OpenTelemetry, our tags will automatically show up in a trace:

Custom tag showing up in Zipkin trace

This tag can be searched on, making sure we only find the trace we're interested in:

Searching tag value in Zipkin

If you're already using logging contexts, this is probably quite familiar to you.

While tags are great for including additional information about a single span, they don't propagate, so you might have a harder time correlating multiple sets of data together, such as "find all queries executed for cart 123". You might only know the cart at one specific span, but not all the way down in all the related activities.

For that, we can use Baggage:

[HttpGet]
public async Task<ActionResult<Guid>> Get(string message)
{
    var command = new SaySomething
    {
        Message = message,
        Id = Guid.NewGuid()
    };
    
    Activity.Current?.AddBaggage("cart.operation.id", command.Id.ToString());

For both interprocess and intraprocess Activities, the Baggage will propagate. Here we can see baggage propagate through the headers in RabbitMQ:

Baggage and trace context in headers in RabbitMQ message
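
The in-process half of this behavior can be checked with a small console sketch. It uses only `System.Diagnostics`; the activity names `parent` and `child` and the value `1234` are made up for illustration. The same value is added both as a tag and as baggage on the parent, and then read back from a child activity.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        // Start a parent activity and attach the same value both ways.
        var parent = new Activity("parent").Start();
        parent.AddTag("cart.operation.id", "1234");
        parent.AddBaggage("cart.operation.id", "1234");

        // A newly started activity picks up Activity.Current as its parent.
        var child = new Activity("child").Start();

        // Tags stay on the span they were added to.
        Console.WriteLine(child.Tags.Any());                          // False
        // Baggage is visible from the child through its parent chain.
        Console.WriteLine(child.GetBaggageItem("cart.operation.id")); // 1234

        child.Stop();
        parent.Stop();
    }
}
```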

Unfortunately, baggage won't automatically show up in our traces; we'll have to do something special to get it to show up.

Reporting baggage in telemetry data

While this context information gets passed through all of our spans, it won't necessarily show up in our tracing tools. Attributes are the primary mechanism for reporting information on traces, whereas baggage is a broader concept that logs, traces, and metrics can each use for enrichment. It might seem counterintuitive that baggage is not automatically reported, but that follows from its role: it is context meant to be consumed by any of the observability pillars, not trace data in itself.

If we just want to automatically copy all baggage into tags as activities are recorded, we can do so by registering a simple ActivityListener at startup:

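public static void Main(string[] args)
{
    // Requires: using System.Diagnostics;
    var listener = new ActivityListener
    {
        // Listen to every ActivitySource in the process
        ShouldListenTo = _ => true,
        // When an activity stops, copy its baggage (including inherited items) into tags
        ActivityStopped = activity =>
        {
            foreach (var (key, value) in activity.Baggage)
            {
                activity.AddTag(key, value);
            }
        }
    };
    ActivitySource.AddActivityListener(listener);

    CreateHostBuilder(args).Build().Run();
}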

With this in each of my applications, I can ensure that all baggage gets shipped as tags out to my tracing system:

Baggage showing up in span for database interaction

Above, I can see that the baggage I set way up in a Controller made it all the way down to a MongoDB call. It's crossed several process boundaries to get there:

Trace highlighted with span far from baggage origination
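
To see the listener's effect without a tracing backend, here is a self-contained console sketch. The source name `Demo.Source` and the activity names are hypothetical, and the `Sample` callback is an addition for this demo only: with no other listener (such as OpenTelemetry) registered, something has to ask for activities to be created at all.

```csharp
using System;
using System.Diagnostics;

public static class Program
{
    // Hypothetical source name, used only for this demo.
    private static readonly ActivitySource Source = new ActivitySource("Demo.Source");

    public static void Main()
    {
        using var listener = new ActivityListener
        {
            ShouldListenTo = _ => true,
            // Demo-only: request every activity so StartActivity returns non-null.
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
            // Copy baggage (own and inherited) into tags when the activity stops.
            ActivityStopped = activity =>
            {
                foreach (var (key, value) in activity.Baggage)
                {
                    activity.AddTag(key, value);
                }
            }
        };
        ActivitySource.AddActivityListener(listener);

        using var parent = Source.StartActivity("parent");
        parent?.AddBaggage("cart.operation.id", "1234");

        // The child never touches the baggage itself.
        var child = Source.StartActivity("child");
        child?.Stop();

        if (child != null)
        {
            foreach (var tag in child.Tags)
            {
                Console.WriteLine($"{tag.Key}={tag.Value}"); // cart.operation.id=1234
            }
        }
    }
}
```

The child ends up with the tag even though only the parent set the baggage, which is the same effect as the MongoDB span picking up what the Controller set.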

Typically, we'll also include this baggage context in our structured logs with Serilog:

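using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

class BaggageEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent,
        ILogEventPropertyFactory propertyFactory)
    {
        // No ambient activity means there is no baggage to add
        if (Activity.Current == null)
            return;

        // Add each baggage item as a log property unless one already exists
        foreach (var (key, value) in Activity.Current.Baggage)
        {
            logEvent.AddPropertyIfAbsent(propertyFactory.CreateProperty(key, value));
        }
    }
}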
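
An enricher only takes effect once it is registered on the logger. A minimal sketch of that wiring might look like the following, assuming the Serilog and Serilog.Sinks.Console packages are referenced and the `BaggageEnricher` class above is in the same project. The `LoggingSetup` class name and the output template are illustrative choices, not part of the original setup.

```csharp
using Serilog;

public static class LoggingSetup
{
    // Hypothetical helper: call once at startup, before the host is built.
    public static void Configure()
    {
        Log.Logger = new LoggerConfiguration()
            // Attach the enricher so each event picks up the current baggage
            .Enrich.With<BaggageEnricher>()
            // {Properties} renders enriched properties not named in the message
            .WriteTo.Console(outputTemplate:
                "[{Timestamp:HH:mm:ss} {Level:u3}] {Message:lj} {Properties}{NewLine}{Exception}")
            .CreateLogger();
    }
}
```

With a structured sink instead of the console, the baggage items would arrive as ordinary searchable log properties.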

With baggage, we get a bit of overhead since it will piggyback everywhere. However, we can start to leverage our baggage to include contextual information that we find valuable in logs, traces, and metrics. Common data might be:

  • Business identifiers (cart ID, etc.)
  • Workflow/batch identifiers
  • Session identifiers
  • Machine information
  • User information

Of course, you'll have to follow privacy laws for some of this information (maybe don't log tax numbers), so care is needed here. In practice, it's been invaluable to take a piece of business information (say, a cart ID) and see every trace related to it, rather than only what a single trace happens to carry.

By including high-cardinality data in our logs and traces, we can far more quickly locate what we're looking for, instead of resorting to searching small time windows as I've had to in the past.