Building End-to-End Diagnostics: User-Defined Context with Correlation Context

With a brief detour to push out some NuGet packages, I wanted to pick up with a common issue folks run into once they start implementing distributed tracing. And that problem is one of locality - trying to find one trace among many, the needle in the haystack. In the past, I've resorted to combing through logs, using timestamps as a very poor means of trying to diagnose an issue.

We can decorate our logs with context information to make searching easier, but when it comes to our traces, how can we triangulate something a user did with a workflow and the associated traces?

This is where the emerging Correlation Context standard comes into play: the ability for application code to add arbitrary information to traces and, critically, have that information flow through to subsequent spans.

With distributed trace spans, you can add information to a span so that it eventually shows up in a collector/exporter. This is possible today with the System.Diagnostics.Activity API through tags:

// Activity.Current is null when no Activity has been started, so guard the call
Activity.Current?.AddTag("user.id", user.Id);

But that information does not flow through to any subsequent spans within a process, or to subsequent processes. It exists for a single Activity/span, then it's gone.

This is where Correlation Context comes in. The Trace Context spec defines a "tracestate" header for propagating vendor-specific trace information, and Correlation Context lets application code add application-specific information that propagates alongside it.

Propagating Correlation Context in NServiceBus

ASP.NET Core automatically parses the incoming correlation context header and places its values in Activity.Baggage:

string[] baggage = headers.GetCommaSeparatedValues(HeaderNames.CorrelationContext);
if (baggage.Length > 0)
{
    foreach (var item in baggage)
    {
        if (NameValueHeaderValue.TryParse(item, out var baggageItem))
        {
            activity.AddBaggage(baggageItem.Name.ToString(), HttpUtility.UrlDecode(baggageItem.Value.ToString()));
        }
    }
}

Baggage, unlike Tags, will flow through to child Activities. When a new Activity starts, the existing Activity.Current becomes its Parent, and the parent's Id becomes the new Activity's ParentId. When you access an Activity's Baggage, the implementation returns that Activity's own baggage, then its Parent's baggage, and so on for every parent up the chain.
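
To make the difference concrete, here is a minimal sketch using only System.Diagnostics. The class name BaggageVersusTags and the "user.id" value are made up for illustration; the point is just that a tag set on a parent is not visible on a child, while baggage is.

```csharp
using System.Diagnostics;
using System.Linq;

// Hypothetical demo class (not part of the original post)
public static class BaggageVersusTags
{
    public static (bool TagOnChild, string BaggageOnChild) Run()
    {
        var parent = new Activity("parent");
        parent.AddTag("user.id", "42");      // stays on this Activity only
        parent.AddBaggage("user.id", "42");  // visible to child Activities
        parent.Start();                      // parent is now Activity.Current

        var child = new Activity("child");
        child.Start();                       // child.Parent is set to the parent

        try
        {
            // Tags only lists tags added directly to the child
            var tagOnChild = child.Tags.Any(t => t.Key == "user.id");
            // GetBaggageItem walks the child, then its parents
            var baggageOnChild = child.GetBaggageItem("user.id");
            return (tagOnChild, baggageOnChild);
        }
        finally
        {
            child.Stop();
            parent.Stop();
        }
    }
}
```

Calling BaggageVersusTags.Run() should report that the tag is absent from the child while the baggage item is still readable there.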

With NServiceBus, we want to also parse incoming baggage, and propagate outgoing baggage through the Correlation-Context header. And although this header is still in draft mode with the W3C, it already has implementation support in ASP.NET Core 3.0.

To parse incoming headers, we can do the same thing ASP.NET Core does. In our code that parses the incoming traceparent, we look for the Correlation-Context header and place its values in Activity.Baggage:

if (context.MessageHeaders.TryGetValue(Headers.CorrelationContextHeaderName, out var correlationContext))
{
    var baggage = correlationContext.Split(',');
    if (baggage.Length > 0)
    {
        foreach (var item in baggage)
        {
            if (NameValueHeaderValue.TryParse(item, out var baggageItem))
            {
                activity.AddBaggage(baggageItem.Name, HttpUtility.UrlDecode(baggageItem.Value));
            }
        }
    }
}
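
If you want to exercise the parsing logic outside of an NServiceBus pipeline, something like the following dependency-free sketch may help. CorrelationContextParser is a hypothetical helper, not something from NServiceBus or ASP.NET Core, and it assumes a simple comma-separated list of name=value items (it does not handle the optional properties the draft spec allows after a semicolon).

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical helper for illustration only
public static class CorrelationContextParser
{
    public static IReadOnlyList<KeyValuePair<string, string>> Parse(string headerValue)
    {
        var items = new List<KeyValuePair<string, string>>();
        if (string.IsNullOrWhiteSpace(headerValue))
        {
            return items;
        }

        foreach (var rawItem in headerValue.Split(','))
        {
            var item = rawItem.Trim();
            var separator = item.IndexOf('=');
            if (separator <= 0)
            {
                continue; // skip items with no name or no '='
            }

            var name = item.Substring(0, separator).Trim();
            var value = item.Substring(separator + 1).Trim();

            // Values are URL-decoded, mirroring the UrlDecode call above
            items.Add(new KeyValuePair<string, string>(name, WebUtility.UrlDecode(value)));
        }

        return items;
    }
}
```

Each returned pair could then be handed to activity.AddBaggage(pair.Key, pair.Value), just as the behavior above does.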

Now that we have correlation context in our Baggage, any other child activities will have this baggage, too.

The last piece is propagation. We add the following to our original outgoing NServiceBus behavior that propagates traceparent:

if (!context.Headers.ContainsKey(Headers.CorrelationContextHeaderName))
{
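    // URL-encode each value so it survives the UrlDecode call on the receiving side
    var baggageItems = activity.Baggage.Select(item => $"{item.Key}={HttpUtility.UrlEncode(item.Value)}");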
    var headerValue = string.Join(",", baggageItems);
    if (!string.IsNullOrEmpty(headerValue))
    {
        context.Headers[Headers.CorrelationContextHeaderName] = headerValue;
    }
}

The behavior will now propagate the baggage out through the Correlation-Context header. With the incoming and outgoing header behaviors in place, any service can drop some data into baggage and have it propagate to all downstream services.
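
As a standalone illustration of the formatting step, here is a small sketch of a helper that builds the header value from a set of baggage items. CorrelationContextFormatter is a hypothetical name, and escaping values with Uri.EscapeDataString is my assumption: it keeps a comma inside a value from being mistaken for an item separator when the receiver splits the header.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only
public static class CorrelationContextFormatter
{
    public static string Format(IEnumerable<KeyValuePair<string, string>> baggage)
    {
        // name=value pairs joined by commas; values are percent-encoded
        var items = baggage.Select(item =>
            $"{item.Key}={Uri.EscapeDataString(item.Value ?? string.Empty)}");

        return string.Join(",", items);
    }
}
```

Passing activity.Baggage to Format would give a value suitable for the Correlation-Context header; an empty baggage collection produces an empty string, which the caller can skip.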

So all done, right? Well, not quite, as even though we added information to the Activity.Baggage, it doesn't necessarily mean that those values get exported to our tracing tools. Unfortunately today, OpenTelemetry exporters only consider the Tags portion of an Activity for exporting. This will be opt-in in the future, but for now, we'll need to manually copy our Baggage to Tags during the export process (or in child activities).
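
The simplest form of that manual copy might look like the sketch below. BaggageTagCopier is a hypothetical helper and not the approach the follow-up post takes; it only shows the idea of mirroring baggage items onto an Activity as tags, skipping keys that already have a tag.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper for illustration only
public static class BaggageTagCopier
{
    public static void CopyBaggageToTags(Activity activity)
    {
        // Remember which tag keys already exist so we do not add duplicates
        var seenKeys = new HashSet<string>(activity.Tags.Select(tag => tag.Key));

        // Baggage enumerates this Activity's items first, then its parents',
        // so the nearest value wins when a key appears more than once
        foreach (var item in activity.Baggage)
        {
            if (seenKeys.Add(item.Key))
            {
                activity.AddTag(item.Key, item.Value);
            }
        }
    }
}
```

Calling this just before an Activity stops would be one way to make sure the baggage values show up on the exported span.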

In the next post, we'll walk through exactly that - creating a "breadcrumb" Activity that piggybacks on events of other activities.

To see how to pass baggage to tags, check out this post.