Refactoring Towards Resilience: Evaluating RabbitMQ Options

Other posts in this series:

In the last post, we looked at dealing with an API in SendGrid that basically only allows at-most-once calls. We can't undo anything, and we can't retry anything. We're going to find some similar issues with RabbitMQ (although it's not much different than other messaging systems).

RabbitMQ, like all queuing systems I can think of, offer a wide variety of reliability modes. In general, I try to make my message handlers idempotent, as it enables so many more options up stream. I also don't really trust anyone sending me messages so anything I can do to ensure MY system stays consistent despite what I might get sent is in my best interest.

Looking back at our original code:

public async Task<ActionResult> ProcessPayment(CartModel model) {
    var customer = await dbContext.Customers.FindAsync(model.CustomerId);
    var order = await CreateOrder(customer, model);
    var payment = await stripeService.PostPaymentAsync(order);
    await sendGridService.SendPaymentSuccessEmailAsync(order);
    await bus.Publish(new OrderCreatedEvent { Id = order.Id });
    return RedirectToAction("Success");
}

We can see that if anything fails after the "bus.Publish" line, we don't really know what happened to our message. Did it get sent? Did it not? It's hard to tell, but going to our picture of our transaction model:

Transaction flow

And our options we have to consider as a reminder:

Coordination Options

Let's take a look at our options dealing with failures.

Ignore

Similar to our SendGrid solution, we could just ignore any failures with connecting to our broker:

public async Task<ActionResult> ProcessPayment(CartModel model) {  
    var customer = await dbContext.Customers.FindAsync(model.CustomerId);
    var order = await CreateOrder(customer, model);
    var payment = await stripeService.PostPaymentAsync(order);
    await sendGridService.SendPaymentSuccessEmailAsync(order);
    try {
        await bus.Publish(new OrderCreatedEvent { Id = order.Id });
    } catch (Exception e) {
        Logger.Exception(e, $"Failed to send order created event for order {order.Id}");
    }
    return RedirectToAction("Success");
}

This approach would shield us from connectivity failures with RabbitMQ, but we'd still need some sort of process to detect these failures and retry those sends later on. One way to do this would be simply to flag our orders:

} catch (Exception e) {
    order.NeedsOrderCreatedEventRaised = true;
    Logger.Exception(e, $"Failed to send order created event for order {order.Id}");
}    

It's not a very elegant solution, as I'd have to create flags for every single kind of message I send. Additionally, it ignores the issue of a database transaction rolling back, but my message is still sent. In that case, my message will still get sent, and consumers could get events for things that didn't actually happen! There are other ways to fix this - but for now, let's cover our other options.

Retry

Retries are interesting in RabbitMQ because although it's fairly easy to retry my message on my side, there's no guarantee that consumers can support a message if it came in twice. However, in my applications, I try as much as possible to make my message consumers idempotent. It makes life so much easier, and allows so many more options, if I can retry my message.

Since my original message includes the unique order ID, a natural correlation identifier, consumers can have an easy way of ensuring their operations are idempotent as well.

The mechanics of a retry could be similar to our above example - mark the order as needing a retry of the event to be raised at some later point in time, or retry in the same block, or include a resiliency layer on top of sending.

Undo

RabbitMQ doesn't support any sort of "undo" natively, so if we wanted to do this ourselves, we'd have to have some sort of compensating event published. Perhaps an event, "OrderNotActuallyCreatedJustKiddingAboutBefore"?

Perhaps not.

Coordinate

RabbitMQ does not natively support any sort of two-phase commit, so coordination is out.

Next steps

Now that we've examined all of our options around the various services our application integrates with, I want to evaluate each service in terms of the coupling we have today, and determine if we truly need that level of coupling.