Transactional Event-Sourced Workflows


Distributed Transactions (updating multiple transactional resources atomically) are among the hardest problems in Distributed Systems. A process updates the state of a component and then has to tell the rest of the system about it, and those two actions aren't atomic. Either the update lands but the notification is lost, or the notification fires but the update is never committed.

Let’s see how Event-Sourced Workflows can address the issue by making two updates transactional: the update of a workflow's progress and the update of the command model's state (which inherently notifies the world of the change).
First, we are going to define what Event Sourcing and Workflows are, and then look at how they intersect.

What Event Sourcing is

Event Sourcing stores the full sequence of events that led to an entity's current state, rather than storing the state and overwriting it with every change. The event store (an append-only log) represents the source of truth, and the current state is just a fold over the events. Replay the events in order, and you arrive at where the entity is now. Nothing is overwritten, any past state can be reconstructed, and every state change is, by construction, an explicit, durable fact that other parts of the system can consume.
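
To make the "fold over the events" idea concrete, here is a minimal sketch. The bank-account events (Deposited, Withdrawn) and the function names are made up for illustration; they are not taken from any particular framework.

```python
from dataclasses import dataclass
from functools import reduce


# Hypothetical event types, chosen only for illustration.
@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


def apply(balance: int, event) -> int:
    """Apply one event to the current state and return the next state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    return balance  # unknown events leave the state unchanged


def current_state(events) -> int:
    """The current state is a left fold over the event history."""
    return reduce(apply, events, 0)


log = [Deposited(100), Withdrawn(30), Deposited(5)]  # append-only history

balance_now = current_state(log)          # fold over all events
balance_earlier = current_state(log[:2])  # any past state: fold over a prefix
```

Folding a prefix of the log is all it takes to reconstruct a past state, which is why nothing ever needs to be overwritten.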

What a Workflow is

A Workflow expresses a long-running business process as code. It’s represented by a sequence of steps that may take seconds or weeks. The duration doesn’t matter; what matters is that the workflow can survive crashes and continue (even in a different process or on a different machine). A durable workflow engine makes this reliable by checkpointing the progress (recording which steps have been completed and their return values), so that after a failure it resumes from the last completed step rather than starting over or executing a step twice.
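
Checkpointing can be sketched in a few lines. This is a toy under stated assumptions, not how any specific engine works: CheckpointStore is a hypothetical in-memory stand-in for durable storage, and the step names (reserve, charge) are invented.

```python
class CheckpointStore:
    """Hypothetical in-memory stand-in for durable checkpoint storage."""

    def __init__(self):
        self._results = {}

    def has(self, step_name):
        return step_name in self._results

    def get(self, step_name):
        return self._results[step_name]

    def save(self, step_name, result):
        self._results[step_name] = result


def run_step(store, step_name, action):
    """Run a step once; on replay, return the recorded result instead."""
    if store.has(step_name):
        return store.get(step_name)
    result = action()
    store.save(step_name, result)
    return result


calls = []  # records which step bodies actually executed


def reserve():
    calls.append("reserve")
    return "reservation-1"


def charge(fail):
    calls.append("charge")
    if fail:
        raise RuntimeError("simulated crash")
    return "payment-1"


def order_workflow(store, fail_on_charge=False):
    reservation = run_step(store, "reserve", reserve)
    payment = run_step(store, "charge", lambda: charge(fail_on_charge))
    return reservation, payment


store = CheckpointStore()
try:
    order_workflow(store, fail_on_charge=True)  # first attempt crashes in step 2
except RuntimeError:
    pass
result = order_workflow(store)  # resume: step 1 is served from its checkpoint
```

After the retry, calls holds one "reserve" and two "charge" entries: the completed step was not repeated, only the failed one. Note that this toy still has a gap between running the action and saving its checkpoint, which is exactly the kind of non-atomic pair described at the start of this post.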

Where they meet

Look closely and the two have the same shape. A workflow checkpoint ("step N completed, here is its result") is an event in the life of that process, and a workflow's state is nothing more than the fold of the steps it has taken so far. A workflow, in other words, is an event-sourced command model (an aggregate): replay its step events and you have its exact position.
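
Seen this way, a workflow's position can be computed the same way as any other event-sourced state. The sketch below assumes a hypothetical StepCompleted event and a fixed list of step names; both are illustrative only.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class StepCompleted:
    """Hypothetical step event: 'step N completed, here is its result'."""
    step: str
    result: object


STEPS = ["reserve", "charge", "ship"]  # assumed, fixed order of steps


def apply(state: dict, event: StepCompleted) -> dict:
    """Fold one step event into the workflow state (step name -> result)."""
    return {**state, event.step: event.result}


def workflow_state(events) -> dict:
    """The workflow's state is the fold of its step events."""
    return reduce(apply, events, {})


def next_step(state: dict):
    """The workflow's position: the first step without a recorded result."""
    for step in STEPS:
        if step not in state:
            return step
    return None  # every step has completed


history = [
    StepCompleted("reserve", "reservation-1"),
    StepCompleted("charge", "payment-1"),
]
state = workflow_state(history)
position = next_step(state)
```

Replaying the two step events yields both recorded results and tells the engine that "ship" is the next step to run.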

That observation has a practical consequence. If your workflow engine already persists progress as events, it doesn't need a private checkpoint store sitting beside your event store. The two can be one store. The workflow's own history and the domain events it emits can live in the same log.

One transaction, one truth

The benefit shows when a workflow step drives the command (write) side of the system. A step issues a command, the command updates the command model, and handling the command results in one or more domain events. At the same moment, the workflow has to record its own progress: an event marking that the step has been executed, including its result.

If the workflow and the command model are event-sourced into the same store, all these events can be grouped in a single transaction. The events emitted by the command model and the step event that marks the workflow's progress are included in the same event transaction and commit together or not at all.
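
Here is a rough sketch of that single commit. Everything in it is an assumption made for illustration: EventStore is a hypothetical in-memory store whose append() checks expected stream versions before writing, and the stream and event names are invented. A real event store would supply the actual transaction mechanism.

```python
class ConcurrencyError(Exception):
    """Raised when a stream has moved on since it was read."""


class EventStore:
    """Hypothetical in-memory event store; append() is all-or-nothing."""

    def __init__(self):
        self.log = []  # one shared log of (stream, event) tuples

    def stream(self, name):
        return [event for stream, event in self.log if stream == name]

    def append(self, batch, expected_versions):
        # Validate every stream first, so nothing is written on a conflict.
        for stream, expected in expected_versions.items():
            if len(self.stream(stream)) != expected:
                raise ConcurrencyError(stream)
        self.log.extend(batch)  # the whole batch is committed together


def run_charge_step(store, order_id, workflow_id):
    order_stream = f"order-{order_id}"
    workflow_stream = f"workflow-{workflow_id}"
    expected = {
        order_stream: len(store.stream(order_stream)),
        workflow_stream: len(store.stream(workflow_stream)),
    }
    domain_event = {"type": "OrderCharged", "order": order_id}
    step_event = {"type": "StepCompleted", "step": "charge", "result": "payment-1"}
    # The domain event and the workflow's step event go into one append.
    store.append(
        [(order_stream, domain_event), (workflow_stream, step_event)],
        expected,
    )


store = EventStore()
run_charge_step(store, "42", "wf-1")

conflict_detected = False
try:
    store.append(
        [("order-42", {"type": "OrderRefunded"}),
         ("workflow-wf-1", {"type": "StepCompleted", "step": "refund"})],
        {"order-42": 1, "workflow-wf-1": 0},  # stale workflow version
    )
except ConcurrencyError:
    conflict_detected = True
```

The second append is rejected as a whole: the order stream's version was correct, but because the workflow stream's was stale, neither the domain event nor the step event is written.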

The transactionality between updating the command model and updating the workflow is the real payoff. There is no window in which the command model changed but the workflow didn’t, and none in which the workflow advanced but the events marking the command model change were not committed. No outbox, no dual write, and no after-the-fact reconciliation is necessary.
