Optimistic State Machine Execution


Distributed Consensus algorithms play an essential role in any Distributed System in which a cluster of nodes must agree on a single value - usually an order of user requests. Once user requests are replicated and uniquely ordered on each node (in a replication log), a node can apply the agreed-upon request to its state machine. If each node has the same implementation of a deterministic state machine, the state on all nodes will be the same (up to the given user request). Voila, we have achieved redundancy in our system, and if some of the nodes fail, we still have the state available on other nodes for users to consume. 

State Machine Replication

Idempotency

If a node crashes and restarts, it wouldn't be optimal to have the apply process start applying requests from the beginning of time. Hence, the apply process durably stores its progress in an apply_index variable. This saves us some CPU cycles.
However, a node could crash during the apply process before we have durably stored the apply_index. In this case, the state machine might execute the same request more than once. 
It is crucial that the state machine is idempotent and that re-executing the same request does not hurt its consistency and, after all, its determinism.

A convenient way of achieving state machine idempotency in a Distributed Consensus environment is to deduplicate requests from the replication log. Replication log entries are uniquely identified by their replication_log_index. Along with the request, the apply process gives the replication_log_index to the state machine. The state machine durably stores this value as its deduplication key. It uses it to identify requests it has already processed - "I don't process the request if its deduplication key is smaller or equal to the last one I have seen."

Pessimistic execution

The premise I want to take here is that replicating requests usually does not fail. I said this and stayed alive. In a pessimistic scenario, executing the request would first involve replicating it across the cluster of nodes and invoking the apply process once it has been replicatedThe apply process takes the request and gives it to the state machine. The state machine executes it, and the apply_index gets updated. We are now free to move forward with the subsequent request.

Pessimistic Execution

The time necessary for the request to be processed is the sum of the time needed for the replication and the time it takes the state machine to execute the request. We can do better...

Optimistic execution

If the replication is usually successful, why wait for it to complete before we apply the request to the state machine? Well, it is not necessary to wait. We could start executing the request against the state machine while it is being replicated. Once both processes are done, we can update the apply_index.

Optimistic Execution

Now, the time necessary for processing the request is the maximum between the time needed to replicate the request and the time it takes the state machine to execute it.

Of course, this optimization is only possible if the state machine is capable of reverting the execution of requests that failed to replicate. While I'm saying this as a side note and not-a-big-deal thing, it is actually quite crucial for the state machine to have this property if we want to guarantee consistency and determinism.

In the end, a little optimism wouldn't hurt :)

Comments

Popular posts from this blog

Lock-free Exclusive Processing

Ordering in Event-Sourced Systems

Myths about Distributed Business Transactions