Log Replication In Raft

The log is the primary means of replicating state in Raft-based distributed systems. In this article I will explore how the leader replicates its log to followers and, through it, propagates its state to the rest of the cluster. Log replication is what gives a Raft system its fault tolerance.

[Diagram: a client sends a command to the leader]

To appreciate how log replication works, let us assume we are building a distributed key-value store on top of a simple map. Clients send commands to the leader to update the store, and the leader's responsibility is to make sure that every follower holds a replica of it, so that in the event of a failure a follower can take over and the system continues to work as if nothing happened. This is what fault tolerance means.

state := map[string]int{}

Clients can send commands like SET("a", 5), which the leader adds to its log. The log is a slice of LogEntry objects, which have two properties: the term and the command. We have seen what the command is; the term denotes the term in which the leader received the command. The term of a LogEntry is important during log replication because followers use it to tell when a LogEntry is outdated and should therefore be rejected.

type LogEntry struct {
    Term    int
    Command string
}


logs := []LogEntry{}
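
For instance, receiving SET("a", 5) during term 1 appends an entry like this (a sketch in the same style as the other snippets; currentTerm is an assumed variable):

currentTerm := 1

// on receiving SET("a", 5) from a client, the leader appends:
logs = append(logs, LogEntry{Term: currentTerm, Command: "set(a, 5)"})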

Let us assume that the client has sent a few commands to the leader and the log is as shown in the snippet below. The leader does not immediately apply the commands to its state; it first sends AppendEntry requests to its followers, and once an entry has been sufficiently replicated, i.e. to a majority of the cluster, that entry is committed. The leader keeps track of the highest index known to be replicated on each follower in a matchIndex data structure.

[Diagram: the leader sends AppendEntry requests to Follower1, Follower2, and Follower3]
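
The exact shape of an AppendEntry request is implementation-specific; the sketch below carries only the fields this article discusses (a real Raft implementation includes a few more, such as the index and term of the entry preceding the new ones):

// Sketch: a minimal AppendEntry request, limited to the fields
// discussed in this article.
type AppendEntryRequest struct {
    Term        int        // the leader's current term
    Entries     []LogEntry // new entries for the follower to append
    CommitIndex int        // the leader's commitIndex, shared with followers
}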

The snippet below shows the leader's state together with its matchIndex data structure: the log has been replicated to followerA and followerB only up to index 1. Since the entry at index 1 has been replicated to a majority, we consider it committed and set commitIndex to 1. No client command has been applied to the state yet, so lastAppliedIndex is still 0.


// leader
state := map[string]int{}
logs := []LogEntry{{1, "noop"}, {1, "set(a, 5)"}, {1, "set(b, 6)"}, {1, "set(c, 7)"}}
matchIndex := map[string]int{
    "followerA": 1,
    "followerB": 1,
}
commitIndex := 1
lastAppliedIndex := 0

// followerA
logs := []LogEntry{{1, "noop"}, {1, "set(a, 5)"}}

// followerB
logs := []LogEntry{{1, "noop"}, {1, "set(a, 5)"}}
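
To make the majority rule concrete, here is a rough sketch of how a leader could advance its commitIndex from matchIndex; the helper name and structure are my own, not part of any canonical implementation:

// Sketch: find the highest index replicated on a majority of the
// cluster. clusterSize includes the leader itself.
func updateCommitIndex(matchIndex map[string]int, commitIndex, lastLogIndex, clusterSize int) int {
    for n := lastLogIndex; n > commitIndex; n-- {
        count := 1 // the leader always holds entry n
        for _, idx := range matchIndex {
            if idx >= n {
                count++
            }
        }
        // a full Raft implementation also requires logs[n].Term == currentTerm
        if count > clusterSize/2 {
            return n
        }
    }
    return commitIndex
}

With the state above this returns 1: the leader, followerA, and followerB form a majority holding entry 1, while entries 2 and 3 are not yet sufficiently replicated.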

In my implementation, each node in the cluster independently runs an applier goroutine that loops from lastAppliedIndex + 1 to the commitIndex, applying each command to the node's state. The leader shares its commitIndex with followers through AppendEntry requests. The state after successful command application is shown below. This is how the leader's state is replicated to all followers in Raft.


// leader
state := map[string]int{
    "a": 5
}

// followerA
state := map[string]int{
    "a": 5
}

// followerB
state := map[string]int{
    "a": 5
}
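
The applier loop itself can be sketched as follows; apply is a hypothetical helper that parses a command such as "set(a, 5)" and mutates the state map:

// Sketch: apply every committed but not yet applied entry, in log order.
func applier() {
    for {
        for lastAppliedIndex < commitIndex {
            lastAppliedIndex++
            apply(state, logs[lastAppliedIndex].Command) // e.g. state["a"] = 5
        }
        time.Sleep(10 * time.Millisecond) // or block on a condition variable
    }
}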

It is not always the case that followers accept AppendEntry requests; followers can reject them, for example when they come from an outdated leader.

[Diagram: the leader sends an AppendEntry request; the follower replies EntryRejected or EntryAccepted]

The leader maintains a nextIndex value for each follower, and whenever a request is rejected it decrements that follower's nextIndex by 1 and retries. Eventually the AppendEntry request will be accepted.

nextIndex := map[string]int{
    "followerA": 2,
    "followerB": 2,
}
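
Putting it together, the leader's retry logic might look roughly like this; sendAppendEntry stands in for the actual RPC and is a hypothetical name:

// Sketch: keep retrying a follower, backing nextIndex up by one on each
// rejection, until the follower accepts the entries.
func replicateTo(follower string) {
    for {
        accepted := sendAppendEntry(follower, nextIndex[follower]) // hypothetical RPC
        if accepted {
            nextIndex[follower] = len(logs)
            matchIndex[follower] = len(logs) - 1
            return
        }
        nextIndex[follower]-- // rejected: retry with an earlier prefix of the log
    }
}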