Leader Election In Raft

Raft is a consensus algorithm designed for understandability, the Raft paper is very accessible and easy to get through in a single reading. Raft has three components, leader election, log replication and safety, this article is about the first.

      stateDiagram
    [*] --> Follower
    Follower --> Candidate
    Candidate --> Leader
    Leader --> Follower
    Leader --> [*]

Peers in a Raft cluster can be in one of the following states; Follower, Candidate or Leader. All peers start of as followers and depending on circumstances which will be discussed later they can be promoted to Candidate and then Leader. A Raft cluster has a single leader. The diagram below shows the initial state of three peers; PeerA, PeerB and PeerC.

      classDiagram

    class PeerA {
        state: Follower
        votes: 0
        term: 0
        votedFor: -1
    }

    class PeerB {
        state: Follower
        votes: 0
        term: 0
        votedFor: -1
    }

    class PeerC {
        state: Follower
        votes: 0
        term: 0
        votedFor: -1
    }

Each peer has an election timer that goes off at randomized intervals. When the timer goes off the peer initiates an election by first promoting itself to Candidate, incrementing its term by 1 and voting for itself. It then start requesting for votes from other peers in the cluster. The randomized election timer avoids a situation where all peers start elections simultaneously leading to split brain.

      sequenceDiagram
    Note left of PeerA: Candiate

    PeerA -) PeerB: VoteRequest
    PeerA -) PeerC: VoteRequest
    PeerB --) PeerA: VoteRequestResponse
    PeerC --) PeerA: VoteRequestResponse

type VoteRequest struct {
    Term int
    CandidateId int
}

type VoteRequestResponse struct {
    Term int
    Granted bool
}

Peers only vote for a candidate if the candidate’s term property is equal to the peer’s term and the peer has not voted for another candidate in that term. If a peer receives a vote request from a candidate with an outdated term, they ignore the request. If enough peers have voted for the candidate, the candidate updates it state to Leader and starts sending heartbeats to other peers.

When a peer receives a heartbeat and they were in state other than Follower and the term in the request is greater than theirs, they demote themselves to Follower. They also reset their election timer, this ensures no peer starts an election when there’s a valid leader in place. Heartbeats are AppendEntry requests with an empty EntryLog. A peer ignores a heartbeat if the term in the request is less than theirs.

      sequenceDiagram
Note left of PeerA: Leader

    PeerA -) PeerB: AppendRequest (heartbeat)
    PeerA -) PeerC: AppendRequest (heartbeat)
    PeerB --) PeerA: AppendRequestResponse
    PeerC --) PeerA: AppendRequestResponse

type AppendEntry struct {
    Term int
    entry LogEntry[]
}

type AppendEntryResponse struct {
    Term int
    Success bool
}

When a Candidate becomes a Leader it stops its election timer, since there is no point of holding an election and starts sending heartbeats to peers to establish its dominance. Peers that receive the heartbeat reset their election timer preventing them from initiating elections.

When a Leader goes down or is disconnected from peers in the cluster, some peer will start an election and become the Leader. When the disconnected Leader comes back on, it will try sending heartbeats to the other peers but those will be ignored. The old leader will see that the term property of AppendEntryResponse is newer than theirs and consequently they will demote themselves to follower.

The rule of thumb is that whenever a peer gets either a request or response with a newer term, they must demote themselves to Follower. There’s also another restriction, a candidate can win an election if its log contains all commited entries.