
Raft Consensus in Go - Part 1: Elections and Terms

02 April 2021

Raft is a consensus algorithm designed to be simpler than Paxos. You have a cluster of servers that need to agree on a sequence of commands, even if some servers fail. One server becomes the leader, tells the others what to do, and if the leader dies, the rest elect a new one.

This post walks through building a Raft implementation in Go. We’ll start with the state management and election logic - the trickiest part of Raft is getting leaders elected correctly.

The full implementation is available at github.com/navgeet/raft.


The Core Struct

A Raft server needs to track a bunch of state:

type Raft struct {
    id string

    // Persistent state (must survive crashes)
    currentTerm uint64
    votedFor string
    log Log  // List of log entries

    // Volatile state
    state State  // Leader, Follower, or Candidate
    commitIndex uint64  // Highest log entry known to be committed
    lastApplied uint64  // Highest log entry applied to state machine

    // Leader state (only on leader)
    nextIndex map[string]uint64  // Next log index to send to each peer
    matchIndex map[string]uint64  // Highest log index replicated on each peer

    // Timing and networking
    lastContact time.Time
    peers map[string]Peer

    // Synchronization
    mu sync.Mutex
}

The key insight is splitting state into “persistent” and “volatile.” If the server crashes, currentTerm, votedFor, and the log must survive (written to disk). Everything else can be reconstructed after a restart.
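The struct above references a few types that aren't shown. Here's one sketch of what they might look like; the names and shapes are assumptions for illustration, not the repo's actual definitions:

```go
// State enumerates the roles a server can be in, plus a terminal
// Shutdown state used to stop the background loops.
type State int

const (
    Follower State = iota
    Candidate
    Leader
    Shutdown
)

// Log holds the entries; the election logic only ever needs the
// index and term of the last entry.
type Log struct {
    entries []LogEntry
}

type LogEntry struct {
    Index uint64
    Term  uint64
}

func (l *Log) LastIndex() uint64 {
    if len(l.entries) == 0 {
        return 0
    }
    return l.entries[len(l.entries)-1].Index
}

func (l *Log) LastTerm() uint64 {
    if len(l.entries) == 0 {
        return 0
    }
    return l.entries[len(l.entries)-1].Term
}
```

Returning zero values for an empty log is a common convention: index 0 and term 0 are reserved for "no entries yet."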


Terms: Raft’s Clock

In Raft, time is divided into “terms.” Each term has at most one leader. Think of terms as logical clock ticks.

const defaultElectionTimeout = 150 * time.Millisecond
const defaultHeartbeatInterval = 50 * time.Millisecond

// A follower waits for a heartbeat from the leader.
// If none arrives within the election timeout, it becomes a candidate.
// The election timeout is randomized to avoid split votes:
electionTimeout := time.Duration(rand.Intn(150) + 150) * time.Millisecond

The randomized timeout is crucial. If every server used the same timeout, they’d all start elections at the same time, split the vote, and repeat the cycle without ever electing a leader.


The Election Loop

Every server runs the election loop in a background goroutine:

func (r *Raft) electionLoop() {
    for {
        r.mu.Lock()

        // Check for shutdown under the lock - reading r.state
        // without it would be a data race
        if r.state == Shutdown {
            r.mu.Unlock()
            return
        }

        // If we're not getting heartbeats from the leader...
        timeSinceContact := time.Since(r.lastContact)

        // ...and we're a follower, start an election
        if r.state == Follower && timeSinceContact > electionTimeout {
            r.becomeCandidate()
        }

        r.mu.Unlock()

        time.Sleep(10 * time.Millisecond)
    }
}

func (r *Raft) becomeCandidate() {
    r.currentTerm++
    r.state = Candidate
    r.votedFor = r.id  // Vote for ourselves
    r.persist()  // Write to disk

    // Ask all peers for votes
    r.election()
}

When a follower’s election timeout fires, it increments its term and asks the other servers for their votes. This is the moment of truth - will this server become the leader? (A complete implementation also lets a candidate time out and start a fresh election with a higher term; otherwise a split vote would stall the cluster forever.)


Requesting Votes

The candidate sends RequestVote RPCs to all peers:

type RequestVoteRequest struct {
    CandidateID string
    Term uint64
    LastLogIndex uint64
    LastLogTerm uint64
}

type RequestVoteResponse struct {
    Term uint64
    VoteGranted bool
}

Notice the LastLogIndex and LastLogTerm. A server won’t vote for a candidate unless that candidate’s log is at least as up-to-date as its own. This prevents electing a leader that doesn’t have all committed entries.

func (r *Raft) election() {
    request := RequestVoteRequest{
        CandidateID: r.id,
        Term: r.currentTerm,
        LastLogIndex: r.log.LastIndex(),
        LastLogTerm: r.log.LastTerm(),
    }

    votes := 1  // We vote for ourselves
    quorum := len(r.peers)/2 + 1

    for id, peer := range r.peers {
        if id == r.id {
            continue  // Don't send a request to ourselves
        }

        go func(peer Peer) {
            response, err := peer.RequestVote(request)
            if err != nil {
                return
            }

            r.mu.Lock()
            defer r.mu.Unlock()

            // Step down if the peer has a higher term
            if response.Term > r.currentTerm {
                r.currentTerm = response.Term
                r.state = Follower
                r.votedFor = ""
                r.persist()
                return
            }

            // Ignore stale responses: only count the vote if we're
            // still a candidate in the term we started with
            if r.state != Candidate || r.currentTerm != request.Term {
                return
            }

            // Count the vote (votes is shared across goroutines,
            // but it's only ever touched under r.mu)
            if response.VoteGranted {
                votes++

                // If we have a quorum, we're the leader. becomeLeader
                // changes r.state, so later votes won't re-trigger it.
                if votes >= quorum {
                    r.becomeLeader()
                }
            }
        }(peer)
    }
}

The trick is handling term updates. If a server gets a response with a higher term, it immediately steps down to follower. This prevents “split brain” where multiple servers think they’re the leader.
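One small detail in election() worth spelling out is the quorum arithmetic. r.peers here is assumed to include the server itself, so len(r.peers) is the cluster size, and a quorum is a strict majority:

```go
// quorumSize returns the minimum number of votes needed to win an
// election in a cluster of n servers: a strict majority. Integer
// division makes this work for both odd and even cluster sizes.
func quorumSize(n int) int {
    return n/2 + 1
}
```

For a 3-server cluster this is 2, for 5 it's 3. An even-sized cluster of 4 needs 3 votes, which is why odd cluster sizes are preferred: 4 servers tolerate no more failures than 3.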


Handling Vote Requests

On the receiving end, a server needs to decide whether to grant a vote:

func (r *Raft) handleRequestVote(request RequestVoteRequest) RequestVoteResponse {
    r.mu.Lock()
    defer r.mu.Unlock()

    response := RequestVoteResponse{Term: r.currentTerm}

    // If the request's term is lower, reject
    if request.Term < r.currentTerm {
        return response
    }

    // If the request's term is higher, update ours and become follower
    if request.Term > r.currentTerm {
        r.currentTerm = request.Term
        r.votedFor = ""
        r.state = Follower
        r.persist()
        response.Term = r.currentTerm  // Reply with the updated term
    }

    // Check if we've already voted this term
    if r.votedFor != "" && r.votedFor != request.CandidateID {
        return response  // Already voted for someone else
    }

    // Check if the candidate's log is up to date
    if request.LastLogTerm < r.log.LastTerm() {
        return response  // Candidate's log is behind
    }

    if request.LastLogTerm == r.log.LastTerm() &&
        request.LastLogIndex < r.log.LastIndex() {
        return response  // Same last term but candidate's log is shorter
    }

    // Grant the vote and reset our election timer, so we don't
    // start a competing election while this candidate is winning
    r.votedFor = request.CandidateID
    r.lastContact = time.Now()
    r.persist()
    response.VoteGranted = true
    return response
}

The log comparison is critical. A server will only vote for a candidate if that candidate’s log is at least as up-to-date. This ensures that a leader has all previously committed entries.
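The two log checks can be collapsed into a single helper (a sketch; the repo may simply inline the comparisons as above):

```go
// logUpToDate reports whether a candidate's log, described by its
// last entry's term and index, is at least as up-to-date as ours,
// per the Raft paper's definition: compare last terms first, and
// break ties by log length.
func logUpToDate(candTerm, candIndex, ourTerm, ourIndex uint64) bool {
    if candTerm != ourTerm {
        return candTerm > ourTerm
    }
    return candIndex >= ourIndex
}
```

Note that "up-to-date" compares terms before lengths: a short log ending in a high term beats a long log ending in a low term, because the high-term entries are newer.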


Becoming Leader

Once a candidate gets a majority, it becomes leader:

func (r *Raft) becomeLeader() {
    r.state = Leader

    // Initialize nextIndex: for each peer, the index of the next log entry
    // to send to it. Start with our last log index + 1. Allocate fresh
    // maps so state from a previous term as leader doesn't leak through.
    r.nextIndex = make(map[string]uint64)
    r.matchIndex = make(map[string]uint64)
    for id := range r.peers {
        if id != r.id {
            r.nextIndex[id] = r.log.LastIndex() + 1
            r.matchIndex[id] = 0
        }
    }

    // Send an initial empty AppendEntries as a heartbeat
    // This announces the new leadership to followers
    r.heartbeat()
}

The nextIndex and matchIndex maps are crucial for log replication (next post). The leader tracks, for each follower, what log entries it has replicated.


The Heartbeat Loop

The leader needs to continuously send heartbeats to prevent other servers from timing out and starting elections:

func (r *Raft) heartbeatLoop() {
    ticker := time.NewTicker(defaultHeartbeatInterval)
    defer ticker.Stop()

    for range ticker.C {
        r.mu.Lock()

        if r.state == Shutdown {
            r.mu.Unlock()
            return
        }

        // Only the leader sends heartbeats
        if r.state == Leader {
            r.heartbeat()
        }

        r.mu.Unlock()
    }
}

func (r *Raft) heartbeat() {
    // Build the request while we still hold the lock - the goroutines
    // below run after the caller releases it, so reading r.currentTerm
    // there would be a data race
    request := AppendEntriesRequest{
        LeaderID: r.id,
        Term: r.currentTerm,
        LeaderCommit: r.commitIndex,
        // prevLogIndex, prevLogTerm, entries all empty for a heartbeat
    }

    for id, peer := range r.peers {
        if id == r.id {
            continue
        }

        go func(peer Peer) {
            // AppendEntries with no entries is a heartbeat
            response, err := peer.AppendEntries(request)
            if err != nil {
                return
            }

            r.mu.Lock()
            defer r.mu.Unlock()

            // If the peer has a higher term, step down
            if response.Term > r.currentTerm {
                r.currentTerm = response.Term
                r.state = Follower
                r.votedFor = ""
                r.persist()
            }
        }(peer)
    }
}

Heartbeats keep followers from timing out. If a leader crashes, heartbeats stop and followers start elections.


Resetting the Timeout

When a follower receives a heartbeat from the leader, it resets its election timeout:

func (r *Raft) handleAppendEntries(request AppendEntriesRequest) AppendEntriesResponse {
    r.mu.Lock()
    defer r.mu.Unlock()

    // Got a message from leader, reset timeout
    r.lastContact = time.Now()

    // ... rest of logic
}

This is simple but essential. The lastContact time is checked in the election loop to decide if it’s time to start an election.


Common Pitfalls

Race conditions on state transitions: Make sure all state reads/writes are protected by the mutex. It’s easy to read state without holding the lock and have it change mid-operation.

Term confusion: Always update your term if you see a higher one. If you don’t, you’ll have multiple “leaders” believing they’re in charge.

Split votes: If the election timeout isn’t randomized, all servers start elections simultaneously, split the vote, and repeat the cycle without ever electing a leader.

Persistent state: If you forget to persist currentTerm and votedFor, a restarted server can vote twice in the same term, breaking the election guarantee.

The election logic handles the first challenge of distributed consensus: who gets to decide what happens. Once leaders are elected reliably, the next challenge is replicating decisions to all servers. That’s next.
