Raft is a consensus algorithm designed to be simpler than Paxos. You have a cluster of servers that need to agree on a sequence of commands, even if some servers fail. One server becomes the leader, tells the others what to do, and if the leader dies, the rest elect a new one.
This post walks through building a Raft implementation in Go. We’ll start with state management and election logic, since getting leaders elected correctly is the trickiest part of Raft.
The full implementation is available at github.com/navgeet/raft.
A Raft server needs to track a bunch of state:
type Raft struct {
id string
// Persistent state (must survive crashes)
currentTerm uint64
votedFor string
log Log // List of log entries
// Volatile state
state State // Leader, Follower, or Candidate
commitIndex uint64 // Highest log entry known to be committed
lastApplied uint64 // Highest log entry applied to state machine
// Leader state (only on leader)
nextIndex map[string]uint64 // Next log index to send to each peer
matchIndex map[string]uint64 // Highest log index replicated on each peer
// Timing and networking
lastContact time.Time
peers map[string]Peer
// Synchronization
mu sync.Mutex
}
The key insight is splitting state into “persistent” and “volatile.” If the server crashes, currentTerm, votedFor, and the log must survive, so they’re written to disk before responding to RPCs. Everything else can be reconstructed after a restart.
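The State type in the struct is a simple enum; a sketch of how it might be declared (the Shutdown value is used later to stop the background loops):

```go
// State identifies the role a server currently plays.
type State int

const (
	Follower State = iota
	Candidate
	Leader
	Shutdown // terminal state: background loops exit
)
```

Making Follower the zero value is a nice property: a freshly constructed server starts as a follower, which is exactly what Raft requires.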
In Raft, time is divided into “terms.” Each term has at most one leader. Think of terms as logical clock ticks.
const defaultElectionTimeout = 150 * time.Millisecond
const defaultHeartbeatInterval = 50 * time.Millisecond
// A follower waits for a heartbeat from the leader.
// If none arrives within the election timeout, it becomes a candidate.
// The election timeout is randomized to avoid split votes:
electionTimeout := time.Duration(rand.Intn(150) + 150) * time.Millisecond
The randomized timeout is crucial. If all servers used the same timeout, they’d all start elections at the same moment, split the vote, time out together, and repeat the cycle without ever electing a leader.
Every server runs the election loop in a background goroutine:
func (r *Raft) electionLoop() {
	for {
		r.mu.Lock()
		if r.state == Shutdown {
			r.mu.Unlock()
			return
		}
		// If we haven't heard from a leader within the election timeout,
		// start an election. Candidates retry too, so a split vote
		// doesn't leave the cluster stuck without a leader.
		if r.state != Leader && time.Since(r.lastContact) > electionTimeout {
			r.becomeCandidate()
		}
		r.mu.Unlock()
		time.Sleep(10 * time.Millisecond)
	}
}
func (r *Raft) becomeCandidate() {
	r.currentTerm++
	r.state = Candidate
	r.votedFor = r.id          // Vote for ourselves
	r.lastContact = time.Now() // Restart the election timer
	r.persist()                // Write term and vote to disk
	// Ask all peers for votes
	r.election()
}
When a follower’s election timeout fires, it increments its term and asks other servers for their votes. This is the moment of truth - will this server become the leader?
The candidate sends RequestVote RPCs to all peers:
type RequestVoteRequest struct {
CandidateID string
Term uint64
LastLogIndex uint64
LastLogTerm uint64
}
type RequestVoteResponse struct {
Term uint64
VoteGranted bool
}
Notice the LastLogIndex and LastLogTerm. A server won’t vote for a candidate unless that candidate’s log is at least as up-to-date as its own. This prevents electing a leader that doesn’t have all committed entries.
func (r *Raft) election() {
	request := RequestVoteRequest{
		CandidateID:  r.id,
		Term:         r.currentTerm,
		LastLogIndex: r.log.LastIndex(),
		LastLogTerm:  r.log.LastTerm(),
	}
	votes := 1 // We vote for ourselves
	quorum := len(r.peers)/2 + 1
	for id, peer := range r.peers {
		if id == r.id {
			continue // We already voted for ourselves
		}
		go func(peer Peer) {
			response, err := peer.RequestVote(request)
			if err != nil {
				return
			}
			r.mu.Lock()
			defer r.mu.Unlock()
			// Update term if peer has higher term
			if response.Term > r.currentTerm {
				r.currentTerm = response.Term
				r.state = Follower
				r.votedFor = ""
				r.persist()
				return
			}
			// Ignore stale responses: we may have already won, lost,
			// or moved on to a later term
			if r.state != Candidate || response.Term != r.currentTerm {
				return
			}
			// Count the vote
			if response.VoteGranted {
				votes++
				// If we have a quorum, we're the leader. becomeLeader
				// flips state to Leader, so later grants are ignored
				// by the check above rather than re-electing us.
				if votes >= quorum {
					r.becomeLeader()
				}
			}
		}(peer)
	}
}
The trick is handling term updates. If a server gets a response with a higher term, it immediately steps down to follower. This prevents “split brain” where multiple servers think they’re the leader.
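Quorum intersection is the other half of the safety argument: any two majorities of the same cluster share at least one server, and that server votes at most once per term, so two candidates can never both reach quorum in the same term. The arithmetic is tiny but worth pinning down (the helper name is mine):

```go
// quorumSize returns the votes needed to win an election in a cluster
// of the given size, counting the candidate's own vote. An even-sized
// cluster needs strictly more than half: 4 servers need 3 votes.
func quorumSize(clusterSize int) int {
	return clusterSize/2 + 1
}
```

quorumSize(3) is 2, quorumSize(4) is 3, and quorumSize(5) is also 3, which is why going from 3 to 4 servers raises the quorum without improving fault tolerance.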
On the receiving end, a server needs to decide whether to grant a vote:
func (r *Raft) handleRequestVote(request RequestVoteRequest) RequestVoteResponse {
	r.mu.Lock()
	defer r.mu.Unlock()
	response := RequestVoteResponse{Term: r.currentTerm}
	// If request term is lower, reject
	if request.Term < r.currentTerm {
		return response
	}
	// If request term is higher, update ours and become follower
	if request.Term > r.currentTerm {
		r.currentTerm = request.Term
		r.votedFor = ""
		r.state = Follower
		r.persist()
		response.Term = r.currentTerm // report the updated term, not the stale one
	}
	// Check if we've already voted this term
	if r.votedFor != "" && r.votedFor != request.CandidateID {
		return response // Already voted for someone else
	}
	// Check if candidate's log is up to date
	if request.LastLogTerm < r.log.LastTerm() {
		return response // Candidate's log is behind
	}
	if request.LastLogTerm == r.log.LastTerm() &&
		request.LastLogIndex < r.log.LastIndex() {
		return response // Same term but candidate's log is shorter
	}
	// Grant the vote and restart our election timer, so we don't
	// immediately start a competing election
	r.votedFor = request.CandidateID
	r.lastContact = time.Now()
	r.persist()
	response.VoteGranted = true
	return response
}
The log comparison is critical. A server will only vote for a candidate if that candidate’s log is at least as up-to-date. This ensures that a leader has all previously committed entries.
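The two comparisons above can be collapsed into a single predicate; a sketch (the helper name is mine, not from the implementation):

```go
// logUpToDate reports whether a candidate's log, summarized by its
// last entry's term and index, is at least as up-to-date as the
// voter's: a higher last term wins outright; equal last terms fall
// back to comparing log length.
func logUpToDate(candLastTerm, candLastIndex, ourLastTerm, ourLastIndex uint64) bool {
	if candLastTerm != ourLastTerm {
		return candLastTerm > ourLastTerm
	}
	return candLastIndex >= ourLastIndex
}
```

For example, a candidate whose last entry is at term 3, index 5 still wins a vote from a server whose log ends at term 2, index 9: the newer term outweighs the longer log, because entries from a newer term imply the candidate heard from a more recent leader.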
Once a candidate gets a majority, it becomes leader:
func (r *Raft) becomeLeader() {
	r.state = Leader
	// Initialize nextIndex: for each peer, the index of the next log entry
	// to send to it. Start with our last log index + 1.
	r.nextIndex = make(map[string]uint64)
	r.matchIndex = make(map[string]uint64)
	for id := range r.peers {
		if id != r.id {
			r.nextIndex[id] = r.log.LastIndex() + 1
			r.matchIndex[id] = 0
		}
	}
	// Send initial empty AppendEntries as heartbeat
	// This announces the new leadership to followers
	r.heartbeat()
}
The nextIndex and matchIndex maps are crucial for log replication (next post). The leader tracks, for each follower, what log entries it has replicated.
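The heartbeat code below uses AppendEntries message types that haven’t appeared yet. A sketch of their fields, following the Raft paper (the Entry shape is an assumption):

```go
// Entry is a single log entry; the exact shape is an assumption here.
type Entry struct {
	Term    uint64
	Command []byte
}

type AppendEntriesRequest struct {
	LeaderID     string
	Term         uint64
	PrevLogIndex uint64  // index of the entry just before Entries
	PrevLogTerm  uint64  // term of that entry
	Entries      []Entry // empty for a heartbeat
	LeaderCommit uint64  // leader's commitIndex
}

type AppendEntriesResponse struct {
	Term    uint64
	Success bool
}
```

A heartbeat is simply an AppendEntriesRequest with a nil Entries slice; everything else about it (term checks, timeout resets) works exactly like a real replication message.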
The leader needs to continuously send heartbeats to prevent other servers from timing out and starting elections:
func (r *Raft) heartbeatLoop() {
	ticker := time.NewTicker(defaultHeartbeatInterval)
	defer ticker.Stop()
	for range ticker.C {
		r.mu.Lock()
		if r.state == Shutdown {
			r.mu.Unlock()
			return
		}
		// Only the leader sends heartbeats
		if r.state == Leader {
			r.heartbeat()
		}
		r.mu.Unlock()
	}
}
func (r *Raft) heartbeat() {
	// Build the request while holding the lock (callers hold r.mu),
	// so the goroutines below don't race on currentTerm.
	// AppendEntries with no entries is a heartbeat.
	request := AppendEntriesRequest{
		LeaderID:     r.id,
		Term:         r.currentTerm,
		LeaderCommit: r.commitIndex,
		// PrevLogIndex, PrevLogTerm, Entries all zero for a heartbeat
	}
	for id, peer := range r.peers {
		if id == r.id {
			continue
		}
		go func(peer Peer) {
			response, err := peer.AppendEntries(request)
			if err != nil {
				return
			}
			r.mu.Lock()
			defer r.mu.Unlock()
			// If peer has higher term, step down
			if response.Term > r.currentTerm {
				r.currentTerm = response.Term
				r.state = Follower
				r.votedFor = ""
				r.persist()
			}
		}(peer)
	}
}
Heartbeats keep followers from timing out. If a leader crashes, heartbeats stop and followers start elections.
When a follower receives a heartbeat from the leader, it resets its election timeout:
func (r *Raft) handleAppendEntries(request AppendEntriesRequest) AppendEntriesResponse {
	r.mu.Lock()
	defer r.mu.Unlock()
	// Reject stale leaders first; only a current leader may reset our timeout
	if request.Term < r.currentTerm {
		return AppendEntriesResponse{Term: r.currentTerm}
	}
	// Got a message from the leader, reset timeout
	r.lastContact = time.Now()
	// ... rest of logic
}
This is simple but essential. The lastContact time is checked in the election loop to decide if it’s time to start an election.
Race conditions on state transitions: Make sure all state reads/writes are protected by the mutex. It’s easy to read state without holding the lock and have it change mid-operation.
Term confusion: Always update your term if you see a higher one. If you don’t, you’ll have multiple “leaders” believing they’re in charge.
Split votes: If the election timeout isn’t randomized, all servers start elections simultaneously, split the vote, and repeat the cycle without ever electing a leader.
Persistent state: If you forget to persist currentTerm and votedFor, a restarted server can vote twice in the same term, breaking the election guarantee.
The election logic handles the first challenge of distributed consensus: who gets to decide what happens. Once leaders are elected reliably, the next challenge is replicating decisions to all servers. That’s next.