Raft

guest@rkulskis.github.io:~/projects$ echo LINKS: $(ls -a)
LINKS: . .. audio_2_led.org raft/README.org

Overview

Raft is a consensus algorithm: it replicates a state machine across multiple nodes so that operations stay consistent even when individual nodes fail. Consensus algorithms provide fault tolerance by guaranteeing progress and consistency in the face of server failures so long as some condition is upheld. In the case of Raft, this condition is that a majority of servers are online, e.g. so long as 3 of the 5 servers that comprise the cluster stay online, the system can still make progress.
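
The majority condition above can be sketched as a simple quorum check:

```python
def majority(num_servers: int) -> int:
    """Smallest number of servers that forms a quorum."""
    return num_servers // 2 + 1

def can_progress(num_online: int, num_servers: int) -> bool:
    """The cluster makes progress only while a quorum is online."""
    return num_online >= majority(num_servers)

print(can_progress(3, 5))  # True: 3 of 5 is a majority
print(can_progress(2, 5))  # False: a 5-server cluster tolerates at most 2 failures
```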

Each Raft server has 3 states:

  1. Follower: accepts commands from the leader.
  2. Candidate: runs for election when, from its point of view, there is no valid leader.
  3. Leader: relays commands from the user to the rest of the servers.

Raft primarily uses the AppendEntries() and RequestVote() Remote Procedure Calls (RPCs, a fancy name for a function call that runs on another server). Because of the intricate and interwoven logic of state, most Raft implementations use semaphores or locking. However, this implementation is fully asynchronous and requires no locks.
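
The arguments of the two RPCs, as given in the original Raft paper, can be sketched as dataclasses (field names follow the paper and may differ from this repo's data/ files):

```python
from dataclasses import dataclass, field

@dataclass
class VoteReq:
    """RequestVote arguments from the Raft paper."""
    term: int            # candidate's current term
    candidate_id: int    # candidate requesting the vote
    last_log_index: int  # index of candidate's last log entry
    last_log_term: int   # term of candidate's last log entry

@dataclass
class AppendEntriesReq:
    """AppendEntries arguments; an empty entries list is a heartbeat."""
    term: int               # leader's current term
    leader_id: int          # so followers can redirect clients
    prev_log_index: int     # index of the entry preceding the new ones
    prev_log_term: int      # term of that preceding entry
    entries: list = field(default_factory=list)
    leader_commit: int = 0  # leader's commit index
```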

This version of raft uses ROS 2 and the same functional methodology as RT-DDS (will link paper when published). See the image below for how the servers are composed in a pub-sub manner so that much of the individual logic of e.g. responding to an RPC is handled at the unary unmarshall/marshall layer.

raft.svg

Structure

The function space is partitioned by arity into two groups:

  1. unary: takes one input
  2. n-ary: takes more than one input

Each function may have state associated with it, which is encapsulated in a class for simplicity and to follow an object-oriented style, since this is a Python implementation.

All data (which flows through the pub/sub graph) and all state live in files separate from the functions. This significantly decreases lines of code and allows for a more scalable design.

.
├── communication/ # handles transfer of data
├── data/
│   ├── append_entries_req.py       
│   ├── append_entries_resp.py       
│   ├── client_cmd_req.py       
│   ├── client_cmd_resp.py       
│   ├── entry.py       
│   ├── vote_req.py       
│   └── vote_resp.py       
├── functions/
│   ├── n_ary/ # i.e. if forall X, then do Y
│   │   ├── _candidate.py       
│   │   ├── _follower.py       
│   │   ├── _leader.py       
│   │   ├── client.py       
│   │   └── server.py       
│   └── unary/ # for handling messages; note the alphabetical name-grouping of RPC steps
│       ├── append_entries_req.py       
│       ├── append_entries_req_resp.py       
│       ├── append_entries_resp_recv.py       
│       ├── client_cmd_req.py       
│       ├── client_cmd_req_recv.py       
│       ├── client_cmd_resp.py       
│       ├── client_cmd_resp_recv.py       
│       ├── recv.py       
│       ├── send.py       
│       ├── vote_req.py       
│       ├── vote_req_resp.py       
│       └── vote_resp_recv.py       
└── state/ # server state
    ├── persistent.py       
    ├── raft_server_state.py       
    ├── state_machine.py       
    ├── timer.py       
    └── volatile.py       
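
A replicated log entry, in the spirit of data/entry.py, might look like the following (field names are illustrative, not necessarily those of the actual file):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """One replicated log entry: a command stamped with the term
    in which the leader first received it."""
    term: int
    command: str

log = [Entry(term=1, command="x=1"), Entry(term=2, command="x=3")]
# Raft's Log Matching property: if two logs agree on an entry's
# index and term, they store the same command there.
```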

Pseudocode

Given the clear decomposition of functions (unary and n-ary), state, data, and communication logic, each Raft server operates with the following (pseudocode) logic:

# Stage 1: unary unmarshall inputs
for _input in inputs:
    common_unmarshall(_input)    # check term; update self.term if it is less than the message's term
    if _input is response:
        resp_recv(response, state)

# Stage 2: n-ary        
n_ary = [follower, candidate, leader]

before = state.status
n_ary[before](state)
after = state.status

converted = (before != after)

if converted:
    n_ary[after](state)


# Stage 3: unary marshall subscribers (send a request, or respond to their input)
for subscriber, corresponding_input in zip(subscribers, inputs):
    # Ok to ignore request if e.g. new leader sending append entries to old leader
    if corresponding_input is request and should_respond(state, corresponding_input):
        req_resp(request, state)
    else:
        if timeout_allows:      # i.e. election_timeout, heartbeat_timeout
            req_send(subscriber, state)
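
Stage 2's role dispatch can be made concrete with a small runnable sketch; the role handlers here are stubs that simulate one election, and all names are illustrative:

```python
from enum import IntEnum

class Status(IntEnum):
    FOLLOWER = 0
    CANDIDATE = 1
    LEADER = 2

class State:
    def __init__(self):
        self.status = Status.FOLLOWER

def follower(state):
    # Pretend the election timer fired: convert to candidate.
    state.status = Status.CANDIDATE

def candidate(state):
    # Pretend we won the election with a majority of votes.
    state.status = Status.LEADER

def leader(state):
    pass  # send heartbeats, replicate entries, ...

# status doubles as an index into the handler table
n_ary = [follower, candidate, leader]

state = State()
before = state.status
n_ary[before](state)        # follower -> candidate
if state.status != before:  # converted: run the new role immediately
    n_ary[state.status](state)
print(state.status.name)    # LEADER
```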

Tests

This repo also has a CLI parser which takes arguments or a config.json that can configure:

  • num_servers: int
  • scheduled_downtime: list[(server_index, start_ms, end_ms)]
  • scheduled_config_change: list[(change, time)]
  • scheduled_partition: list[(set_A, set_B, ..., start_ms, end_ms)]
  • writes: list[(int, time_ms)] for when the client writes a value change to the raft
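
A config.json exercising these options might look like the following (all values are illustrative; scheduled_config_change is omitted since its change format is not specified above):

```json
{
  "num_servers": 5,
  "scheduled_downtime": [[2, 1000, 3000]],
  "scheduled_partition": [[[0, 1], [2, 3, 4], 2000, 4000]],
  "writes": [[7, 500], [42, 1500]]
}
```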