Bug 5504: Add PreLeader raft state
The following scenario can result in a "store tree and candidate base
differ" IllegalStateException on commit:
A follower receives a replicate and adds it to the log, say at index 1,
but the leader transfers or dies before committing and applying it to the
state. The follower becomes leader and when the next tx is applied, log
index 2, it has to first apply all log entries from the previous term that
hadn't been committed yet, in this case index 1. Since we got consensus for
index 2 that means index 1 has also been replicated to a majority. Therefore
ApplyState is sent for index 1 and then index 2. However index 1 is applied
as a "foreign" candidate while index 2 is in the pre-commit state. When
index 2 is applied the commit fails.
To prevent this scenario, we introduce a new raft state, PreLeader,
which is transitioned to from Candidate if there are uncommitted
entries, ie commit index < last log index. The PreLeader state performs all
the duties of Leader with the added behavior of attempting to commit all
uncommitted entries from the previous leader's term. Raft does not allow a
leader to commit entries from a previous term by simply counting replicas -
only entries from the leader's current term can be committed (ยง5.4.2). Rather
then waiting for a client interaction to commit a new entry, the PreLeader
state immediately appends a no-op entry (NoopPayload) to the log with the
leader's current term. Once the no-op entry is committed, all prior entries
are committed indirectly. Once all entries are committed, ie commitIndex matches
the last log index, it switches to the normal Leader state.
The PreLeader state is considered an inactive leader state and thus
client transactions are delayed until it transitions to Leader.
Change-Id: I20a541de0eba9b0075b9952dc6d5808943b7bb8f
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
18 files changed: