Improve leader election convergence 69/42969/3
authorTom Pantelis <tpanteli@brocade.com>
Tue, 2 Aug 2016 02:23:33 +0000 (22:23 -0400)
committerTom Pantelis <tpanteli@brocade.com>
Wed, 3 Aug 2016 04:02:06 +0000 (04:02 +0000)
When 2 nodes startup with the first node's log behind the second node's,
it usually takes several election rounds to converge - I've seen
anywhere from 40 s to 3 min, depending on timing. What happens is that
the first node goes to Candidate first but it's RequestVote is rejected
by the seconds node. Shortly after the seconds node goes to Candidate -
the term is higher than the first which causes the first node to go back
to Follower. However it doesn't respond to the RequestVote. Then the
first node goes to Candidate and the cycle repeats. Eventually, due to
the election variance, the seconds node times out first and the second
node process the RequestVote and grants it. But it can take more than 10
cycles.

We can improve the convergence by allowing a Candidate to process and
respond to RequestVote when the sender's term is greater. It still
transitions to Follower as per the raft rules. The raft paper does not
say whether or not a Candidate can/should process a RequestVote in this
case but it seems to make sense. With this change, the first RequestVote
sent by the second node is granted and it converges quickly.

Change-Id: If9416ddf7bf0dfc1220a169be4174f440626a0dd
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/behaviors/Candidate.java
opendaylight/md-sal/sal-akka-raft/src/test/java/org/opendaylight/controller/cluster/raft/behaviors/CandidateTest.java

index 4d51922..1205c4b 100644 (file)
@@ -157,6 +157,12 @@ public class Candidate extends AbstractRaftActorBehavior {
             if (rpc.getTerm() > context.getTermInformation().getCurrentTerm()) {
                 context.getTermInformation().updateAndPersist(rpc.getTerm(), null);
 
+                // The raft paper does not say whether or not a Candidate can/should process a RequestVote in
+                // this case but doing so gains quicker convergence when the sender's log is more up-to-date.
+                if (message instanceof RequestVote) {
+                    super.handleMessage(sender, message);
+                }
+
                 return internalSwitchBehavior(RaftState.Follower);
             }
         }
index 6e5c931..40080a8 100644 (file)
@@ -313,7 +313,7 @@ public class CandidateTest extends AbstractRaftActorBehaviorTest<Candidate> {
         context.getTermInformation().update(2, "test");
 
         // Send an unknown message so that the state of the RaftActor remains unchanged
-        RaftActorBehavior expected = behavior.handleMessage(candidateActor, "unknown");
+        behavior.handleMessage(candidateActor, "unknown");
 
         RaftActorBehavior raftBehavior = behavior.handleMessage(candidateActor, appendEntries);
 
@@ -333,6 +333,7 @@ public class CandidateTest extends AbstractRaftActorBehaviorTest<Candidate> {
         return new MockRaftActorContext("candidate", getSystem(), candidateActor);
     }
 
+    @SuppressWarnings("unchecked")
     private Map<String, String> setupPeers(final int count) {
         Map<String, String> peerMap = new HashMap<>();
         peerActors = new TestActorRef[count];
@@ -349,6 +350,10 @@ public class CandidateTest extends AbstractRaftActorBehaviorTest<Candidate> {
     protected void assertStateChangesToFollowerWhenRaftRPCHasNewerTerm(final MockRaftActorContext actorContext,
             final ActorRef actorRef, final RaftRPC rpc) throws Exception {
         super.assertStateChangesToFollowerWhenRaftRPCHasNewerTerm(actorContext, actorRef, rpc);
-        assertEquals("New votedFor", null, actorContext.getTermInformation().getVotedFor());
+        if(rpc instanceof RequestVote) {
+            assertEquals("New votedFor", ((RequestVote)rpc).getCandidateId(), actorContext.getTermInformation().getVotedFor());
+        } else {
+            assertEquals("New votedFor", null, actorContext.getTermInformation().getVotedFor());
+        }
     }
 }

©2013 OpenDaylight, A Linux Foundation Collaborative Project. All Rights Reserved.
OpenDaylight is a registered trademark of The OpenDaylight Project, Inc.
Linux Foundation and OpenDaylight are registered trademarks of the Linux Foundation.
Linux is a registered trademark of Linus Torvalds.