Bug 2187: AddServer unit test and bug fixes 63/28663/3
authorTom Pantelis <tpanteli@brocade.com>
Wed, 21 Oct 2015 09:02:17 +0000 (05:02 -0400)
committerGerrit Code Review <gerrit@opendaylight.org>
Fri, 23 Oct 2015 17:26:51 +0000 (17:26 +0000)
commit3fda1a923defdbf18849c6080c3aa19f1ebf2c5f
tree5bd9f5aca38343e5ca4f4733b1515038ad668c61
parentb34452ce75563e360ae1d02a9f2aa6223d6208c3
Bug 2187: AddServer unit test and bug fixes

Follow-up patch to https://git.opendaylight.org/gerrit/#/c/28018/.

Got the unit tests working and added more unit tests to cover more code.

Also fixed several bugs in the code that were failing the tests. One bug
was caused by replicating data quickly after install snapshot was
complete. On the final install snapshot chunk the follower sends an
ApplySnaphot message to persist and apply the snapshot. On the reply,
the leader assumes the follower is up-to-date and sets its next index.
However, applying the snapshot, ie updating the log and commit index, is
actually done after the async callback from the snapshot persist. In between
that time, if the leader sends the server config AppendEntries, the follower's
log is still empty and it deems itself out-of-sync and reports back failure.
This will cause the leader to eventually send a new install snaphot
which isn't which is not desirable. Also it may delay consensus for the
server config entry.

To fix this, I delayed the final InstallSnapshotReply until after the
ApplySnapshot is complete. I did this by adding a Callback to the
ApplySnapshot message which the SnapshotManager invokes.

Also the new server config was constructed without the leader's ID - it
needs to contain all members.

Also the ServerConfigurationPayload wasn't being applied in the
followers.

Another issue was that, if the leader had no peers initially, the
heartbeat wasn't scheduled so, when the new server was added, heartbeats
weren't occurring. So I change addFollower to schedule the heartbeat.

I added a test for adding a non-voting server which caused an endless
loop in AbstractLeader#handleAppendEntriesReply where it updates the
commitIndex based on the replicated count. To fix this, I added a break
if the replicatedLogEntry is null.

Change-Id: I5dff351140c611d58357cd58900bed401606038c
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
13 files changed:
opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/RaftActor.java
opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/RaftActorServerConfigurationSupport.java
opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/RaftActorSnapshotMessageSupport.java
opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/ServerConfigurationPayload.java
opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/SnapshotManager.java
opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/SnapshotState.java
opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/base/messages/ApplySnapshot.java
opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/behaviors/AbstractLeader.java
opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/behaviors/AbstractRaftActorBehavior.java
opendaylight/md-sal/sal-akka-raft/src/main/java/org/opendaylight/controller/cluster/raft/behaviors/Follower.java
opendaylight/md-sal/sal-akka-raft/src/test/java/org/opendaylight/controller/cluster/raft/RaftActorServerConfigurationSupportTest.java
opendaylight/md-sal/sal-akka-raft/src/test/java/org/opendaylight/controller/cluster/raft/RaftActorSnapshotMessageSupportTest.java
opendaylight/md-sal/sal-akka-raft/src/test/java/org/opendaylight/controller/cluster/raft/behaviors/FollowerTest.java