Bug 6540: EOS - Prune pending owner change commits on leader change 16/45516/4
author Tom Pantelis <tpanteli@brocade.com>
Thu, 8 Sep 2016 14:16:53 +0000 (10:16 -0400)
committer Tom Pantelis <tpanteli@brocade.com>
Mon, 26 Sep 2016 03:43:24 +0000 (23:43 -0400)
commit 07c96b0fa318b7bf559df4954f705d06a44f1354
tree 01e2cb28b32cb47087cab3b1ccf74dcefc700265
parent 74524984b8e8625f6b8e8c791c584844d49ccf45
Bug 6540: EOS - Prune pending owner change commits on leader change

When the shard leader is isolated, it attempts to re-assign ownership for down
peers. However, since it's isolated, it can't commit the modifications. If the
majority partition elects a new leader, when the partition is healed, the old
leader tries to forward the pending owner change commits to the new leader.
However, this is problematic because the criteria used to determine the new
owner are stale and owner changes should only be committed by a valid leader.
Since the old leader is no longer the leader, it should not forward pending
owner change commits. However, it should still forward local candidate change
commits.

So I modified EntityOwnershipShardCommitCoordinator#onStateChange to iterate
the pending Modifications and remove WRITE modifications for the owner leaf
when the shard has transitioned to having a remote leader.
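
The pruning step can be illustrated with a minimal, self-contained sketch.
It is not the code from this patch: Mod, Kind, the string paths and
pruneOnRemoteLeader are hypothetical stand-ins for the datastore's
modification types and for the logic in onStateChange, and matching the
owner leaf by a "/owner" path suffix is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the pruning done when the shard gets a remote leader.
final class PendingCommitPruner {
    // Stand-in for the kinds of pending modification.
    enum Kind { WRITE, MERGE, DELETE }

    // Stand-in for a pending modification; the path is a plain string here.
    static final class Mod {
        final Kind kind;
        final String path;

        Mod(Kind kind, String path) {
            this.kind = kind;
            this.path = path;
        }
    }

    // Assumption: the owner leaf is identified by a path ending in "/owner".
    static boolean isOwnerWrite(Mod mod) {
        return mod.kind == Kind.WRITE && mod.path.endsWith("/owner");
    }

    // Removes pending owner writes when this node is not the leader but a
    // leader exists (a remote leader). Returns the number of entries removed.
    static int pruneOnRemoteLeader(List<Mod> pending, boolean isLeader, boolean hasLeader) {
        if (isLeader || !hasLeader) {
            return 0; // still the leader, or no leader yet: keep everything
        }
        int removed = 0;
        for (Iterator<Mod> it = pending.iterator(); it.hasNext();) {
            if (isOwnerWrite(it.next())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<Mod> pending = new ArrayList<>();
        pending.add(new Mod(Kind.WRITE, "/entities/e1/owner"));
        pending.add(new Mod(Kind.MERGE, "/entities/e1/candidate/member-1"));
        int removed = pruneOnRemoteLeader(pending, false, true);
        System.out.println("removed=" + removed + " remaining=" + pending.size());
    }
}
```

Candidate changes (the MERGE entry in the sketch) are left in the list, which
matches the intent that local candidate change commits are still forwarded.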

I also fixed an issue in EntityOwnershipShard#onCandidateRemoved that was
intermittently revealed by unit tests. Say candidate1 and candidate2 are
removed quickly for an entity and candidate1 is the current owner.
onCandidateRemoved is called for candidate1 and commits an update to write
candidate2 as the owner. If the write commit is still pending when
onCandidateRemoved is called for candidate2, the current owner will still
be candidate1, so the "message.getRemovedCandidate().equals(currentOwner)"
check will fail. The owner therefore isn't cleared, and once the pending
write commits, candidate2 remains as owner. This results in a node being the
owner without being in the candidate list. (This patch may fix Bug 6672 as
well.)
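
The race described above can be reproduced with a small model. This is only
a sketch under stated assumptions: CandidateRemovalRace and all of its
members are hypothetical, an empty string stands for "no owner", and the
alwaysReselect flag shows one plausible way to avoid the stale check
(re-selecting the owner on every removal), which is not necessarily what
this patch does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of two quick candidate removals with a pending owner write.
final class CandidateRemovalRace {
    private final List<String> candidates;
    private final Deque<String> pendingOwnerWrites = new ArrayDeque<>();
    private final boolean alwaysReselect;
    private String committedOwner; // what a read of the owner leaf returns

    CandidateRemovalRace(boolean alwaysReselect, String owner, String... candidates) {
        this.alwaysReselect = alwaysReselect;
        this.committedOwner = owner;
        this.candidates = new ArrayList<>(Arrays.asList(candidates));
    }

    void onCandidateRemoved(String removed) {
        candidates.remove(removed);
        // The stale check compares against the committed owner, which does
        // not yet reflect owner writes that are still pending.
        if (alwaysReselect || removed.equals(committedOwner)) {
            pendingOwnerWrites.add(candidates.isEmpty() ? "" : candidates.get(0));
        }
    }

    // Applies the queued owner writes in order.
    void commitPending() {
        while (!pendingOwnerWrites.isEmpty()) {
            committedOwner = pendingOwnerWrites.poll();
        }
    }

    String owner() {
        return committedOwner;
    }

    public static void main(String[] args) {
        for (boolean reselect : new boolean[] {false, true}) {
            CandidateRemovalRace shard = new CandidateRemovalRace(reselect, "candidate1",
                    "candidate1", "candidate2");
            shard.onCandidateRemoved("candidate1");
            shard.onCandidateRemoved("candidate2"); // owner write still pending
            shard.commitPending();
            System.out.println("alwaysReselect=" + reselect + " owner='" + shard.owner() + "'");
        }
    }
}
```

With the stale check, the model ends with candidate2 as owner even though no
candidates remain; with re-selection on every removal, the owner ends up
cleared.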

A new testLeaderIsolation case was added to EntityOwnershipShardTest. Also I
reworked the tests and removed the use of the MockFollower and MockLeader
actors for consistency and also so the tests use the real EOS shard.

Change-Id: I5039b07d02f8571ee2d1affb0f364ea278641e91
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/Shard.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/entityownership/EntityOwnershipShard.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/entityownership/EntityOwnershipShardCommitCoordinator.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/entityownership/AbstractEntityOwnershipTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/entityownership/EntityOwnershipShardTest.java