Bug 6540: EOS - Prune pending owner change commits on leader change 94/46194/2
authorTom Pantelis <tpanteli@brocade.com>
Thu, 8 Sep 2016 14:16:53 +0000 (10:16 -0400)
committerTom Pantelis <tpanteli@brocade.com>
Mon, 3 Oct 2016 19:36:17 +0000 (19:36 +0000)
commitcc9e36025f69fd006c117b169f0f40c1a751a1f6
treeb6358dbf4cb0f5eeea0072b8fe62220b9e188e11
parent69291624b6dbfedd126e9caff2bf2806f88e9dd8
Bug 6540: EOS - Prune pending owner change commits on leader change

When the shard leader is isolated, it attempts to re-assign ownership for down
peers. However, since it's isolated, it can't commit the modifications. If the
majority partition elects a new leader, when the partition is healed, the old
leader tries to forward the pending owner change commits to the new leader.
However this is problematic as the criteria used to determine the new owner is
stale and owner changes should only be committed by a valid leader. Since the
old leader is no longer the leader, it should not forward pending owner change
commits. However it still should forward local candidate change commits.

So I modified EntityOwnershipShardCommitCoordinator#onStateChange to iterate
the pending Modifications and remove WRITE modifications for the owner leaf
when the shard has transitioned to having a remote leader.

I also fixed an issue in EntityOwnershipShard#onCandidateRemoved that was
intermittently revealed by unit tests. Say candidate1 and candidate2 are
removed quickly for an entity and candidate1 is the current owner.
onCandidateRemoved is called for candidate1 and commits an update to write
candidate2 as the owner. If the write commit is still pending when
onCandidateRemoved is called for candidate2, the current owner will still
be candidate1 and the "message.getRemovedCandidate().equals(currentOwner)"
check will fail and thus the owner isn't cleared and candidate2 will remain
as owner. This results in a node being the owner w/o being in the candidate
list. (This patch may fix Bug 6672 as well)

A new testLeaderIsolation case was added to EntityOwnershipShardTest. Also I
reworked the tests and removed the use of the MockFollower and MockLeader
actors for consistency and also so the tests use the real EOS shard.

Change-Id: I5039b07d02f8571ee2d1affb0f364ea278641e91
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 07c96b0fa318b7bf559df4954f705d06a44f1354)
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/Shard.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/entityownership/EntityOwnershipShard.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/entityownership/EntityOwnershipShardCommitCoordinator.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/entityownership/AbstractEntityOwnershipTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/entityownership/EntityOwnershipShardTest.java