Bug 2187: EOS shard recovery after AddShardReplica

On restart after an EOS shard replica is added and persisted, the
ShardManager recovers its snapshot and attempts to add the local member
to the shard replicas in the configuration. However, since there's no
static module configuration for the EOS shard, the ShardManager can't
create the shard when recovery completes. The shard does get created on
the subsequent CreateShard message. However, if there are no local shards
in the static configuration, it creates the shard as inactive, i.e. with
the DisableElectionsRaftPolicy, which we don't want.

To fix this, the ShardManager now stores its recovered snapshot. On
CreateShard, if the shard is in the recovered shard list, it is treated
as pre-existing and is not initialized with the
DisableElectionsRaftPolicy.
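
The decision described above can be sketched roughly as follows. This is
a simplified illustration, not the actual ShardManager code: the class
RecoveredShardPolicySketch and its methods are hypothetical names, and
the static configuration is reduced to a plain set of shard names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions, not the real ShardManager API.
class RecoveredShardPolicySketch {
    static final String DISABLE_ELECTIONS_POLICY = "DisableElectionsRaftPolicy";

    // Local shard names from the static module configuration (assumed input).
    private final Set<String> staticLocalShards;
    // Shard names found in the snapshot recovered on restart.
    private final Set<String> recoveredShards = new HashSet<>();

    RecoveredShardPolicySketch(Set<String> staticLocalShards) {
        this.staticLocalShards = new HashSet<>(staticLocalShards);
    }

    // Keep the recovered shard list so it can be consulted on CreateShard.
    void onSnapshotRecovered(List<String> shardList) {
        recoveredShards.addAll(shardList);
    }

    // Returns the custom raft policy to apply when a CreateShard arrives, if any.
    Optional<String> customRaftPolicyOnCreate(String shardName) {
        if (recoveredShards.contains(shardName)) {
            // Pre-existing shard: let it take part in elections normally.
            return Optional.empty();
        }
        if (staticLocalShards.isEmpty()) {
            // New shard with no local shards configured: start it inactive.
            return Optional.of(DISABLE_ELECTIONS_POLICY);
        }
        return Optional.empty();
    }
}
```

With an empty static configuration, a shard named in the recovered list
gets no custom policy, while a shard that was never recovered still gets
the DisableElectionsRaftPolicy.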

I extended
DistributedEntityOwnershipIntegrationTest::testEntityOwnershipShardBootstrapping
to restart the newly created replica and verify that it's reinstated
properly. I added the customRaftPolicyClassName to the OnDemandRaftState
so the test can verify which raft policy the shard is using.

Testing revealed some timing issues in the EntityOwnershipShard on
reinstatement where pending modifications weren't sent to the leader.
The EntityOwnershipShard does respond to raft behavior state changes by
sending pending modifications but, on startup, if the shard stays in the
follower state then no behavior change occurs. In that case the leaderId
still changes and onLeaderChanged is called, so I changed
onLeaderChanged to also notify the commit coordinator to commit the next
batched transaction, if any. I did the same for onPeerUp since, in some
test runs, the MemberUp event hadn't occurred yet.
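
The timing fix can be illustrated with the following sketch. It is a
simplified model rather than the real EntityOwnershipShard or commit
coordinator: PendingModificationsSketch, queueModification, sent and the
"leader is known and its peer is up" condition are assumptions made for
the example. Only onLeaderChanged and onPeerUp are named in this commit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of a shard that batches modifications until a leader is reachable.
class PendingModificationsSketch {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> sentToLeader = new ArrayList<>();
    private final Set<String> peersUp = new HashSet<>();
    private String leaderId;

    void queueModification(String modification) {
        pending.add(modification);
        commitNextBatch();
    }

    // A follower sees the leaderId change even when its own raft behavior does not.
    void onLeaderChanged(String newLeaderId) {
        leaderId = newLeaderId;
        commitNextBatch();
    }

    // The peer may come up after the leader is already known, so retry here too.
    void onPeerUp(String peerId) {
        peersUp.add(peerId);
        commitNextBatch();
    }

    // Sends everything pending once the leader is known and reachable (assumed condition).
    private void commitNextBatch() {
        if (leaderId == null || !peersUp.contains(leaderId)) {
            return;
        }
        while (!pending.isEmpty()) {
            sentToLeader.add(pending.poll());
        }
    }

    // Copy of the modifications forwarded so far, in order.
    List<String> sent() {
        return new ArrayList<>(sentToLeader);
    }
}
```

Because both callbacks retry the pending batch, the modifications are
forwarded regardless of whether the leader change or the peer-up
notification arrives first.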

Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
Change-Id: Id6bf966e0aa9a0f12f30327c617cb84f10e6b10f