Rework ClusterSingletonServiceGroupImpl locking 41/75541/2
authorRobert Varga <robert.varga@pantheon.tech>
Wed, 25 Jul 2018 12:16:29 +0000 (14:16 +0200)
committerRobert Varga <nite@hq.sk>
Tue, 4 Sep 2018 08:46:58 +0000 (08:46 +0000)
commite0db53ceec4f4e1524507b8d258a45db8909836b
tree2682044440afb51a2ceb5b6f69938c3cc13bf8a7
parent37543ee77d4629d253c41eeab536d474d3c2a680
Rework ClusterSingletonServiceGroupImpl locking

The problem we are are seeing is a classic AB/BA deadlock, hence
we need to change how serviceGroup is handled. This patch reworks
ClusterSingletonServiceGroupImpl to separate state tracking from
service startup/shutdown mechanics.

State locking is separated out into three domains:
- entity state, guarded by ClusterSingletonServiceGroupImpl object
- service membership, tracked in a ConcurrentMap
- service instantiation, guarded by a simple CAS-based lock

Furthermore anytime state changes, we mark this fact in a volatile
variable. Whenever we observe dirty state, we attempt to reconcile
it -- if we can also acquire the service instantiation lock.

Each registered service is tracked separately, so we do not have
to have wholesale aggregator futures for stopping services and can
also start newcomer services without causing weird state tracking
disruptions.

Splitting state tracking and service instantiation leads to faster
group shutdown, because when a group is being closed we know we can
unregister the service entity irrespective of the state of user
services. Unit tests, especially asynchronous, are updated to account
for this accelerated shutdown procedure.

This has the benefit of improving inter-node failover latency,
because the process of user service shutdown and service entity
unregistration runs concurrently. That leads to lower likelihood
of the new service entity owner having to block on becoming
the cleanup entity owner, as services which shut down quickly
will have released the cleanup entity by the time the new owner
is selected.

JIRA: MDSAL-362
Change-Id: I7cd82f81da9135591e4242a196cc0f06a78973a1
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
singleton-service/mdsal-singleton-dom-impl/src/main/java/org/opendaylight/mdsal/singleton/dom/impl/AbstractClusterSingletonServiceProviderImpl.java
singleton-service/mdsal-singleton-dom-impl/src/main/java/org/opendaylight/mdsal/singleton/dom/impl/ClusterSingletonServiceGroup.java
singleton-service/mdsal-singleton-dom-impl/src/main/java/org/opendaylight/mdsal/singleton/dom/impl/ClusterSingletonServiceGroupImpl.java
singleton-service/mdsal-singleton-dom-impl/src/main/java/org/opendaylight/mdsal/singleton/dom/impl/PlaceholderGroup.java
singleton-service/mdsal-singleton-dom-impl/src/main/java/org/opendaylight/mdsal/singleton/dom/impl/ServiceInfo.java [new file with mode: 0644]
singleton-service/mdsal-singleton-dom-impl/src/test/java/org/opendaylight/mdsal/singleton/dom/impl/ClusterSingletonServiceGroupImplTest.java
singleton-service/mdsal-singleton-dom-impl/src/test/java/org/opendaylight/mdsal/singleton/dom/impl/DOMClusterSingletonServiceProviderAsyncImplTest.java
singleton-service/mdsal-singleton-dom-impl/src/test/java/org/opendaylight/mdsal/singleton/dom/impl/DOMClusterSingletonServiceProviderImplTest.java