Rework ClusterSingletonServiceGroupImpl locking 69/75169/17
author Robert Varga <robert.varga@pantheon.tech>
Wed, 25 Jul 2018 12:16:29 +0000 (14:16 +0200)
committer Robert Varga <robert.varga@pantheon.tech>
Mon, 20 Aug 2018 08:46:02 +0000 (10:46 +0200)
commit 234918aa235a24e02d5cca3390fc684d7e037200
tree bdf6fb6ed5d22c7f9c71cbad0c59d1e67ea38e29
parent 0c9ba525f69d65bbf46ebc8b849a07a0f4b4f62b
Rework ClusterSingletonServiceGroupImpl locking

The problem we are seeing is a classic AB/BA deadlock, hence
we need to change how serviceGroup is handled. This patch reworks
ClusterSingletonServiceGroupImpl to separate state tracking from
service startup/shutdown mechanics.

State locking is separated out into three domains:
- entity state, guarded by ClusterSingletonServiceGroupImpl object
- service membership, tracked in a ConcurrentMap
- service instantiation, guarded by a simple CAS-based lock

Furthermore, any time state changes, we mark this fact in a volatile
variable. Whenever we observe dirty state, we attempt to reconcile
it -- if we can also acquire the service instantiation lock.
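
The three locking domains and the dirty-state reconciliation can be
sketched roughly as follows. This is a minimal illustration, not the
actual ClusterSingletonServiceGroupImpl code: the class name
GroupSketch, its methods and the use of a Boolean "running" flag per
service are all assumptions made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: names and structure are illustrative only.
final class GroupSketch {
    // Service membership: tracked in a ConcurrentMap (value = "is running").
    private final ConcurrentMap<String, Boolean> services = new ConcurrentHashMap<>();
    // Service instantiation: guarded by a simple CAS-based lock.
    private final AtomicBoolean instantiationLock = new AtomicBoolean();
    // Set whenever any state changes; cleared by the reconciling thread.
    private volatile boolean dirty;
    // Entity state: guarded by the monitor of this object.
    private boolean owner;

    void register(String id) {
        services.put(id, Boolean.FALSE);
        markDirty();
    }

    void ownershipChanged(boolean nowOwner) {
        synchronized (this) {
            owner = nowOwner;
        }
        // The monitor is released before any service is touched.
        markDirty();
    }

    boolean isRunning(String id) {
        return Boolean.TRUE.equals(services.get(id));
    }

    private void markDirty() {
        dirty = true;
        reconcile();
    }

    private void reconcile() {
        while (dirty) {
            if (!instantiationLock.compareAndSet(false, true)) {
                // The current holder re-checks the dirty flag after unlocking.
                return;
            }
            try {
                dirty = false;
                final boolean shouldRun;
                synchronized (this) {
                    shouldRun = owner;
                }
                // Start or stop every registered service to match entity state.
                services.replaceAll((id, running) -> shouldRun);
            } finally {
                instantiationLock.set(false);
            }
        }
    }
}
```

Because the dirty flag is set before the CAS is attempted and checked
again after the lock is released, a thread that fails to take the lock
can return immediately without losing its update.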

Each registered service is tracked separately, so we do not have
to have wholesale aggregator futures for stopping services and can
also start newcomer services without causing weird state tracking
disruptions.
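
As a rough illustration of per-service tracking, the sketch below keeps
one entry per service, each with its own stop future, so a newcomer can
start while another service is still stopping. The names
PerServiceTracking, Entry and State are hypothetical, and the JDK
CompletableFuture is used only to keep the example self-contained; the
real ServiceInfo class may look quite different.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one tracking entry per registered service.
final class PerServiceTracking {
    enum State { STARTED, STOPPING, STOPPED }

    static final class Entry {
        volatile State state = State.STOPPED;
        // Completed by whoever finishes stopping this one service.
        volatile CompletableFuture<Void> stopFuture;
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    void start(String id) {
        Entry e = entries.computeIfAbsent(id, k -> new Entry());
        e.state = State.STARTED;
    }

    // Begins stopping a single service; no group-wide aggregate future is needed.
    CompletableFuture<Void> stop(String id) {
        Entry e = entries.get(id);
        CompletableFuture<Void> f = new CompletableFuture<>();
        e.stopFuture = f;
        e.state = State.STOPPING;
        f.thenRun(() -> e.state = State.STOPPED);
        return f;
    }

    State stateOf(String id) {
        return entries.get(id).state;
    }
}
```

Each entry moves through its states independently, so the state of one
slow service does not hold up the others.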

Splitting state tracking and service instantiation leads to faster
group shutdown, because when a group is being closed we know we can
unregister the service entity irrespective of the state of user
services. Unit tests, especially asynchronous, are updated to account
for this accelerated shutdown procedure.

The accelerated shutdown also improves inter-node failover latency,
because user service shutdown and service entity
unregistration run concurrently. That leads to a lower likelihood
of the new service entity owner having to block on becoming
the cleanup entity owner, as services which shut down quickly
will have released the cleanup entity by the time the new owner
is selected.

JIRA: MDSAL-362
Change-Id: I7cd82f81da9135591e4242a196cc0f06a78973a1
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
singleton-service/mdsal-singleton-dom-impl/src/main/java/org/opendaylight/mdsal/singleton/dom/impl/AbstractClusterSingletonServiceProviderImpl.java
singleton-service/mdsal-singleton-dom-impl/src/main/java/org/opendaylight/mdsal/singleton/dom/impl/ClusterSingletonServiceGroup.java
singleton-service/mdsal-singleton-dom-impl/src/main/java/org/opendaylight/mdsal/singleton/dom/impl/ClusterSingletonServiceGroupImpl.java
singleton-service/mdsal-singleton-dom-impl/src/main/java/org/opendaylight/mdsal/singleton/dom/impl/PlaceholderGroup.java
singleton-service/mdsal-singleton-dom-impl/src/main/java/org/opendaylight/mdsal/singleton/dom/impl/ServiceInfo.java [new file with mode: 0644]
singleton-service/mdsal-singleton-dom-impl/src/test/java/org/opendaylight/mdsal/singleton/dom/impl/ClusterSingletonServiceGroupImplTest.java
singleton-service/mdsal-singleton-dom-impl/src/test/java/org/opendaylight/mdsal/singleton/dom/impl/DOMClusterSingletonServiceProviderAsyncImplTest.java
singleton-service/mdsal-singleton-dom-impl/src/test/java/org/opendaylight/mdsal/singleton/dom/impl/DOMClusterSingletonServiceProviderImplTest.java