BUG-8515: make sure we retry connection on NotLeaderException 51/57951/1
author Robert Varga <robert.varga@pantheon.tech>
Mon, 29 May 2017 08:40:06 +0000 (10:40 +0200)
committer Robert Varga <robert.varga@pantheon.tech>
Mon, 29 May 2017 08:57:29 +0000 (10:57 +0200)
There is a race window when we are establishing connection to the

When we receive the pointer to the shard leader, we send a connect
request, but during that time window the leader may move, resulting
in a NotLeaderException response to ConnectClientRequest. Since
we are in reconnection mode, this will result in hard abort of

Fix this by wrapping NotLeaderException and Akka failures in a
TimeoutException, so that the connection attempt is retried.

Change-Id: Ia5d1915d59e80a70c54302c1790121d0767ff08a
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

index 6b221da..a1ddcc3 100644
@@ -24,6 +24,7 @@ import org.opendaylight.controller.cluster.access.ABIVersion;
 import org.opendaylight.controller.cluster.access.client.BackendInfoResolver;
 import org.opendaylight.controller.cluster.access.commands.ConnectClientRequest;
 import org.opendaylight.controller.cluster.access.commands.ConnectClientSuccess;
+import org.opendaylight.controller.cluster.access.commands.NotLeaderException;
 import org.opendaylight.controller.cluster.access.concepts.ClientIdentifier;
 import org.opendaylight.controller.cluster.access.concepts.RequestFailure;
 import org.opendaylight.controller.cluster.common.actor.ExplicitAsk;
@@ -137,14 +138,16 @@ abstract class AbstractShardBackendResolver extends BackendInfoResolver<ShardBac
         FutureConverters.toJava(ExplicitAsk.ask(info.getPrimaryShardActor(), connectFunction, CONNECT_TIMEOUT))
             .whenComplete((response, failure) -> {
                 if (failure != null) {
-                    LOG.debug("Connect attempt to {} failed", shardName, failure);
-                    future.completeExceptionally(failure);
+                    LOG.debug("Connect attempt to {} failed, will retry", shardName, failure);
+                    future.completeExceptionally(wrap("Connection attempt failed", failure));
                 if (response instanceof RequestFailure) {
                     final Throwable cause = ((RequestFailure<?, ?>) response).getCause().unwrap();
                     LOG.debug("Connect attempt to {} failed to process", shardName, cause);
-                    future.completeExceptionally(cause);
+                    final Throwable result = cause instanceof NotLeaderException
+                            ? wrap("Leader moved during establishment", cause) : cause;
+                    future.completeExceptionally(result);
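
The wrap() helper called in the hunk above is not part of the visible context. A minimal sketch of what it might look like, assuming it simply attaches the original failure as the cause of a TimeoutException. The class ConnectFailureSketch, the nested NotLeaderException stand-in, and the translate() and isRetriable() methods are hypothetical names used only for illustration, not the actual OpenDaylight code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

final class ConnectFailureSketch {
    /** Hypothetical stand-in for the real NotLeaderException, so the sketch compiles on its own. */
    static final class NotLeaderException extends Exception {
        NotLeaderException(final String message) {
            super(message);
        }
    }

    /** Assumed shape of wrap(): a TimeoutException that carries the original failure as its cause. */
    static TimeoutException wrap(final String message, final Throwable cause) {
        final TimeoutException ret = new TimeoutException(message);
        ret.initCause(cause);
        return ret;
    }

    /** Mirrors the patched RequestFailure branch: only a moved leader is turned into a timeout. */
    static Throwable translate(final Throwable cause) {
        return cause instanceof NotLeaderException ? wrap("Leader moved during establishment", cause) : cause;
    }

    /** Hypothetical caller-side check: a TimeoutException is treated as "try connecting again". */
    static boolean isRetriable(final Throwable failure) {
        return failure instanceof TimeoutException;
    }

    public static void main(final String[] args) {
        final CompletableFuture<Void> future = new CompletableFuture<>();
        future.completeExceptionally(translate(new NotLeaderException("leader moved")));
        try {
            future.get();
        } catch (ExecutionException e) {
            // The cause is the wrapped TimeoutException; its own cause is the NotLeaderException.
            System.out.println("retriable: " + isRetriable(e.getCause()));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Keeping the original exception as the cause means the debug log still shows why the attempt failed, while the caller only needs to recognise the TimeoutException type to decide on a retry. Other RequestFailure causes pass through unchanged and keep their existing handling.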
