CDS: Retry remote front-end transactions on AskTimeoutException 61/24261/2
author     Tom Pantelis <tpanteli@brocade.com>
           Wed, 1 Jul 2015 20:28:43 +0000 (16:28 -0400)
committer  Gerrit Code Review <gerrit@opendaylight.org>
           Mon, 20 Jul 2015 21:30:37 +0000 (21:30 +0000)
commit     5c5c980e564d2b5f6cd26821ffd26997f59af260
tree       b146c8bc7e6e088e51deeedc126338dcb0b0edd6
parent     fed267bf1b8a9ea81d1ee7c9721962863b98e391
CDS: Retry remote front-end transactions on AskTimeoutException

With the front-end PrimaryShardInfo cache, if the cached primary/leader
shard is remote and unavailable, the RemoteTransactionContextSupport
will fail with an AskTimeoutException when it tries to send the
CreateTransaction message. Since it can take at least 1 election timeout
period to re-elect a new leader, I changed RemoteTransactionContextSupport
to also retry on AskTimeoutException (it already retries on
NoShardLeaderException). However, instead of re-sending the
CreateTransaction message, as it did before, it now re-sends the
FindPrimary message to get a new primary shard actor.
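
The retry decision described above can be sketched as a small pure
function. This is a minimal illustration, not the actual
RemoteTransactionContextSupport code: the stand-in exception classes and
the RetryDecision/NextStep names are hypothetical, and the real code
reacts to Akka future callbacks rather than a synchronous call.

```java
// Hypothetical stand-in for akka.pattern.AskTimeoutException.
class AskTimeoutStandIn extends Exception {
    AskTimeoutStandIn(String message) {
        super(message);
    }
}

// Hypothetical stand-in for the CDS NoShardLeaderException.
class NoShardLeaderStandIn extends Exception {
    NoShardLeaderStandIn(String message) {
        super(message);
    }
}

// What to do after a failed attempt to create a remote transaction.
enum NextStep { RESEND_FIND_PRIMARY, FAIL }

final class RetryDecision {
    private RetryDecision() {}

    /**
     * Both "no leader elected yet" and "the cached leader did not answer"
     * are treated as transient. Instead of re-sending CreateTransaction to a
     * possibly dead actor, the caller re-resolves the primary shard by
     * sending FindPrimary again. Any other failure is final.
     */
    static NextStep onFailure(Throwable failure) {
        if (failure instanceof NoShardLeaderStandIn || failure instanceof AskTimeoutStandIn) {
            return NextStep.RESEND_FIND_PRIMARY;
        }
        return NextStep.FAIL;
    }
}
```

Re-resolving the primary matters because the cached PrimaryShardInfo is
exactly what went stale; retrying against the same actor reference would
just time out again.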

I also modified how RemoteTransactionContextSupport retries. It will now
retry for a total period of 2 times the shard election timeout, which
should be ample time for a re-election to occur. If no leader is found
within that period, the transaction fails.
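
One way to express the time-bounded retry is a single deadline computed
when the first attempt starts, rather than a fixed retry count. The
RetryWindow class below is a hypothetical sketch under that assumption;
it takes timestamps as parameters so it can be driven by System.nanoTime()
in real use or by fixed values in a test.

```java
final class RetryWindow {
    private final long deadlineNanos;

    /** Total retry budget = 2 x election timeout, measured from startNanos. */
    RetryWindow(long startNanos, long electionTimeoutMillis) {
        this.deadlineNanos = startNanos + 2L * electionTimeoutMillis * 1_000_000L;
    }

    /** True while the retry budget has not been used up. */
    boolean mayRetry(long nowNanos) {
        // Subtraction-based comparison is the overflow-safe idiom for
        // System.nanoTime() values.
        return nowNanos - deadlineNanos < 0;
    }

    /** Milliseconds left in the budget, never negative. */
    long remainingMillis(long nowNanos) {
        return Math.max(0L, (deadlineNanos - nowNanos) / 1_000_000L);
    }
}
```

For example, with an assumed 500 ms election timeout,
new RetryWindow(System.nanoTime(), 500) allows retries for about one
second; once mayRetry returns false, the last failure is reported.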

I also added a ShardLeaderNotRespondingException, which the
RemoteTransactionContextSupport will throw if it ends up with an
AskTimeoutException after the tx creation timeout period. Normally this
shouldn't happen: with the retries, it should end up with a
NoShardLeaderException even if the initial error was an
AskTimeoutException. But it's still possible to end up with an
AskTimeoutException, e.g., if the system is overloaded and the election
timeout is delayed.
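
The translation of a lingering ask timeout into a more descriptive error
can be sketched as below. The stand-in classes are hypothetical; in
particular the real ShardLeaderNotRespondingException's constructors and
message are not shown in this commit message, so the ones here are
assumptions for illustration only.

```java
// Hypothetical stand-in for akka.pattern.AskTimeoutException.
class AskTimeoutStandIn extends Exception {
    AskTimeoutStandIn(String message) {
        super(message);
    }
}

// Hypothetical stand-in for ShardLeaderNotRespondingException; the real
// class's constructor signature may differ.
class LeaderNotRespondingStandIn extends RuntimeException {
    LeaderNotRespondingStandIn(String message, Throwable cause) {
        super(message, cause);
    }
}

final class TxCreationFailure {
    private TxCreationFailure() {}

    /**
     * Called once the retry period has expired. An ask timeout that survived
     * all retries is wrapped in an exception naming the shard, keeping the
     * original timeout as the cause; any other failure (e.g. "no leader")
     * is passed through unchanged.
     */
    static Throwable finalFailure(String shardName, Throwable lastFailure) {
        if (lastFailure instanceof AskTimeoutStandIn) {
            return new LeaderNotRespondingStandIn(
                    "Shard " + shardName + " has a leader but it is not responding",
                    lastFailure);
        }
        return lastFailure;
    }
}
```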

During testing, I noticed that if you take down the 2 followers and try
a transaction, it fails with an AskTimeoutException instead of a
NoShardLeaderException as one would expect. This is because the leader
switches to the IsolatedLeader state. So I changed the ShardManager to
return NoShardLeaderException if the state is IsolatedLeader.
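
The ShardManager change can be sketched as a primary lookup that treats
an isolated leader the same as having no leader. The ShardRole enum,
NoLeaderStandIn and PrimaryResolver below are hypothetical simplifications,
not the actual ShardManager or RaftState types.

```java
// Simplified RAFT roles; the real RaftState enum may name these differently.
enum ShardRole { FOLLOWER, CANDIDATE, LEADER, ISOLATED_LEADER }

// Hypothetical stand-in for NoShardLeaderException.
class NoLeaderStandIn extends RuntimeException {
    NoLeaderStandIn(String message) {
        super(message);
    }
}

final class PrimaryResolver {
    private PrimaryResolver() {}

    /**
     * Returns the id of the member hosting a usable leader for the shard.
     * An isolated leader has lost contact with a majority of its followers
     * and cannot commit anything, so it is reported as "no leader" right
     * away instead of letting the caller's request run into an ask timeout.
     */
    static String resolvePrimary(String shardName, ShardRole localRole, String knownLeaderId) {
        if (localRole == ShardRole.ISOLATED_LEADER || knownLeaderId == null) {
            throw new NoLeaderStandIn("No usable leader for shard " + shardName);
        }
        return knownLeaderId;
    }
}
```

Failing fast here also lets the front end's retry logic see a
NoShardLeaderException, which it already knows how to retry.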

Change-Id: I3efd3f841cf41b7738aedb694fa18b44851b3074
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 845609758d1739ee07d5ca92f5448e18a8933861)
14 files changed:
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/ConcurrentDOMDataBroker.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/RemoteTransactionContextSupport.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/Shard.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/ShardManager.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/exceptions/NoShardLeaderException.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/exceptions/ShardLeaderNotRespondingException.java [new file with mode: 0644]
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/AbstractTransactionProxyTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/DistributedDataStoreIntegrationTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/DistributedDataStoreRemotingIntegrationTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/ShardManagerTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/TransactionChainProxyTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/TransactionProxyTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/resources/application.conf
opendaylight/md-sal/sal-distributed-datastore/src/test/resources/module-shards-member1-and-2-and-3.conf [new file with mode: 0644]