CDS: Retry remote front-end transactions on AskTimeoutException 51/23651/13
authorTom Pantelis <tpanteli@brocade.com>
Wed, 1 Jul 2015 20:28:43 +0000 (16:28 -0400)
committerGerrit Code Review <gerrit@opendaylight.org>
Fri, 17 Jul 2015 18:25:54 +0000 (18:25 +0000)
commit845609758d1739ee07d5ca92f5448e18a8933861
tree850b3b209096b7399b4e683498fe6ee25fd6023a
parent7a992a25fa34c6ce4a1d52919f7aa71dd1d20060
CDS: Retry remote front-end transactions on AskTimeoutException

With the front-end PrimaryShardInfo cache, if the cached primary/leader
shard is remote and unavailable, the RemoteTransactionContextSupport
will fail with an AskTimeoutException when it tries to send the
CreateTransaction message. Since it can take at least 1 election timeout
period to re-elect a new leader, I changed RemoteTransactionContextSupport
to also retry on AskTimeoutException (it already retries on
NoShardLeaderException). However instead of re-sending the
CreateTransaction message, as it did before, it now re-sends the
FindPrimary message to get a new primary shard actor.

I also modified how RemoteTransactionContextSupport retries. It will now
retry for a total period of 2 times the shard election timeout which
should be ample time for a re-election to occur. If no leader is found then
the txn will fail.

I also added a ShardLeaderNotRespondingException which the
RemoteTransactionContextSupport will throw if it ends up with an
AskTimeoutException after the tx creation timeout period. This shouldn't
occur normally as, with the retries, it should get a NoShardLeaderException
even if the initial error was AskTimeoutException. But it's possible to
end up with an AskTimeoutException, eg if the system is overloaded and
the election timeout is delayed.

During testing, I noticed that if you take down the 2 followers and try
a transaction, it fails with an AskTimeoutEx instead of
NoShardLeaderException as one would expect. This is b/c the leader
changes to an isolated leader. So I changed the Shardanager to return
NoShardLeaderException if the state is IsolatedLeader.

Change-Id: I3efd3f841cf41b7738aedb694fa18b44851b3074
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
14 files changed:
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/ConcurrentDOMDataBroker.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/RemoteTransactionContextSupport.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/Shard.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/ShardManager.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/exceptions/NoShardLeaderException.java
opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/exceptions/ShardLeaderNotRespondingException.java [new file with mode: 0644]
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/AbstractTransactionProxyTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/DistributedDataStoreIntegrationTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/DistributedDataStoreRemotingIntegrationTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/ShardManagerTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/TransactionChainProxyTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/java/org/opendaylight/controller/cluster/datastore/TransactionProxyTest.java
opendaylight/md-sal/sal-distributed-datastore/src/test/resources/application.conf
opendaylight/md-sal/sal-distributed-datastore/src/test/resources/module-shards-member1-and-2-and-3.conf [new file with mode: 0644]