git.opendaylight Code Review - controller.git/log

BUG-8618: update sync status only after processing

Since the commitIndex may move in chunks we really want to update
our sync status after we have gone through the AppendEntries message
so our commitIndex reflects the state after processing.

Change-Id: I49c72a21f8d9c3efb7ae9cc1b64276220057f2e2
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit bdb818fbfc5f015ab14883348f170cca8ce79128)

BUG-8618: make sync threshold tuneable

We are observing quite a few of these transitions, which may be coming
from batching scenarios. Introduce sync-index-threshold config knob
to expose control over it.

Change-Id: Ief4c89c2fe5b95cebaf3fb83cbcdda37cac126b6
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 890e4bbf40aee318a2174bd4130cf34437e5617b)

BUG-8618: improve debug logs

We can have a reasonable ID prepended, add that. Also improve range
of threshold parameter, as we are addressing journal entries here.

Change-Id: I86aac1be04df8b72bfa6ffaa2b7a7e3b4cbfad6e
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 22e817f688bb73420ddcac1f20cf71379ff3a508)

Log data after in IdIntsDOMDataTreeLIstener

At TRACE level.

Change-Id: Ic71aeec4c121d5cfb53a09762c9845e3e94f4f04
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
(cherry picked from commit 8e1dd830bcc8f0b5d34192ebf9a8d45a165a90b1)

BUG-8618: refactor SyncStatusTracker state

Introducing a leader target encapsulation allows us to
enforce state transitions (i.e. state is guaranteed to be
non-null when we need its bits).

This enables us to eliminate the need for a magic constant.

Change-Id: Iab7178694edc3c62032e32c4386c371630f67b6f
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 2c42c1d35a45fb7eced5a3ffd07b52bdc26f7e40)

BUG-8618: make sure we refresh backend info

When we are performing a reconnection attempt we must never use
previous backend info, but rather have to refresh it.

Fix this by removing state when resolution fails.

Change-Id: I65592f2101547a606a15d9c8030c7d8c58afe8a5
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit a816f13940862759cefcd6e69330c6a70b512ee2)

BUG-8618: add threshold crossing debugs

We are observing messages about sync status changing on the order
of 10s a second (14ms between messages). This looks awfully like
inter-node latency, hence it needs to be tuneable.

We do not have an understanding of what sort of jumps we are talking
about, so add debug-level logging at the source of these events, so they
can be diagnosed.

Change-Id: I9e2d78629f8808914cdb664cb28afcd47a55ee80
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 97875ef2635a536c0663750503ea6e64486d2fe6)

Improve timeout message

Rather than reporting nanoseconds, convert them to fractional seconds.
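
A minimal sketch of such a conversion, assuming the timeout is held as a
long count of nanoseconds. The class and method names are hypothetical
and are not taken from the patch:

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

final class TimeoutFormat {
    private static final double NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);

    private TimeoutFormat() {
    }

    // Hypothetical helper: renders a nanosecond duration as seconds with
    // millisecond precision, independent of the default locale.
    static String toSeconds(final long nanos) {
        return String.format(Locale.ROOT, "%.3f", nanos / NANOS_PER_SECOND);
    }

    public static void main(final String[] args) {
        System.out.println("Timed out after " + toSeconds(1_500_000_000L) + " seconds");
    }
}
```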

Change-Id: I9052462990f8c6b99349ed123f682ce3f0e23461
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 715bf60ac1899a3c01690d244d26b12c9212ecc7)

Pull in pax-exam-features

BGPCEP downstream is using mdsal-it-base and it is failing to bring
up the container, as the features are not present in local repo.

Add the appropriate declaration, which fixes the breakage downstream.

Change-Id: Id3c21addb628ff37fb50e705773b9e4db83d17bc
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Migrate to odlparent 1.9.0

Change-Id: I7d4af74e7713d48dd6ad8431229c4963423abbf6
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Remove filter-valve

Change-Id: I9d6449a1e6337626373896e4766ea5a71ccee8ff
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8665: fix memory leak around RangeSets

This is a thinko on my part, where I was thinking in terms of a
discrete set (UnsignedLong) and assumed RangeSets will coalesce
individual items.

Unfortunately TreeRangeSet has no way of knowing that the
domain it operates on is discrete and hence will not merge individual
ranges.

This patch fixes the problem by using [N,N+1) ranges. A follow-up
patch should address this in a more efficient manner.
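
For illustration: Guava's RangeSet only coalesces connected ranges. The
closed singletons [1,1] and [2,2] are not connected, whereas the
half-open ranges [1,2) and [2,3) are, which is why the [N,N+1) encoding
lets neighbouring entries merge. The sketch below is a hand-rolled,
TreeMap-based stand-in (it is not Guava's TreeRangeSet, and all names in
it are hypothetical) showing the merge behaviour the fix relies on.
Overflow at Long.MAX_VALUE is ignored.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative stand-in for a range set over longs; NOT Guava's TreeRangeSet.
final class HalfOpenRangeSet {
    // Maps the inclusive lower bound of each stored range to its exclusive upper bound.
    private final TreeMap<Long, Long> ranges = new TreeMap<>();

    // Records a single value N as the half-open range [N, N+1).
    void add(final long value) {
        addRange(value, value + 1);
    }

    // Adds [lower, upper) and merges it with any overlapping or adjacent range.
    void addRange(final long lower, final long upper) {
        long lo = lower;
        long hi = upper;

        // A range starting at or before 'lo' absorbs the new one if it reaches 'lo'.
        final Map.Entry<Long, Long> before = ranges.floorEntry(lo);
        if (before != null && before.getValue() >= lo) {
            lo = before.getKey();
            hi = Math.max(hi, before.getValue());
        }

        // Swallow every range that starts within [lo, hi], extending 'hi' as needed.
        Map.Entry<Long, Long> next = ranges.ceilingEntry(lo);
        while (next != null && next.getKey() <= hi) {
            hi = Math.max(hi, next.getValue());
            ranges.remove(next.getKey());
            next = ranges.ceilingEntry(lo);
        }

        ranges.put(lo, hi);
    }

    boolean contains(final long value) {
        final Map.Entry<Long, Long> entry = ranges.floorEntry(value);
        return entry != null && value < entry.getValue();
    }

    // Number of distinct ranges held; stays small when consecutive values merge.
    int size() {
        return ranges.size();
    }
}
```

Adding 1, 2 and 3 to this set leaves a single range [1,4) rather than
three separate entries, so memory use no longer grows with every
consecutive value.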

Change-Id: Iecc313e09ae0cdd51a42f7d39281f7634f0358a7
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

BUG 8649: remove bounded mailbox from ShardManager and notification actors

Change-Id: I52975d969a81cc3ccdc7b963e0f43f9958ba6a10
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 8a298158f0164f581f6d47ae505c76a2cb7e3771)

Fix intermittent PreLeaderScenarioTest failure

java.lang.AssertionError: AppendEntries - # entries expected:<1> but was:<0>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:743)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:555)
  at org.opendaylight.controller.cluster.raft.PreLeaderScenarioTest.testUnComittedEntryOnLeaderChange(PreLeaderScenarioTest.java:57)

AppendEntries appendEntries = expectFirstMatching(follower1CollectorActor,
                AppendEntries.class);
assertEquals("AppendEntries - # entries", 1, appendEntries.getEntries().size());

After the payload is sent to the leader, it expects an AppendEntries sent to follower1
with a single ReplicatedLogEntry. From the test output this did occur correctly but
the MessageCollectorActor still had the initial empty AppendEntries sent on leader
startup. The test setup waits for the initial AppendEntriesReply's from both followers
prior to clearing messages in each MessageCollectorActor however the AppendEntries may
not have been delivered to follower1's MessageCollectorActor yet and thus doesn't get
cleared. We need to specifically wait for the AppendEntries in follower1's
MessageCollectorActor.

Change-Id: I638a21e75ea135c1fe24970135f564da4fc5738e
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 8606: Continue leadership transfer on pauseLeader timeout

Modified it to continue with leadership transfer if pauseLeader times out
instead of aborting. The shard may have a lot of transactions queued up
which it can't finish in time but there may still be a follower that is
caught up (ie whose matchIndex equals the leader's lastIndex) or would be
caught up if leadership transfer continued. Worst case is no follower is
available and the "catch up" phase of leadership transfer also times out
which would lengthen shut down time but that should be fine.

Change-Id: I1ec1ef43bb556e50416bb7239ce3c267265db9b3
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

yang-test POM simplification

Motivated by review https://git.opendaylight.org/gerrit/#/c/58714/, and
realizing that the odl-license version in this POM will cause issues
every time we upgrade odlparent (which may be more frequent, soon).

This is effectively a counter-proposal to c/58714: on looking at this,
it seems that instead of bumping that version, we can remove most of what's
causing this (without losing anything; CS still runs, not that we
care), because most of this is inherited in Maven. The point of that
configuration is just to skip the VS validation which this project for
some reason exceptionally gen. into src/main instead of into target/.

The ${lifecycle.mapping.version} is replaced by fixed 1.0.0

Change-Id: Ic78c35a940dc7ee5fe25f3527529591e5c5214f7
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

BUG 8629: log inconsistent notifications as warn

Change-Id: I872fced1ecc913e521ddf7c0d4acee7f48b04cb1
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 2c510d40d2813e5f65f755e53e5ef570b2ac5fba)

BUG 8618: Log leader status when rejecting request

Change-Id: Iecd99a74473b68f43b7ad43a1272d679aa09b4e6
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 0849bf398f71419ac124a1dddacf6dbd40775426)

Do not flood logs with modifications

Debugging logs have grown quite a bit for tell-based protocol
mostly due to us dumping modifications as part of the request
message. Log only the number of modification in the message,
which will make the logs quite a bit more readable.

Change-Id: I35961702b7bdd0e3f93cd03f05a0e443a14bf419
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 11b30d7680da427f78188dd841c5d6509c12ef33)

Bug 8662 RefreshingSCPModuleInfoRegistry synchronized

Looks like (as seen on Bug 8662) there can be race conditions between
RefreshingSCPModuleInfoRegistry's updateService() and close() ... The
NPE in that bug on the "osgiReg" happens even though that's guarded
inside an if (this.osgiReg != null), but it's during shutdown, so
presumably it JUST got set to null in close() ? A "synchronized" on
both methods should prevent that.
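
A minimal sketch of the idea, using a hypothetical class with a Runnable
standing in for the OSGi registration (this is not the actual
RefreshingSCPModuleInfoRegistry code). With both methods synchronized,
close() cannot null the field between the check and the use in
updateService():

```java
final class GuardedRegistration {
    // Stand-in for the OSGi registration handle; null once closed.
    private Runnable osgiReg = () -> { };
    private int updates;

    // Holding the monitor keeps the null check and the use atomic with respect to close().
    synchronized void updateService() {
        if (osgiReg != null) {
            osgiReg.run();
            updates++;
        }
    }

    synchronized void close() {
        osgiReg = null;
    }

    // Number of updates that actually reached the registration.
    synchronized int updates() {
        return updates;
    }
}
```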

Change-Id: I607092e3174c2fd0447d8548f4933624acd6a29b
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Switch from config-parent to binding-parent in archetype resource poms

Change-Id: Ibbebaf67e14da2cc958db05a01ca44fdff074bc7
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Switch from config-parent to bundle-parent in mdsal-trace

Change-Id: Ieaca84db2ff205b6717cd2d37443459b75bdc7fa
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Switch from config-parent to bundle-parent in mdsal-it-parent

Change-Id: I4f37d6dfa1141efa0e34cb0b4f2863a7e5938efa
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Switch from config-parent to bundle-parent in mdsal-it-base

Change-Id: I0d3393fae51408b857b896fe3ff9736f67d868aa
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Switch from config-parent to bundle-parent in sal-remoterpc-connector

Change-Id: I42310b193ae3bd5726ad679771b38964d267cbf4
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Switch from config-parent to binding-parent in clustering-it-provider

Change-Id: Ic1a9d83f155ba02687ded8fd3b4ee13aa1ab2201
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Remove protocol-framework config yang

The protocol-framework config yang modules aren't used, ie aren't
instantiated in any config XML file, so remove them.

Change-Id: I8cc8c8c8666ef731a2e7da20a3046300ce5aad45
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Migrate sal-common-testutil to odlparent-1.8.0-Carbon

Missed one in our migration work.

Change-Id: I6cfc1669277601887bacdb756ec8f996f32c1b92
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Remove unneeded cast

Once mdsal.dom.broker.ShardedDOMDataTree is fixed up, we can remove
this cast.

Change-Id: I042dbf3bf071b7af33876501e3274fc96724e58c
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Fix format string mismatch

String expects two objects, not one.

Change-Id: I5cc37336236e88c13d569c656910d7fd969bb655
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Remove config yang/module from sal-cluster-admin-impl

The module was a no-op but was kept for backwards compatibility for
the uber controller.currentconfig.xml but, since netconf was converted to
use the DS a couple releases ago, it should be ok to remove it.

Change-Id: Ib631658bcf5abc77231e8195112964208d3465a6
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Convert rpcbenchmark to blueprint

Change-Id: I1af664396721a899d51d12fe132853d642d98a7e
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Convert ntfbenchmark to blueprint

Change-Id: I6e4fe58575783d94cf9d01dbcd5b45d50d8b0f60
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Convert dsbenchmark to blueprint

Change-Id: I14516ea469ec23aff17bc3d565f859b9edb99d2d
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8494: do not attempt to reconnect ReconnectingClientConnection

If we are in reconnect state, we should never attempt to initiate
reconnection, as that would leave us without the timer running --
which is a problem since we need to be timing out requests which
are queued even as we are attempting to reconnect to the backend.

Change-Id: Ic955a2e5b743617c26cc72815df94d0c4584704c
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 584be7bf6b41f2f8b01dd718aba8d3b6cf7426ef)

BUG-8403: fix DONE state propagation

The DONE state detection logic in replayMessages() was flawed, as
we checked the current state, which is guaranteed to be SuccessorState.

We should be checking the previous state, available from the successor
state. As it turns out we can do this very cleanly by setting the flag
when the successor state gets the previous state assigned.

This also has better performance, as we do not touch the volatile
field multiple times.

Change-Id: Ica2246160bf8fee7aa134bbacb45857235405f6a
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit b135d9ab189dfd7443f895aa96160e22666cb266)

Migrate to odlparent 1.8.0-Carbon

Per request of odlparent project we are downgrading all Nitrogen
projects to use the released odlparent 1.8.0-Carbon to allow for the
odlparent project to start performing semver style releases.

Jira: RELENG-159
RT: 41406
Change-Id: I69e9fab9531b1127286ca3f4467b9a046ce25a51
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Remove deprecated ShardDataTree#commit method

Change-Id: Id9eaa309f1270e555202e7afafa9ce69c63405da
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Remove deprecated NormalizedNodeInputStreamReader ctor

The intent is for NormalizedNodeInputStreamReader to be package-scoped
and to create instances via NormalizedNodeInputOutput#newDataInput.
Thus the deprecated public constructor was removed and the remaining
users were converted to use NormalizedNodeInputOutput#newDataInput.
However newDataInput has the side-effect of validating the input stream
first which failed for a couple users who need to lazily do the
validation so I added a newDataInputWithoutValidation method.

Change-Id: Ieb97ab77d05d7a4401dd0526cd4df3a5eafc9eda
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

sal-common-testutil TestEntityOwnershipService

moved from existing (and already used there)
org.opendaylight.genius.interfacemanager.test.infra, because I now need
to also use this in a new component test for netvirt fibmanager, so
instead of copy/pasting it, let's have it here.

Change-Id: I4a524594f9d4bf0da6015738ef3b454eba13159d
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Fix RecoveryIntegrationSingleNodeTest failure

The InMemoryJournal may not have received all the persisted messages
by the time it checks the expected size of the journal so added a latch/wait
for the expected messages.
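
The latch/wait approach can be sketched roughly as follows. This is a
hypothetical, simplified journal, not the InMemoryJournal test class
itself: the test blocks until the expected number of messages has been
persisted before checking the size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LatchedJournal {
    private final List<Object> entries = new ArrayList<>();
    private final CountDownLatch expected;

    LatchedJournal(final int expectedMessages) {
        expected = new CountDownLatch(expectedMessages);
    }

    // Called by the writer, possibly on another thread.
    synchronized void persist(final Object message) {
        entries.add(message);
        expected.countDown();
    }

    // Blocks until all expected messages have arrived or the timeout elapses.
    boolean awaitSeconds(final long seconds) {
        try {
            return expected.await(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    synchronized int size() {
        return entries.size();
    }
}
```

A test would call awaitSeconds() first and only then assert on size(),
which removes the dependency on message delivery timing.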

Change-Id: I8f050b9375f5e3e74749c17e831add21d09d1831
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 5740: Configure control-aware mailbox

Configured unit tests and production to use a control-aware mailbox
for Shard actors. The current code allows for a "shard-dispatcher"
to be defined so I added a section in the .conf that specifies the
mailbox-type appropriately (ie UnboundedDequeBasedControlAwareMailbox).

Change-Id: Ibdb404e1dfcc699471a8e899c491a09500ee04c0
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8494: fix throttling during reconnect

ReconnectForwarder is called from two different code paths: one is
during replay, when we are dealing with late requests (those which have
been waiting while we were replaying); the other is subsequent user requests.

The first one should not be waiting on the queue, as the requests have
already entered it, hence have paid the cost of entry. The latter needs
to pay for entering the queue, as otherwise we do not exert backpressure.

This patch differentiates the two code paths, so they behave as they
should. Also add more debug information in timer paths.

Change-Id: I609be2332b13868ef1b9511399e2827d7f3d5b7d
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 851fb56fba015c9fee3f0f9235c5c631a492ce59)

BUG-8403: propagate DONE state to successor

We need correct accounting for DONE non-standalone local transactions,
as such transactions do not interact with open/closed semantics.

Propagate DONE via a simple flag, which we check in local ProxyHistory
and create a proxy without a backing modification.

Change-Id: Ie921db8c9e40f30934c119b74c31ca5418b61548
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 59ffaa4e95ffc8f8f04d1ca8d4f45f2ac1ef23b6)

BUG 8318: Add section for remoting transport-failure-detector

Similar to separate dispatcher for cluster we might also
trip a false positive in remoting so add this in so we can modify
the parameter in csit.

Change-Id: I751fec044e2bf0f0d82badb2ea7d581b3374ac4a
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 7ea291d0c0d183755795e33881fd040c368f57a7)

BUG-8403: go through the DONE transition

Third step of the fix: make AbstractProxyTransaction go through
the DONE state before retiring. This ends up also fixing breakage
in local chain transactions, which could end up leaking because we
never go back to just using the base data tree.

Change-Id: I97ac1687eaf3ecd8f46a68c6170891ea06703e95
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 9b4c07ca27b2c3c7ca5d5e790aa1f121ce4857f3)

BUG-8403: add state documentation and DONE state

Second step of the fix: clarify AbstractProxyTransaction states and
their transitions. Introduce a DONE state which we will use to close
the replay state race window.

Change-Id: I82e47103f2cd9b8ec496b72803b5d5e56d33c0f5
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 720292646a403e3fa4f33d12dcecbd059486b871)

BUG-8403: move successor allocation to AbstractProxyTransaction

We still have a tiny race window where we do not correctly handle
reconnection, leading to an ISE splat -- this time in the final
stage of transaction completion.

The problem is that a reconnect attempt is happening while we are
waiting for the transaction purge to complete. At this point the
transaction has been completely resolved and the only remaining
task is to inform the backend that we have received all of the
state and hence it can throw it out (we will remove our state once
purge completes).

We are still allocating a live transaction in the local history
and the purge request replay does not logically close it, leading
to the splat.

To fix this, we really need to allocate a non-open tx successor,
which will not trip the successor local history. All the required
knowledge already resides in AbstractProxyHistory, except we do
not have a dedicated 'purging' state.

This makes the first step of moving the allocation caller to the
appropriate place.

Change-Id: If82957019b478f4d5132edda4d38e6bc026aa0ab
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit cc5009b8f3ea91f64ee48cda815c6a5e73a8a1af)

BUG-8494: Cap queue sleep time

Inconsistency in transmit queue backpressure has caused an observed
delay of over 44 minutes in CSIT. While this is an artefact of a bug,
we should not delay the application thread for large periods of time, as
that can cause failures in the application (if it has other
time-bound duties like BGP keepalive messages and similar).

Restrict the sleep time to 5 seconds, emitting an INFO-level message
when we reach this level of sleeping. This should typically not
occur, as the backpressure should be kicking in much sooner, smoothing
out the delay curve.
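
A minimal sketch of such a cap, assuming the delay is computed in
nanoseconds. The class and method names are hypothetical, and
java.util.logging stands in for whatever logger the real code uses:

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

final class QueueSleep {
    private static final Logger LOG = Logger.getLogger(QueueSleep.class.getName());

    // Upper bound on how long an application thread may be put to sleep.
    static final long MAX_DELAY_NANOS = TimeUnit.SECONDS.toNanos(5);

    private QueueSleep() {
    }

    // Returns the delay to actually sleep for, never more than MAX_DELAY_NANOS.
    static long capDelay(final long requestedNanos) {
        if (requestedNanos >= MAX_DELAY_NANOS) {
            LOG.info("Capping sleep of " + requestedNanos + "ns to " + MAX_DELAY_NANOS + "ns");
            return MAX_DELAY_NANOS;
        }
        // Negative requests are treated as "do not sleep".
        return Math.max(0L, requestedNanos);
    }
}
```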

Change-Id: Ie5f148248caa71791bdda71ddd7e33e5733aa7f8
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 15a67bd103c2dc32f28139a7295ac84143c20d0c)

Bug 8446 - Increase timeout in leadership transfer

Change-Id: Iffd66ef2c771b797b236f23c39b1fb87b5a27c89
Signed-off-by: Jakub Morvay <jmorvay@cisco.com>
(cherry picked from commit ea6ba66600c4f2f143cbb13279ded184ce6d3fcc)

Cleanup time access

ShardDataTree does not need to expose the ticker, just a readTime()
method. This makes the users slightly more readable.

Change-Id: I9aa72a2d3625f40a2a44b0838ff344437293e1e3
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit a0f85a19ba36f71288c7b45575befd98d7d77ed4)

BUG-8515: make sure we retry connection on NotLeaderException

There is a race window when we are establishing connection to the
backend:

When we receive the pointer to the shard leader, we send a connect
request, but during that time window the leader may move, resulting
in a NotLeaderException response to ConnectClientRequest. Since
we are in reconnection mode, this will result in hard abort of
connection.

Fix this by wrapping NotLeaderException and akka failures in a
TimeoutException -- hence we will retry connecting.

Change-Id: Ia5d1915d59e80a70c54302c1790121d0767ff08a
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 51a85b6c8fce1d9808285a6ad81dc7068afbf7c7)

BUG-8403: do not throttle purge requests

It seems we are getting stuck after replay on purge requests,
which are dispatched internally.

Make sure we do not use sendRequest() in obvious replay places,
nor for purge requests. Also add a debug upcall if we happen to
sleep for more than 100msec.

Change-Id: Iec667f2039610f3f036e6b88c7c7e7b773cdfc19
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 20ece8c549211d1c453f1763132bb0a0ca7be0e0)

BUG-8538: rework transaction abort paths

Direct transaction abort path can end up touching proxy history's
maps, which it should not, as that happens only after purge. This
inconsistency has cropped up when purge was introduced.

Refactor the methods so that cohorts are removed only after purge,
and fix abort request routing such that it always enqueues a purge
request (possibly via successor). This also addresses a FIXME, as
we now have an enqueueAbort() request, which is not waiting on the
queue.

Change-Id: Ie291da70ace772274f33505db376a915b38e37c0
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 8fca604f2312ef365ce05343c2378cf36f2e31af)

BUG-8538: do not invoke read callbacks during replay.

As evidenced by a ConcurrentModificationException happening reliably
in the face of aborted read-only transactions, there are avenues by which
our state can be modified even though we hold the locks.

One such avenue is listeners hanging on read operations, which
can enqueue further requests in the context of calling thread. That
thread must not be performing replay, hence delay request completion
into a separate actor message by using executeInActor().

Change-Id: Ibcd0ac788156011ec3a4cc573dc7fb249ebf93a2
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 8123d0fc56a28324fed48e3027edf090e8149b9b)

BUG-8371: Respond to CreateLocalHistoryRequest after replication

CreateLocalHistoryRequest needs to be replicated to followers before
we respond to the frontend, as logically this request has to be
persisted before any subsequent transactions.

While the frontend could replay the request on reconnect, it would
also have to track the implied persistence (via child transactions),
which we do not want because it really is a backend detail and it
would lead to a lot of complexity in the frontend.

Change-Id: Icdfad59d3c2bab3d4125186c6a9b3c901d3934f6
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 8f18717f60e58eebf726fe0611859311fa83df44)

BUG-8540: suppress ConnectingClientConnection backend timeout

While a ClientConnection is in initial connect state we do not want
the timer to attempt to reconnect it, as we are already trying
hard to connect it. Suppress that attempt by faking backend silent
ticks to be 0.

Change-Id: Iaf554632a56fd5be1d417d6806462edf3c746526
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit a3b2c1a05f66523561a10ac898723ffdf7e68798)

BUG-8452: make NoShardLeaderException retriable

We can recover from this exception by retrying the connection to
the backend. Wrap it in a TimeoutException, which will cause a new
connection attempt.
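
A minimal sketch of the wrapping, assuming (as described above) that the
retry logic reacts to java.util.concurrent.TimeoutException.
NoLeaderException and toRetriable() are hypothetical stand-ins, not the
actual classes touched by this patch:

```java
import java.util.concurrent.TimeoutException;

final class RetriableFailures {
    // Hypothetical stand-in for NoShardLeaderException.
    static final class NoLeaderException extends RuntimeException {
        private static final long serialVersionUID = 1L;

        NoLeaderException(final String message) {
            super(message);
        }
    }

    private RetriableFailures() {
    }

    // Wraps recoverable failures in a TimeoutException, keeping the original as the cause.
    static Throwable toRetriable(final Throwable failure) {
        if (failure instanceof NoLeaderException) {
            final TimeoutException wrapped = new TimeoutException("Backend has no leader, retrying");
            wrapped.initCause(failure);
            return wrapped;
        }
        // Anything else is passed through unchanged and handled as before.
        return failure;
    }
}
```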

Change-Id: I1d5c771fdb89cbdd7723e0425542154a1ed85853
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit c74608b67d88d809ebec51c0e84add37a0b98711)

Bug 8568: Removed deprecated HydrogenNotificationBrokerImpl

Change-Id: I707a787b3e705fb9959056e50f06b2207933bcd3
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 8568: Remove deprecated MountProviderService APIs

Removed the MountProviderService and associated APIs and implementations.

Change-Id: I0cfde4f9d6204c4bcee1b1fc7028402b9290c6e4
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 8568: Remove deprecated MountProviderService from RootBindingAwareBroker

The MountProviderService API has been deprecated since Helium so it should be
safe to remove it from the RootBindingAwareBroker.

Change-Id: I7dc7b05feaafb08004f104da8495adc87e5078b1
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 8568: Remove DataProviderService/DataBrokerService APIs

Change-Id: If9b8bc26c3f4d1c5eea09c1c5ad993732fbc5f6c
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 8568: Remove deprecated HydrogenDataBrokerAdapter

Removed the deprecated DataProviderService implementation
class HydrogenDataBrokerAdapter and the corresponding
config yang and ForwardedCompatibleDataBrokerImplModule.

Change-Id: Ie18e6e1ae6a9e68b97e39b278618a4a0c1c9219d
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 8568: Remove DataProviderService from RootBindingAwareBroker

Change-Id: Ib5e4f70ef72819103544cd6388558dc4a05b55d2
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Replace logger and log by LOG

Replace logger and log by LOG to follow the
OpenDaylight recommendations [1].

[1]
https://wiki.opendaylight.org/view/BestPractices/Logging_Best_Practices

Change-Id: I63787ccee5950bebbc8c3769885574593a666809
Signed-off-by: David Suarez <david.suarez.fuentes@ericsson.com>

Bug 8568: Convert sal-binding-dom-it tests to use DataBroker

Converted the tests in sal-binding-dom-it to use the DataBroker API
instead of the deprecated DataProviderService.

Change-Id: I6d4a3442d3c5cf5ddf34806b6a71454c48e3b54a
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Remove deprecated Snapshot and related code

Carbon will create a new snapshot when it encounters a pre-Carbon
Snapshot so we can remove the pre-Carbon Snapshot and related code.

Change-Id: Iae5f140aadb458eaa59ea4cc8be6054bbde090e4
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Fix checkstyle problems not detected by the current version

This change is required for overall move to new Checkstyle version, see
https://git.opendaylight.org/gerrit/#/q/topic:bumpCheckstyle

Change-Id: I9755c1964a7ffa4f6b7d188b5b746e2c9246ad45
Signed-off-by: David Suarez <david.suarez.fuentes@ericsson.com>

Remove deprecated PreBoronShardDataTreeSnapshot

Since Carbon will migrate all pre-Carbon snapshots, we can remove
support for pre-Boron snapshot compatibility.

Change-Id: I74ee98f013e15c5abf24412671e4ac20bcdda66e
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Remove deprecated ShardManagerSnapshot

The original ShardManagerSnapshot was deprecated in Boron and thus should
be safe to remove now.

Change-Id: I643dcf6e06ad4842b69bf1ab1992b028786c83f8
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Replace LOGGER by LOG

Replace LOGGER by LOG to follow the
OpenDaylight recommendations [1].

[1]
https://wiki.opendaylight.org/view/BestPractices/Logging_Best_Practices

Change-Id: I024bcd5f23a5bdcc177440b175578694c6c471a4
Signed-off-by: David Suarez <david.suarez.fuentes@ericsson.com>

BUG-8402: correctly propagate read-only bit

During replay we substitute read requests with an IncrementSequence
request, but that does not indicate whether the transaction state
should be read-only.

This leads to transaction chains allocating a full-blown transaction
instead of a snapshot, hence follow-up transactions fail to allocate,
leading to OutOfOrderRequestException.

Fix this by making IncrementTransactionSequenceRequest a subclass
of AbstractReadTransactionRequest so it carries isSnapshotOnly().

Change-Id: Ifdb6214478aa7548d3bc1f06b532e06c93b3dd0b
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit b24517538beb4f44e6a9a96e68e4bf48156b480f)

Bug 5740: Add Deque-based control-aware mailbox

Since akka persistence uses stashing, it requires a mailbox to be
Deque-based to provide the enqueueFirst method. However, the
control-aware mailboxes provided by akka are not Deque-based so we
need one that is.

Change-Id: I74f214c725eff16aba093aad3f2f6eed80948ee4
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 5740: Add ControlMessage interface to raft messages

Added 'implements ControlMessage' for all RaftRPCs and other messages
related to raft that should have higher priority.

Change-Id: Ie699531ef67d9cbcf7cbdec0422dd2e6faafebaa
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8371: raise unknown history log to warn

This error seems to be happening quite often, raise it to a warning
so we understand what request is triggering it.

Change-Id: If357325787f5c859a46af9286c86c0e9934909cb
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit f336a5c159ed94fb63d588b934727d8149248273)

BUG-8403: raise misordered request log message

This error seems to occur intermittently, raise the message to
a warning.

Change-Id: Ia749a9ac17fa75ef26fe7a2963fa9ea3a0b35731
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 956797bba2da2db6de3f1e8877d6825aa4c1f159)

BUG 8403 Timeout writetransactions on initial ensure

This stage can get stuck as well and if the submit is never timed out
from the backend as a result of a bug it will never complete.

Change-Id: Ia424d009cd201e3f03a13af88c35b1390b40cbee
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit bf9e4dc047d4899f76cd95a1f1f4106f3c5bb4a3)

BUG 8402: Close readonly tx

This transaction is only used for an exist check of
default prefix shard configuration and needs to be closed
once we are done with it.

Change-Id: I8d7c06e7e3ce58cb91713dac14744c411ec1bf5f
Signed-off-by: Tomas Cere <tcere@cisco.com>

BUG 8525 Listeners not getting triggered from followers

This is an oversight in the dtcl implementation of the lowlevel
model. However we also need to change the proxy listener that is
registered from the new sharding apis as there is no way
for the user to specify this cluster interface since the mdsal
api's are required.

Change-Id: I41c02a45d1db9eb9ed8c6e63dff99da567829d2f
Signed-off-by: Tomas Cere <tcere@cisco.com>

Bug 5740: Remove Serializable where not necessary

Some raft message classes are Serializable but they don't need to
be as they're only sent locally.

Change-Id: Ibd052b9a4589dd2476b30c51e301b3dd609df750
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 5740: Change TimeoutNow and Shutdown to externalizable proxy

Change-Id: I3b2289c258ffab288901b5cbf4e5032bc143dfc7
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 5740: Change RequestVote(Reply) to externalizable proxy

The other RaftRPC classes have been converted to use the
externalizable proxy pattern so we should convert RequestVote(Reply)
as well.
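
For readers unfamiliar with the pattern, a minimal self-contained sketch
follows. VoteRequest and its fields are hypothetical and do not mirror
the real RequestVote class; the point is only the writeReplace() /
Externalizable proxy / readResolve() round trip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical message type used only to illustrate the pattern.
final class VoteRequest implements Serializable {
    private static final long serialVersionUID = 1L;

    private final long term;
    private final String candidateId;

    VoteRequest(final long term, final String candidateId) {
        this.term = term;
        this.candidateId = candidateId;
    }

    long getTerm() {
        return term;
    }

    String getCandidateId() {
        return candidateId;
    }

    // Serialization writes the proxy instead of this object.
    private Object writeReplace() {
        return new Proxy(this);
    }

    private static final class Proxy implements Externalizable {
        private static final long serialVersionUID = 1L;

        private VoteRequest message;

        // Externalizable requires a public no-argument constructor.
        public Proxy() {
        }

        Proxy(final VoteRequest message) {
            this.message = message;
        }

        @Override
        public void writeExternal(final ObjectOutput out) throws IOException {
            out.writeLong(message.term);
            out.writeUTF(message.candidateId);
        }

        @Override
        public void readExternal(final ObjectInput in) throws IOException {
            message = new VoteRequest(in.readLong(), in.readUTF());
        }

        // Deserialization hands back the real message rather than the proxy.
        private Object readResolve() {
            return message;
        }
    }

    // Round-trips a message through Java serialization.
    static VoteRequest copy(final VoteRequest original) {
        try {
            final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (VoteRequest) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }
}
```

The proxy controls the wire format explicitly, so the message class can
keep final fields and change its internals without affecting the
serialized form.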

Change-Id: I0a2054d8426f66480f37061d1a9fc51464f705da
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Fix test warnings

This fixes most raw type warnings.

Change-Id: Iaec02aa9f40df6d04b9f1bfa7045c84b6cc40a53
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

BUG 8525: Prevent NPE in test-app listeners

Prevents the NPE thrown when the listeners didn't
receive any notifications.

Change-Id: I0d774913a15b4341abce779c64d6ee8f75d6a0e1
Signed-off-by: Tomas Cere <tcere@cisco.com>

Migrate to yang-data-codec-xml

The codecs in yang-data-impl are going away, use their new place.
Also optimize serialization by sharing XMLOutputFactory, as it is
thread-safe after having been configured.

Change-Id: If6d3b348fa8568e4e84199d4c23ec910c9fc6343
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Simplify code with new Map features

This is mostly computeIfAbsent to replace the “get, if null,
initialise and put” pattern.
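
As an illustration of the pattern being replaced, here is a small
sketch. SketchRegistry and its methods are made-up names, not code from
this patch; both add methods behave the same way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration only.
final class SketchRegistry {
    private final Map<String, List<String>> byShard = new HashMap<>();

    // Old style: get, if null, initialise and put.
    void addLegacy(String shard, String listener) {
        List<String> listeners = byShard.get(shard);
        if (listeners == null) {
            listeners = new ArrayList<>();
            byShard.put(shard, listeners);
        }
        listeners.add(listener);
    }

    // Same behaviour expressed with Map.computeIfAbsent().
    void add(String shard, String listener) {
        byShard.computeIfAbsent(shard, key -> new ArrayList<>()).add(listener);
    }

    List<String> listeners(String shard) {
        return byShard.getOrDefault(shard, Collections.emptyList());
    }
}
```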

Change-Id: Ie866025c3a6d5099f6f5494ab9d4437d7e9d2320
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Deprecate legacy EOS API classes

Deprecate the legacy API in favor of the new APIs in mdsal.

Change-Id: I819fe7e7006694a5912e4c324055df10d4a33d3d
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8507: Fix replayed directCommit() on reconnect

After a remote shard reconnect following a brief isolation, we have
observed an NPE when faced with a direct commit.

Assuming state engine correctness, this can happen during the time
when we have completed preCommit and before we have recorded the
request result (i.e. after commit completes).

At any rate, this flushes out the need for transaction transitions
to be idempotent, which is something ShardDataTreeTransaction and
ShardDataTreeCohort do not provide.

Encapsulate FrontendReadWriteTransaction state into distinct state
objects. This allows us to accurately track the internal transaction
state and detect when a canCommit, directCommit, preCommit and
doCommit are no-ops because the request is already being handled.
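
The idea of distinct state objects can be sketched roughly as below.
This is not the FrontendReadWriteTransaction implementation: every class
and method name here is hypothetical, and only directCommit is modelled.
The point is that a replayed request finds a state object telling it
the work has already been started or finished.

```java
// Hypothetical, heavily simplified model of a transaction's state.
final class SketchTransaction {
    // Each state is its own object, so data valid only in that state
    // travels with it instead of living in nullable fields.
    interface State {
    }

    static final class Open implements State {
    }

    static final class CommitPending implements State {
        final long requestSequence;

        CommitPending(long requestSequence) {
            this.requestSequence = requestSequence;
        }
    }

    static final class Committed implements State {
    }

    private State state = new Open();
    private int commitsStarted;

    // Returns true if this call started the commit, false if it was a
    // replay of a request that is already being handled or is done.
    boolean directCommit(long requestSequence) {
        if (state instanceof Open) {
            state = new CommitPending(requestSequence);
            commitsStarted++;
            return true;
        }
        return false;
    }

    void commitCompleted() {
        if (state instanceof CommitPending) {
            state = new Committed();
        }
    }

    int commitsStarted() {
        return commitsStarted;
    }
}
```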

Change-Id: Ib533ec9a4882f51f7914c5b11865ac093c6d6ad0
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 59fab9a9bc6dbf9ad538b3df4460eff146c63ce2)

BUG-8511: add more explicit messages

This adds more defensive handling of connections and locking,
even if it should not strictly be necessary, as we are using
atomic operations and run on the actor thread. This makes the
transitions work even in the face of actor context leakage.

Change-Id: I26df0f208d63b861a0f3d3dc3c0f1959bbc79e90
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 90a377ec4fe7c1aa4df6d4fde74bfc8189e95b08)

BUG-8403: guard against ConcurrentModificationException

Using TransmitQueue.asIterable() offers the slight advantage of not
dealing with a big list, but exposes us to the risk of the Iterable
being changed.

The point missed by the fix to BUG 8491 is that there is an avenue
for the old connection to be touched during replay, as we are
completing entries, for example reads when we are switching from
remote to local connection. In this case the callback will be invoked
in the actor thread, with all the locks being reentrant and held,
hence it can break through to the old connection's queue.

If that happens we will see a ConcurrentModificationException and
enter buggy territory, where the client fails to work properly.

Document this caveat and turn asIterable() into drain(), which
removes all the entries in the queue, allowing new entries to be
enqueued. The late-comer entries are accounted for when we set the
forwarder.
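
The difference between the two approaches can be sketched with plain
collections. SketchQueue and SketchReplay are hypothetical stand-ins,
not the TransmitQueue code: the live view fails once a callback
enqueues during iteration, while a drained copy is unaffected and the
late entries simply stay in the queue.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical stand-in for a queue of pending entries.
final class SketchQueue {
    private final List<String> entries = new ArrayList<>();

    void enqueue(String entry) {
        entries.add(entry);
    }

    // Live view: iteration fails if the queue is modified meanwhile.
    Iterable<String> asIterable() {
        return entries;
    }

    // Removes all entries and returns them as an independent list.
    List<String> drain() {
        final List<String> drained = new ArrayList<>(entries);
        entries.clear();
        return drained;
    }

    int size() {
        return entries.size();
    }
}

final class SketchReplay {
    private SketchReplay() {
    }

    // Returns false if replay over the live view was interrupted.
    static boolean replayLive(SketchQueue queue) {
        try {
            for (String entry : queue.asIterable()) {
                // Stands in for a completion callback touching the queue.
                queue.enqueue("late-" + entry);
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    // Replays a drained copy; late-comers accumulate in the queue.
    static List<String> replayDrained(SketchQueue queue) {
        final List<String> replayed = new ArrayList<>();
        for (String entry : queue.drain()) {
            queue.enqueue("late-" + entry);
            replayed.add(entry);
        }
        return replayed;
    }
}
```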

Change-Id: Idf29c1e565e12aaed917ac94c21c552daf169d4d
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 930747a6ba5d888d2fbe54473132680e4621d858)

BUG-8491: Remove requests as they are replayed

We should not be seeing any messages just after we have finished
message replay, as the queue is still locked and we should have
accounted for all messages by removing them from the queue.

Change-Id: I47396b4705e048460934538acc470468a0a6285d
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 585e116247f9b616579ffad1785a972621d928e7)

BUG 8462: Switch to using cds-client in unsubscribe-ddtl

The initial notification seemed iffy when the leader was moving,
so switch the final data consistency check to cds-client's read,
which also makes this more consistent with unsubscribe-dtcl.

Change-Id: Ia23da11a5bda33925ee6ba911d2794f666a17a94
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 5daa19c25730a83fa5d0eb510b47ff159fe734fb)

Do not retain initial SchemaContext

While looking over a memory dump I have noticed that we retain
SchemaContext inside Shard$Builder, which is being retained via
Props (which are used to restart the actor).

This reference is not updated as the SchemaContext is updated, which
means we are wasting memory and are causing Shard to come up with
an ancient SchemaContext after a failure.

Fix this by having an AtomicReference holder for SchemaContext
and have Shard have a Supplier<SchemaContext>.
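
A rough sketch of the holder arrangement, with hypothetical names
throughout (SketchSchema stands in for SchemaContext, and neither class
below is the real Shard or its builder). What the long-lived creation
parameters retain is the getter, not a snapshot of the schema.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for SchemaContext.
final class SketchSchema {
    final int revision;

    SketchSchema(int revision) {
        this.revision = revision;
    }
}

// Hypothetical owner of the current schema.
final class SketchSchemaHolder {
    private final AtomicReference<SketchSchema> current = new AtomicReference<>();

    void onSchemaUpdated(SketchSchema schema) {
        current.set(schema);
    }

    // Safe to retain for a long time: always reads the latest value.
    Supplier<SketchSchema> supplier() {
        return current::get;
    }
}

// Hypothetical consumer, created (and possibly re-created) later.
final class SketchShard {
    private final Supplier<SketchSchema> schema;

    SketchShard(Supplier<SketchSchema> schema) {
        this.schema = schema;
    }

    int currentRevision() {
        return schema.get().revision;
    }
}
```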

Change-Id: I73fcae46f249d3679522eb7dbbb059e43c5af6c7
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

yang-jmx-generator: use lambdas

This series of patches uses lambdas instead of anonymous classes for
functional interfaces when possible. Lambdas are replaced with method
references when appropriate.
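
For illustration, the three forms look like this on a generic
comparator. SketchLambdaForms is a made-up class and is not code from
yang-jmx-generator; all three comparators order strings by length.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical class for illustration only.
final class SketchLambdaForms {
    // Before: anonymous class implementing a functional interface.
    static final Comparator<String> ANONYMOUS = new Comparator<String>() {
        @Override
        public int compare(String first, String second) {
            return Integer.compare(first.length(), second.length());
        }
    };

    // After: the same comparator as a lambda.
    static final Comparator<String> LAMBDA =
        (first, second) -> Integer.compare(first.length(), second.length());

    // Where a method reference expresses it directly, use that.
    static final Comparator<String> METHOD_REF = Comparator.comparingInt(String::length);

    private SketchLambdaForms() {
    }

    static List<String> sorted(List<String> input, Comparator<String> order) {
        final List<String> copy = new ArrayList<>(input);
        copy.sort(order);
        return copy;
    }
}
```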

Change-Id: Ica1ba9d37edbd25421b8fabac5cb1567608e1fb6
Signed-off-by: Stephen Kitt <skitt@redhat.com>

yang-jmx-generator-plugin: use lambdas

This series of patches uses lambdas instead of anonymous classes for
functional interfaces when possible. Lambdas are replaced with method
references when appropriate.

Change-Id: Ic44563e54557fb678c23c7bd79121419303ef153
Signed-off-by: Stephen Kitt <skitt@redhat.com>

BUG-8402: fix sequencing with read/exists requests

When replaying successful requests, we do not issue read and exists
requests, as they have already been satisfied, but account for their
sequence numbers.

This does not work in the case where we have a remote connection,
the first request on a transaction is a read and after it is
satisfied subsequent requests are replayed to a different backend
leader.

Since the initial request is not replayed, but subsequent requests
account for it and the backend has no prior knowledge of the
transaction, it sees an initial request with sequence != 0, and
rejects all requests with an OutOfOrderRequestException.

Fix this by introducing IncrementTransactionSequenceRequest, which
the frontend enqueues as the first request instead of the initial
read/exists request -- introducing the transaction to the backend.
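
The sequencing problem can be modelled with a toy backend. This is a
sketch only: SketchBackendTransaction and its handle() method are
hypothetical, and the placeholder flag merely stands in for the new
request type. A backend that has never seen the transaction expects
sequence 0, so a replay that starts at 1 is rejected unless a
placeholder fills the slot of the read that was not replayed.

```java
// Hypothetical model of backend-side sequence checking.
final class SketchBackendTransaction {
    // A backend with no prior knowledge starts at sequence 0.
    private long expectedSequence;
    private int modificationsApplied;

    // Returns false where a real backend would reject the request as
    // out of order. A placeholder only advances the sequence.
    boolean handle(long sequence, boolean placeholder) {
        if (sequence != expectedSequence) {
            return false;
        }
        expectedSequence++;
        if (!placeholder) {
            modificationsApplied++;
        }
        return true;
    }

    int modificationsApplied() {
        return modificationsApplied;
    }
}
```

Replaying only the write at sequence 1 fails against a fresh instance,
while sending a placeholder at sequence 0 first lets the write through.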

Change-Id: Ia0f048e33d417e1fdc8d15bf319d6b8b33c2b1b1
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 7f15e81c52f2efda779c670580f0697227557404)

BUG-8402: Separate out OutOfOrderRequestException

OutOfOrderRequestException is used for two distinct cases, which is
the result of a mixup during refactoring.

The first case is when an envelope's sequence does not match the
sequence we are expecting on a connection. This is a retriable
exception and happens due to mailbox queueing during leadership
changes:
- a FE sees us as a leader, sends requests
- we become a follower, we reject a few requests
- we become a leader, at which point we must not process requests
until the FE reconnects, as we would not be processing them in
the correct order.

The second case is when we receive a Request with an unexpected
sequence. This is a hard error, as it indicates that the client
has made a mistake and lost a request (like the case fixed in
fe69101801085580f2fe72762abea5c5fa83d978).

Separate these two cases out by introducing
OutOfSequenceEnvelopeException and handle it by initiating a session
reconnect.
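
A simplified sketch of the split, with hypothetical Sketch* classes
rather than the real exception hierarchy. It assumes one envelope
counter per connection and, for brevity, a single request counter; the
strings returned by submit() only label the reaction a frontend might
take.

```java
// Hypothetical base type carrying the retriable/hard distinction.
abstract class SketchRequestException extends Exception {
    private static final long serialVersionUID = 1L;

    SketchRequestException(String message) {
        super(message);
    }

    abstract boolean isRetriable();
}

// Envelope arrived out of sequence on the connection: retriable.
final class SketchOutOfSequenceEnvelopeException extends SketchRequestException {
    private static final long serialVersionUID = 1L;

    SketchOutOfSequenceEnvelopeException(long expected) {
        super("Expecting envelope " + expected);
    }

    @Override
    boolean isRetriable() {
        return true;
    }
}

// Request sequence is wrong: the client lost a request, hard error.
final class SketchOutOfOrderRequestException extends SketchRequestException {
    private static final long serialVersionUID = 1L;

    SketchOutOfOrderRequestException(long expected) {
        super("Expecting request " + expected);
    }

    @Override
    boolean isRetriable() {
        return false;
    }
}

final class SketchConnection {
    private long expectedEnvelope;
    private long expectedRequest;

    void handle(long envelopeSequence, long requestSequence) throws SketchRequestException {
        if (envelopeSequence != expectedEnvelope) {
            throw new SketchOutOfSequenceEnvelopeException(expectedEnvelope);
        }
        expectedEnvelope++;
        if (requestSequence != expectedRequest) {
            throw new SketchOutOfOrderRequestException(expectedRequest);
        }
        expectedRequest++;
    }

    // Maps the outcome to the reaction described in the text.
    String submit(long envelopeSequence, long requestSequence) {
        try {
            handle(envelopeSequence, requestSequence);
            return "ok";
        } catch (SketchRequestException e) {
            return e.isRetriable() ? "reconnect" : "fail";
        }
    }
}
```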

Change-Id: Ifb0bac41ff2efd6385455fd9c77b8b39054dd4a0
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit d02d60083ee163cf465c265364c21c0df9cdc3c7)

BUG-8402: Record modification failures

When a modification fails to apply, we must record the resulting
failure, as we have partially applied the state and hence should
never attempt to apply it again, even if the client retransmits
the request.

Furthermore we should stop responding to any subsequent requests
including reads, as our responses are not accurate anyway (and the
requests may have been enqueued before the client saw the failure).

Enqueue the failure and respond to all subsequent requests with it,
forcing the transaction to fail the canCommit() phase.
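
The behaviour described above might look roughly like the following
sketch. SketchModifyingTransaction is a hypothetical class, not the
frontend transaction code: once a modification fails, the failure is
kept and returned for every later request without running it.

```java
// Hypothetical model of a transaction that remembers its failure.
final class SketchModifyingTransaction {
    private RuntimeException recordedFailure;
    private int applied;

    // Returns null on success, otherwise the (possibly earlier) failure.
    RuntimeException handle(Runnable modification) {
        if (recordedFailure != null) {
            // State is partially applied: never run anything again.
            return recordedFailure;
        }
        try {
            modification.run();
            applied++;
            return null;
        } catch (RuntimeException e) {
            recordedFailure = e;
            return e;
        }
    }

    boolean canCommit() {
        return recordedFailure == null;
    }

    int applied() {
        return applied;
    }
}
```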

Change-Id: I1d25f1b3a688e02f8a69f54f22a5d6d2dd43339c
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 63e6ab3f36e57954baf391855541cf3d42d38a0f)

BUG-8422: separate retry and request timeouts

This patch corrects a thinko around request timeouts, where we
reconnect the connection based on request timeout, not based on
the 'try' timeout.

The difference between the two is that the 'try' timeout is the
period we allow the backend to respond to our request and when
it does not, we reconnect the connection.
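
A sketch of how the two timeouts might be told apart. It rests on an
assumption not spelled out above: that the request timeout is the total
time a request may stay outstanding before it is failed. SketchTimeouts
and all of its parameters are hypothetical names.

```java
// Hypothetical decision helper; all times are in nanoseconds.
final class SketchTimeouts {
    enum Action { WAIT, RECONNECT, FAIL_REQUEST }

    private SketchTimeouts() {
    }

    static Action check(long now, long enqueuedAt, long lastSentAt,
            long tryTimeout, long requestTimeout) {
        // Assumption: the request timeout bounds the whole lifetime.
        if (now - enqueuedAt >= requestTimeout) {
            return Action.FAIL_REQUEST;
        }
        // The 'try' timeout bounds one attempt against the backend.
        if (now - lastSentAt >= tryTimeout) {
            return Action.RECONNECT;
        }
        return Action.WAIT;
    }
}
```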

Change-Id: I8c00a80e5c26c5b829056c43fe78a0567041bc5e
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit f32b44f6e2dac23938a2c01638872c65ba1237f5)

Fix logging format/argument mismatch

Two debug sites fail to pass down shardName, leading to mal-formatted
log messages.

Change-Id: I5521539c54c2e1f7ef5ef25d9a47fbc6d6d0a27c
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 2fdecf33df3d9ca653fb8730116c54c67c6740ed)