git.opendaylight Code Review - controller.git/log

Bug 9060: Minor [Java|inline] doc update re. getStackTrace() performance

Change-Id: I4f1ab259b79154f7abc8df2d28d5ecb2ad18ef98
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Bug 9060: Minor update to inline documentation for new flag

Change-Id: Id2eb1e76a8658e837166b227a30cd49ee665c258
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

BUG-9054: do not use BatchedModifications needlessly

Transaction identifier, which is a required parameter for
BatchedModifications is a resource tracked on the backend and is
assumed to be allocated contiguously. Using BatchedModifications
to transport only a list of modifications means we are allocating
transactions IDs which we then never use.

This patch reworks the logic so it tracks modifications in a list
and allocates BatchedModifications only when we are ready to actually
commit something.
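
As a rough illustration of the idea (not the actual controller code), the sketch below buffers modifications in a plain list and consumes a transaction ID only at commit time. TxIdAllocator, Batch and LazyBatcher are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a backend-tracked, contiguous transaction ID source.
final class TxIdAllocator {
    private long next = 0;

    long allocate() {
        return next++;
    }

    long allocatedCount() {
        return next;
    }
}

// Hypothetical batch message: it requires a transaction ID when constructed.
final class Batch {
    final long txId;
    final List<String> modifications;

    Batch(long txId, List<String> modifications) {
        this.txId = txId;
        this.modifications = new ArrayList<>(modifications);
    }
}

// Tracks modifications in a list; an ID is allocated only when committing.
final class LazyBatcher {
    private final TxIdAllocator allocator;
    private final List<String> pending = new ArrayList<>();

    LazyBatcher(TxIdAllocator allocator) {
        this.allocator = allocator;
    }

    void addModification(String modification) {
        // No Batch, and therefore no transaction ID, is created here.
        pending.add(modification);
    }

    Batch readyToCommit() {
        Batch batch = new Batch(allocator.allocate(), pending);
        pending.clear();
        return batch;
    }
}
```

With this shape, adding any number of modifications leaves the allocator untouched until readyToCommit() is called.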

Change-Id: I3f71511cfd68e96e80790e69d28d083f195e5e12
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 71a4b6377ba598b18c64b89b6b16538751d2d116)

Bug 9008: Fix the error of the persisted journal data format

We have to clear the lastLeafSetQName while processing the end event for node
in NormalizedNodeInputStreamReader and AbstractNormalizedNodeDataOutput.

Otherwise, while processing a leaf-list node, the leaf-list entry node
may incorrectly use another LeafSetQName as its node identifier.
Under certain circumstances, the DataTree reconstructed from the persisted
journal after a controller restart will not be equal to the DataTree
before the restart.

Change-Id: I4ee823f59fe477d08f982ae73e3850433dfea8ee
Signed-off-by: HeYunBo <he.yunbo@zte.com.cn>

Minor: mdsal-trace-api does not need sal-broker-impl, just sal-core-api

This is not directly related to / strictly required by Bug 9060, but
I found it while hacking on and testing that, and thought it would be
good to clean up.

Change-Id: I749e63025060ee2d51fe04d0ab4eb932c13f6c25
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Bug 9060: Karaf CLI command to print open transactions

including some minor changes to make output more pretty / readable.

This is, for now, the last in a series of commits which is part of a
solution I'm proposing in order to be able to detect OOM issues such as
Bug 9034, based on using the mdsal-trace DataBroker.

Change-Id: I83af00a0713be4e8fab3085942b7b57d7183a20c
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Bug 9060: TracingBroker printOpenTransactions

This method is intended to be used from a Karaf CLI command in the next
change (and maybe JMX or something else like that later), which can be
invoked during future automated testing to detect Tx leaks during CSIT.

This is one of a series of commits which is part of a solution I'm
proposing in order to be able to detect OOM issues such as Bug 9034,
based on using the mdsal-trace DataBroker.

Change-Id: I682700bef9644834e8b4ca36b21729f021a76bf0
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Bug 9060: Remove un-used Instant getObjectCreated() from CloseTracked

I initially thought that it would be "interesting" to offer some kind of
output sorted by object age in the CLI I'm planning to propose next, but
ultimately realized that keeping an extra Instant field in EACH
CloseTracked (e.g. Tx) is just overhead and not really adding much value
(because the NUMBER of non-closed objects is MUCH more interesting than
this timestamp), thus removing this again after all.

This is one of a series of commits which is part of a solution I'm
proposing in order to be able to detect OOM issues such as Bug 9034,
based on using the mdsal-trace DataBroker.

Change-Id: Ie40fe23ce2af670902ff8e44a6757ebdf9ef915e
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Bug 9056 - Class FileModuleShardConfigProvider does not load
configurations from classpath properly

Change-Id: If1be5fa92eb98a91266353ece8ccc9cadf27c22e
Signed-off-by: Jakub Toth <jakub.toth@pantheon.tech>

Remove sal-binding-config from toaster poms

sal-binding-config is a CSS remnant and is not needed.

Change-Id: I51be2818d9b247e7c6494c1ff88d3804cb0fc3d5
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Fix intermittent testOwnerChangesOnPeerAvailabilityChanges failure

EntityOwnershipShardTest.testOwnerChangesOnPeerAvailabilityChanges:647->AbstractEntityOwnershipTest.verifyRaftState:280->lambda$testOwnerChangesOnPeerAvailabilityChanges$2:648 getRaftState expected:<[]Leader> but was:<[Pre]Leader>

It seems this was indirectly introduced by the addition of the
PurgeTransactionPayload - changes the timing of things a bit. I added
code to ensure peer2's lastAppliedIndex is up-to-date with the leader's
prior to stopping the leader to make it deterministic (ie peer2 should
be able to go straight to Leader).

Change-Id: I9abb950c7dc67b2d481d07b9b421ae46421b6510
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 9060: mdsal-trace tooling with getAllUnique() to find Tx leaks

This is one of a series of commits which is part of a solution I'm
proposing in order to be able to detect OOM issues such as Bug 9034,
based on using the mdsal-trace DataBroker.

Change-Id: I9cf4d8d9965468d77a0d82455655b9445535f0b0
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Bug 9060: TracingBroker with transaction-debug-context-enabled

This is one of a series of commits which is part of a solution I'm
proposing in order to be able to detect OOM issues such as Bug 9034,
based on using the mdsal-trace DataBroker.

Change-Id: If62b7f76ea03d8cabe0c5a2088983275cfe50e44
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Bug 9060: Fix odl-mdsal-trace's missing mdsaltrace_config.xml

This seems to have gotten lost in the Karaf 4 migration.

see
https://wiki.opendaylight.org/view/Karaf_4_migration#.3Cconfigfile.3E

Change-Id: Id7c20c1daaaeb0844ef2278fe4931b24e7ef5b5d
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Fix intermittent testFollowerResyncWith*LeaderRestart failure

NonVotingFollowerIntegrationTest#testFollowerResyncWithOneMoreLeaderLogEntryAfterNonPersistentLeaderRestart fails intermittently:

NonVotingFollowerIntegrationTest.testFollowerResyncWithOneMoreLeaderLogEntryAfterNonPersistentLeaderRestart:233 Did not receive message of type class org.opendaylight.controller.cluster.raft.base.messages.SnapshotComplete

This seems to be a side-effect of https://git.opendaylight.org/gerrit/#/c/62255/
which changes the timing a bit, such that an install snapshot doesn't occur on the
follower. The snapshot should happen in order to completely re-sync the follower
with the leader. Instead, the follower ends up removing the stale out-of-sync
entries and appending the new ones from the leader. That gets the journal
up-to-date, but the stale entries had already been applied to the state, which
leaves the state out-of-sync with the journal. I added an additional check in the
follower to force the leader to install a snapshot if the first out-of-sync log
entry index <= the lastAppliedIndex, which means the entries to be removed have
already been applied to the state.
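
The added check boils down to a single comparison. The sketch below is only an illustration under that reading; ResyncDecision and needsSnapshot are hypothetical names, not the Follower code.

```java
final class ResyncDecision {
    private ResyncDecision() {
    }

    // Returns true when entries that would be removed from the journal have
    // already been applied to the state, so trimming the journal alone would
    // leave the state out of sync and a snapshot install is needed.
    static boolean needsSnapshot(long firstOutOfSyncIndex, long lastAppliedIndex) {
        return firstOutOfSyncIndex <= lastAppliedIndex;
    }
}
```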

Change-Id: Ic3815a694a8531d9f7f42f19ad8978d52fc902b3
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 8994 - FileModuleShardConfigProvider should not use hard-coded paths

* move setting paths to blueprint
* fix tests

Change-Id: If1e79b3d33d969167327819a1d13da00ee4bc882
Signed-off-by: Jakub Toth <jakub.toth@pantheon.tech>

Bug 9034: TracingBroker with TracingReadOnlyTransaction

The new TracingReadOnlyTransaction wrapper doesn't do anything
interesting yet - but it will, in the related upcoming next change.

This is one of a series of (small, easy to review) commits which is
part of a solution I'm proposing in order to be able to detect OOM
issues such as Bug 9034, based on using the mdsal-trace DataBroker.

Change-Id: Ifa82c50d9c9eac76af99bf6a58e5e1955ee7429c
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Bug 9034: TracingBroker with TracingTransactionChain

This is one of a series of (small, easy to review) commits which is
part of a solution I'm proposing in order to be able to detect OOM
issues such as Bug 9034, based on using the mdsal-trace DataBroker.

Change-Id: I098c48a1fce1da2fdd0aafdc82fd3bef5626988a
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

BUG-2643: enable checkstyle/findbugs in archetype

New projects should have these set from the get go.

Change-Id: Ic85cdd003ecb1fe49326d3d6a1b1099d09522ebb
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Toaster is shardless

It's not like we broke it into shards. Nothing like that, our toaster
is fully working. Nevertheless it is a sample and has no place
in production code nor its configuration.

Change-Id: Ie14c698c1ea45a5fe201d1b6227eeb4f2d9790a5
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Fix deprecation warning in PingPongTransactionChain

Add MoreExecutors.directExecutor(). Also fix up formatting.

Change-Id: I4dcc849c643713b738f2d99b1250848e46fbe82a
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

BUG-9028: make NonPersistentDataProvider schedule invocation

We need to make NonPersistentDataProvider behave in a fashion
similar to what PersistentDataProvider does for asynchronous
persistence calls, which is schedule execution of the provided
procedure rather than direct execution (which is fair for synchronous
execution).

In order to make that work we introduce ExecuteInSelfActor, which
has an executeInSelf() method, which uses internal mechanics to
schedule the call at a later point.
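
A minimal sketch of the difference between direct and scheduled execution, assuming a single-threaded mailbox model. SelfScheduler and NonPersistentStore are hypothetical stand-ins and do not reproduce the real ExecuteInSelfActor or provider classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical single-threaded "actor": tasks are queued and run on the next drain.
final class SelfScheduler {
    private final Queue<Runnable> mailbox = new ArrayDeque<>();

    void executeInSelf(Runnable task) {
        mailbox.add(task);
    }

    void processMailbox() {
        Runnable task;
        while ((task = mailbox.poll()) != null) {
            task.run();
        }
    }
}

// Hypothetical provider which stores nothing but still honours call semantics.
final class NonPersistentStore {
    private final SelfScheduler self;

    NonPersistentStore(SelfScheduler self) {
        this.self = self;
    }

    // Synchronous flavour: running the procedure directly is fair.
    void persist(String entry, Runnable procedure) {
        procedure.run();
    }

    // Asynchronous flavour: the procedure runs later, after the caller returns.
    void persistAsync(String entry, Runnable procedure) {
        self.executeInSelf(procedure);
    }
}
```

The point of the asynchronous flavour is ordering: the caller's remaining code runs before the procedure does, as it would with real asynchronous persistence.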

Change-Id: I116708d98154c8244ea80b4a1a1aa615abc3075d
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit b66d6180f06097e3501a88aac9fb684336addd58)

Add debug to pinpoint lastApplied movement

This method is called from multiple call sites, only one of which
is actually logging the change. Make sure we catch all transitions
by adding a LOG.debug() into the setter.

Change-Id: Ie777f8047a0893f9450fb132faa8adea235fbc5f
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Remove remnants of proto generation

This README and shell script are no longer relevant, as we do not
have any .proto files.

Change-Id: If3f6204c5d9e607b66a5eed3ecea56e5b5294023
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Fix blueprint logging

The log call is missing arguments, leading to a useless message.
Fix that.

Change-Id: Ia6ae2d760724d2809ce30798aaf1b32465205824
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Make testTransactionForwardedToLeaderAfterRetry purge-aware

At the point where we are waiting for transaction replication
to fully propagate, we need to account for the purge request,
as otherwise the configuration could interfere with index
sequencing.

Change-Id: I13f93e306e5b77304916e4c05f39dc28fb9cc049
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit d1de9c55e280fc6a972b0cd408189057446c45a0)

Make ShardTest.testCommitWhenTransactionHasModifications() wait a bit

Committed transactions involve also a purge payload, which is persisted
asynchronously, hence it may or may not be visible in the journal just
after the transaction is reported as committed. Wait for two heartbeat
intervals before looking at the stats.

Change-Id: Ibe699edced12d006bf5ea8cd99aa821ab56d115d
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bump karaf-empty dependency version to 2.0.4

Change-Id: I15329e8d207fdeaceee4b75a731391f2924bae03
Signed-off-by: Vratko Polak <vrpolak@cisco.com>

Bump versions by x.(y+1).z for next dev cycle

Change-Id: I007759fbe7e12c4b58189462fa7c676adf7f972f
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Add MXBean to report shard registered DTCL/DCL info

It's useful to see what listeners (DTCL/DCL) are registered for each shard.
Added a new message, GetInfo, sent to the DTCL/DCL actors that returns
DataTreeListenerInfo, including the stringified user listener instance
and the stringified YangInstanceIdentifier path. A new
ShardDataTreeListenerInfoMXBean is instantiated for each shard which
reports the DTCL and DCL info.

Change-Id: I312bc5d03fe836bc208ea442ebc2af0ef103120f
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8941: enqueue purges once ask-based transactions resolve

Backend state tracking relies on the transaction log to propagate
transaction state from the leader to followers. This includes purging
of transactions, i.e. the information that the frontend will not need
the state (and the final resolution of the transaction).

Tell-based protocol handles this on the frontend, ask-based needs to
do this on the backend (as it has no notion of transaction continuation).

Change-Id: I49e787b38998ef67b4a9ef504a70822263e1a340
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Eliminate protocol-framework

This piece of code has been moved to netconf, eliminate it from
controller.

Change-Id: I1a04ed800d88ab49ef6e1d0782ca722f18e16581
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bump odlparent 2.0.2 to 2.0.4

Change-Id: Iea7270b110536c10878d130db33409ed08dde987
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Fixup static method warnings

- a method can be made static
- invocation of static methods should not go through an instance
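
For illustration, both warnings look roughly like this in a made-up class (Frames and its methods are hypothetical and not taken from the controller code):

```java
final class Frames {
    private final int headerSize;

    Frames(int headerSize) {
        this.headerSize = headerSize;
    }

    // Uses no instance state, so it can be declared static.
    static int align(int length, int boundary) {
        return (length + boundary - 1) / boundary * boundary;
    }

    int framedLength(int payloadLength) {
        // Static method invoked through the class name, not through an instance.
        return headerSize + Frames.align(payloadLength, 8);
    }
}
```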

Change-Id: I9380a17432340c75fd94bd01c9dc5bb5cdbd8156
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bug 8885: Fix DistributedShardedDOMDataTree initialization

DistributedShardedDOMDataTree initialization expects the prefix
configuration shard to be present and ready with leader however
the latter isn't the case when the static module-shards is
bootstrapped without the local member so it can be dynamically
joined into an existing cluster. So I modified the ConfigShardLookupTask
to elide the ConfigShardReadinessTask.

Once past that, creation of the prefix-based default shard is attempted
as there isn't a local module-based shard however this fails b/c the
local prefix configuration shard is not connected to a leader. To alleviate
this I just commented out the code to create the shard. Since the default
shard configuration is present in the out-of-box modules.conf and is
expected to be present, we can assume at this point that the local member
isn't in the replica list with the intention of dynamically joining it to
an existing cluster, at which time the shard will be created.

These changes at least fix the regression with the bootstrapping scenario.
We can revisit this initialization later w.r.t. prefix-based shards.

Change-Id: I1faf531f4c79914d45203ee132dd4e65ad2f18ba
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Deprecate org.opendaylight.controller.md.sal.dom.spi.AbstractRegistrationTree

This is a utility class, which has a counterpart in mdsal. Deprecate it
and related classes, migrating users.

Change-Id: I8206350ddb60bb19aed93ff3840e0e68e288d55a
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Deprecate controller fine-grained sharding APIs

These are not maintained nor were they ever implemented. Deprecate
them and point to their mdsal counterparts to reduce confusion.

Change-Id: Idd1908c65b0737df0a3731e6e81a7d1c71f272d0
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Remove logback related stuff

as discussed on https://lists.opendaylight.org/pipermail/odlparent-dev/2017-July/001262.html

Change-Id: I09146cd363d1ab706143bc12c8b1e37aa96c8723
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Optimize use of YangInstanceIdentifier.getPathArguments()

This method returns a list, hence we can lookup the first item
without iterating and also can use Lists.transform().

Change-Id: Ie26bfcc225c74154d65ef963e3444ac5ec10bafb
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

BUG-8898: prioritize InternalCommand

InternalCommand requests should be processed as soon as possible,
and since we are already using ControlAwareMailbox, this is as simple
as marking InternalCommand as a ControlMessage.

Change-Id: Ic6025f4254da47801676c0c474d03e18abbf8f50
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 2ac32ea2c4f57993a1dc49ef8ce380cb03acc822)

BUG-8898: do not invoke timeouts directly

Request timeouts are occurring with the connection lock held,
at which point the connection can be at the tail of a successor
chain:

oldestConnection -> olderConnection -> connection

If the callback being invoked attempts to transmit an entry,
we will end up attempting to lock the entire chain. This would not
be a problem except that if there is a concurrent attempt to lock
the entire chain it ends up holding the lock of oldestConnection
and it is waiting for the lock on connection -- which will only be
released once the callback finishes executing, but that in turn
waits for oldestConnection to be unlocked -- a classic AB/BA deadlock.

This patch alleviates the problem by deferring callback execution
via executeInActor, i.e. the timeout will be delivered as part
of normal message processing.
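
The deferral can be pictured with the following sketch, which assumes a simple queue in place of the actor mailbox. Connection and runTimer are hypothetical names; the real connection classes and executeInActor are not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical connection: timeouts are detected under the lock, but the
// callback is only queued there and runs later without the lock held.
final class Connection {
    final ReentrantLock lock = new ReentrantLock();
    private final Queue<Runnable> actorMailbox;

    Connection(Queue<Runnable> actorMailbox) {
        this.actorMailbox = actorMailbox;
    }

    void runTimer(Runnable timeoutCallback) {
        lock.lock();
        try {
            // Calling timeoutCallback.run() here would execute user code with
            // this lock held, which is where the AB/BA ordering problem starts.
            actorMailbox.add(timeoutCallback);
        } finally {
            lock.unlock();
        }
    }
}
```

When the mailbox is drained, the callback is free to lock the whole successor chain in the usual order, because it no longer starts out holding the tail lock.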

Change-Id: I237908cf214bcdfd477fe0212d09b207a0c2cdbf
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 4367f456f3c7a30c8ee9c7bca738b3e120a4e1d1)

Bug 8879: Migrate controller to the new XML parser

Migrate blueprint to the new XML parser from YANG tools.

Change-Id: Ib82da2f4b2b49dde7df78f425c700b9d3f473a26
Signed-off-by: Igor Foltin <igor.foltin@pantheon.tech>

Take advantage of default methods in DOMRpcProviderService

We can make one of the methods default, as all implementations
are exactly the same (codifying contract).
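
A small sketch of the pattern, using a deliberately simplified and hypothetical interface (RpcRegistry is not the real DOMRpcProviderService signature):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified registration interface.
interface RpcRegistry {
    String register(String implementation, Set<String> rpcs);

    // Every implementation would write this overload the same way, so the
    // interface provides it once as a default method.
    default String register(String implementation, String... rpcs) {
        Set<String> set = new LinkedHashSet<>(Arrays.asList(rpcs));
        return register(implementation, set);
    }
}

// Implementations now only supply the one method that actually differs.
final class SimpleRegistry implements RpcRegistry {
    @Override
    public String register(String implementation, Set<String> rpcs) {
        return implementation + rpcs;
    }
}
```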

Change-Id: I64f0b62fd3a0987ed1ed01ec14b2c4d3b77560ac
Signed-off-by: Robert Varga <rovarga@cisco.com>

Make -it- parents use current odlparent version

Change-Id: I4fd29a6323eb9181f9629126a51ab578db1f5df2
Signed-off-by: Vratko Polak <vrpolak@cisco.com>

config-persister-impl: use lambdas

This series of patches uses lambdas instead of anonymous classes for
functional interfaces when possible. Lambdas are replaced with method
references when appropriate.
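
The three forms are equivalent in behaviour. A generic illustration, not taken from config-persister-impl (LambdaStyles is a made-up class):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LambdaStyles {
    private LambdaStyles() {
    }

    // Before: anonymous class implementing a functional interface.
    static List<String> sortAnonymous(List<String> input) {
        List<String> out = new ArrayList<>(input);
        out.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareTo(b);
            }
        });
        return out;
    }

    // After: the same comparator written as a lambda.
    static List<String> sortLambda(List<String> input) {
        List<String> out = new ArrayList<>(input);
        out.sort((a, b) -> a.compareTo(b));
        return out;
    }

    // The lambda only forwards its arguments, so a method reference fits.
    static List<String> sortMethodRef(List<String> input) {
        List<String> out = new ArrayList<>(input);
        out.sort(String::compareTo);
        return out;
    }
}
```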

Change-Id: I20e8b07b839c168d0944c44a57602c3b9a96ce6a
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Slice front-end request messages

Added infrastructure to use the MessageSlicer to slice SliceableMessages in
the TransmitQueue.Transmitting class on the front-end. A MessageAssembler is
used on the Shard side to re-assemble. Currently only the
front-end ModifyTransactionRequest is a SliceableMessage as it contains a
NormalizedNode which can be arbitrarily large - the others are small and
don't require slicing.
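
Conceptually, slicing and re-assembly amount to the following, shown here on a raw byte array. This is a sketch only; Slicing, slice and assemble are hypothetical helpers and say nothing about the actual MessageSlicer or MessageAssembler APIs.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Slicing {
    private Slicing() {
    }

    // Sender side: cut a serialized message into chunks of bounded size.
    static List<byte[]> slice(byte[] message, int maxSliceSize) {
        if (maxSliceSize <= 0) {
            throw new IllegalArgumentException("maxSliceSize must be positive");
        }
        List<byte[]> slices = new ArrayList<>();
        for (int offset = 0; offset < message.length; offset += maxSliceSize) {
            int end = Math.min(message.length, offset + maxSliceSize);
            slices.add(Arrays.copyOfRange(message, offset, end));
        }
        return slices;
    }

    // Receiver side: concatenate the slices in order to rebuild the message.
    static byte[] assemble(List<byte[]> slices) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] slice : slices) {
            out.write(slice, 0, slice.length);
        }
        return out.toByteArray();
    }
}
```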

Change-Id: I7b09e4864e19d3fdb215c2b9dbcb64c14b6a143c
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

blueprint: final parameters

This automatically-generated patch flags all appropriate parameters as
final (including caught exceptions).

Change-Id: I565047abcceb31a3da2ef8b2ebdee857e6623196
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Bug 8494: Separate writing and completion threads

If AbstractTransactionHandler uses only one executor thread,
future completion callbacks are delayed by throttling on writes.
CSIT aims to detect RequestTimeoutException within a narrow window,
so a separate executor for callbacks is used now.

The delay would not be that critical, but the problem is the timing
between a scheduled execution which exceeds scheduling gaps. These
seem to hold up normally-submitted tasks, leading to futures never
completing.

Therefore we use two Executors and synchronize state modification
call sites. Hence the two tasks (throttled producer) and future
completions can run concurrently (aside from state synchronization).
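
A minimal sketch of why a second executor helps, assuming a latch as a stand-in for a throttled write. TwoExecutors is a hypothetical class and does not reproduce AbstractTransactionHandler or its state synchronization.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class TwoExecutors {
    private TwoExecutors() {
    }

    // Returns true if a completion callback ran while the writer was blocked.
    static boolean completionRunsWhileWriterBlocked() {
        ExecutorService writer = Executors.newSingleThreadExecutor();
        ExecutorService completer = Executors.newSingleThreadExecutor();
        CountDownLatch throttle = new CountDownLatch(1);
        CountDownLatch completed = new CountDownLatch(1);
        try {
            writer.execute(() -> {
                try {
                    throttle.await(); // stands in for a throttled write
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // With a single shared thread this task would queue behind the writer.
            completer.execute(completed::countDown);
            return completed.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            throttle.countDown();
            writer.shutdown();
            completer.shutdown();
        }
    }
}
```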

Change-Id: I642c5295ab6188b2d7e1b5feae62ab7ef52d41eb
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 8744119235b90d89021567e5f12361d98b823b8f)

BUG-8618: refresh transaction access when isolated

When we are an isolated leader, we stop accepting messages from
the frontend. If we remain in this state for more than 15 seconds
this can result in a timeout -- which is obvious, but it really
is our fault.

Since we cannot make forward progress anyway, there is no point
in purging the transaction. Update its access time with whatever
the last mark for that frontend was.

Change-Id: I9ff56c91e4fda4b68cd34c05609dc88d6d65fd32
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 1529bb8bdd4c30a782cf1574b0127833da5831b7)

BUG-8792: allow transactions to not time out after reconnect

During reconnect churn, the frontend may be catching up with previous
transactions, hence we should hold off timing it out until it does.

When we arrive at a timed out transaction, we allow the access time to
be updated to connect time -- effectively saying the transaction was
touched at the time of reconnect.
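
One way to express "touched at the time of reconnect" is to take the later of the two timestamps before checking the timeout. The sketch below is an assumption about the shape of that check; TxTimeout and its parameter names are hypothetical.

```java
final class TxTimeout {
    private TxTimeout() {
    }

    // A transaction counts as timed out only if neither its last access nor
    // the frontend's last (re)connect happened within the timeout window.
    static boolean isTimedOut(long lastAccessTicks, long lastConnectTicks,
            long nowTicks, long timeoutTicks) {
        long effectiveAccess = Math.max(lastAccessTicks, lastConnectTicks);
        return nowTicks - effectiveAccess >= timeoutTicks;
    }
}
```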

Change-Id: I3930b5782579f50931b204d8579c2aee51e2bc55
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 55661ed801e178812f16ac990c93b51a3d68c00e)

BUG-8619: do not touch forward path during purge enqueue

In case of a purge request, the request is sent from the head
of a connection chain (i.e. the original connection which created
the transaction) and propagated via forwarders. This path needs
to make sure it does not go via throttling, as it is an internal
detail.

Separate the transmit paths a bit more, so that TransmitQueue
can push messages to forwarders' replay path.

Change-Id: I5e146b8d11e8654b4beae3959207efb9c2f18315
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit b83c7f5e5cdaee5f250988182dccb749ac7432c2)

BUG-8618: record LeaderFrontendState time

In order to deal with IsolatedLeader state and transaction timeouts,
we need to maintain an accurate view of when we have seen the frontend
even if we are not accepting messages from it.

Add corresponding field and maintain it whenever we interact with
LeaderFrontend state. Also record last connect ticks for the same use.

Change-Id: I8e49037507fcd01470a03be8c0d611efca55dabf
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 7633a2a50144dad7cf987b29959dc06509575c05)

Bug 3401: Remove/cleanup Import-Package in maven-bundle-plugin config

Some of the pom files don't need to explicitly specify Import-Package
in the maven-bundle-plugin configuration. Others were cleaned up to remove
unnecessary entries.

Change-Id: I6b9a741d1a110f17d371497e04e2ab2187aff6b6
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-7464: use yangtools.triemap

Yangtools is moving away from using upstream Triemap to its
internal fork of that codebase. Switch this code, too.

Change-Id: I0d60ccc8927505a83a35631333203817484da9e0
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bug 8619: Introduce inheritance of progress trackers

+ Introduce cancelDebt method.
+ Use the newly introduced functionality in client code.
+ Delete unused copy constructors (including unit test).

Change-Id: Ib976343ed5f50c649ea08206c897cb70dead8b86
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 12b4928ef66a82f4a128a11701663ac23143c1d7)

Simplify QuarantinedMonitorActor

As per the comments, upstream has provided a dedicated event, hence
use that instead of digging inside akka internals.

Change-Id: I4731dfbbdd228d562ddd32ec5fd3d0e9af0855d0
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

BUG-8143: issue a JVM restart

Instead of just restarting the OSGi framework, instruct karaf to
re-execute the JVM.

Change-Id: I10709f61b71d578e4677a5948c23e38f9871c6a1
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

ProgressTracker: Decrease delay due to nearestAllowed

If nearestAllowed is in the past, that means we have
a temporary interval of relatively small demand for tasks.
We can reduce delay, as if the time since nearestAllowed
was a "delay in advance".

This way the queue stays closer to the intended capacity.
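
Read literally, the time elapsed since nearestAllowed is credited against the newly computed delay. The arithmetic below is a sketch under that reading; DelayCredit and reducedDelay are hypothetical and the real ProgressTracker computation may differ in detail.

```java
final class DelayCredit {
    private DelayCredit() {
    }

    // If nearestAllowed lies in the past, treat the elapsed time as delay
    // that has already been served and subtract it, never going below zero.
    static long reducedDelay(long computedDelay, long now, long nearestAllowed) {
        long credit = Math.max(0, now - nearestAllowed);
        return Math.max(0, computedDelay - credit);
    }
}
```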

Change-Id: I40f95ea9cb25ea62d8c65ee78cafc79e9b56cc11
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
(cherry picked from commit 80e6514d56cd4dc6aa40997dea2b460723148341)

BUG-8618: fix test driver

Since the test can produce bursts of completions, which in turn can
get slowed down by writeout of new messages, offload future completion
to the executor we have internally. This in turn simplifies things,
as we can rely on state being manipulated (mostly) from a single thread.

Also change ArrayDeque to a HashSet to ensure removal of tasks completes
quickly even in face of misordered responses.
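
The collection change matters because ArrayDeque.remove(Object) scans linearly from the head, while HashSet.remove(Object) is an expected constant-time lookup wherever the element sits. A sketch with a hypothetical Outstanding tracker:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker of in-flight task IDs in a test driver.
final class Outstanding {
    private final Set<Long> tasks = new HashSet<>();

    void submitted(long id) {
        tasks.add(id);
    }

    // Removal cost does not depend on the order in which responses arrive.
    boolean completed(long id) {
        return tasks.remove(id);
    }

    int size() {
        return tasks.size();
    }
}
```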

Change-Id: Ia5341633af2dbe3e26e7208436405daf7632a876
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 2be77b3bcef31ad8b6dbdce073471561d2cf76d6)

BUG-8618: add pause/unpause mechanics for tell-based protocol

When we are transitioning to/from paused state, we need to remove
all frontend-related state, including pending transactions, to ensure
ShardDataTree does not track them.

When we change to unpaused leader, we can reconstruct the state
from the journal -- the rest will be forwarded from the frontend anyway.

Change-Id: I28d486d1a6695e21dd7e6518609680d54e5a15eb
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 40d27d44d6f0b0358505b2e8ac5abbad25f47d4b)

BUG-8618: introduce RaftActor.unpauseLeader()

This is a preparatory patch, which notifies RaftActor when
the operation hooked to pauseLeader() fails to complete and the
leader should resume its normal operation.

This is needed to correctly resume operations of tell-based protocol
after a pauseLeader() completes without actually changing the leader.

Change-Id: Ia00e52ebb327575a484af62bf0c31131a33303b3
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 3a10a45e0f78337435c8bc84015c4724a9fa7741)

BUG-8618: eliminate SimpleShardDataTreeCohort subclasses

Now that we handle pre-cancommit failures using reportFailure(),
there is no need to have specialized subclasses for cohorts, as
the initial failure can cleanly be handled via nextFailure.

This also places a guard in reportFailure() so we do not override
a failure once it is set -- which should only happen in the case
of a dead-on-arrival transaction and it timing out in READY state.
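
The guard amounts to first-failure-wins semantics. A minimal sketch, with a hypothetical Cohort class that only models the nextFailure field mentioned above:

```java
// Hypothetical cohort holding at most one recorded failure.
final class Cohort {
    private Exception nextFailure;

    // The first reported failure is kept; later reports do not override it.
    void reportFailure(Exception cause) {
        if (nextFailure == null) {
            nextFailure = cause;
        }
    }

    Exception failure() {
        return nextFailure;
    }
}
```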

Change-Id: I057c5b36006843f51d60034d30af83bac4e02cd7
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 2783c9dffdd91dae87d3351f4ebffbd8679e3133)

BUG-8618: rework AbstractProxyTransaction.flushState()

Instead of directly forwarding state use ModifyTransactionRequest
to encapsulate state and forward it separately to the successor.

This eliminates sendRequest() from replay path, ensuring the replay
thread is not blocked.

Change-Id: Ice86791d417b7487b9d3b1df06341dd028cde7f8
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit c525e5f25b951daa28d0cbde237ba3040b68f99f)

BUG-8618: reconnect connections more aggressively

Given that the timeout period on backend for an existing transaction
is 15 seconds, sleeping for 5 seconds between reconnect attempts seems
excessive. Lower the timer to 1 second, which should give us a slightly
better chance to avoid timeouts.

Change-Id: Ib74480f5630865cb7a11ca7027e0495443d1d14e
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 70f287502823bab284555b52b91043c3204b829b)

BUG-8618: turn timeouts in READY state into canCommit failures

This patch adds more details to the TimeoutException reported when
we prune a transaction while it is in the queue. It also peels the
READY case from the defaults and makes sure we send an authoritative
reply back to the frontend when it requests the transaction to be
committed.

Change-Id: I21364ff7e7103af8be6988b8483adc112c3c1d25
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 0d5408c4babc902d270d9f81ed53c6af93bb2867)

BUG-8618: improve logging

While target sequence is important, we also need to log transmit
sequence, too.

Since this issue involves a state mismatch on the backend, improve
ShardDataTreeCohort logging to include transaction identifier
and state.

Change-Id: I21735870a9ae7983dc14a8f8f4d7464d3448ca60
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit d2d9179e52a0d87aae2b9014b4c36384e24692e3)

Bug 7449: Slice ReadTransactionSuccess response

Added slicing of the ReadTransactionSuccess message. The slicing is
initiated by the Shard using a MessageSlicer and re-assembly is done
by the ClientActorBehavior on the FE. Introduced a SliceableMessage
interface implemented by ReadTransactionSuccess which Shard checks for
to determine if the response message should be sliced.

Change-Id: Ie55e35aa82a9d2bc21f7a8f24396cb4df467252e
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bug 8163: getDataTreeChangeListenerExecutor() & DataBrokerTestModule

Adjust AbstractDataBrokerTestCustomizer with a
getDataTreeChangeListenerExecutor() instead of a
setDataTreeChangeListenerExecutor(), just for more consistency with the
existing getCommitCoordinatorExecutor() method. Also less confusing (to
me) than seeing the private Executor set by default which may get
changed by the setter later.

Adjust ConcurrentDataBrokerTestCustomizer with the
useMTDataTreeChangeListenerExecutor as a constructor argument, instead
of an useMTDataTreeChangeListenerExecutor() method, just for consistency
for how you already have it in AbstractConcurrentDataBrokerTest.

Extend ConstantSchemaAbstractDataBrokerTest and DataBrokerTestModule to
allow passing through this new opt-in tweak flag, so that tests in
downstream projects such as genius and netvirt can start exploring
enabling this.

Change-Id: I4ad85ac48163d2f4bac865f46a3b047d5b7d333a
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Bug 8163: Use MT DTCL executor in AbstractConcurrentDataBrokerTest

Using a direct executor can cause deadlocks so the DTCL executor was
made configurable to use a threadpool as an opt-in. Direct executor
is still the default as many existing tests would break.

Change-Id: I41e14f1e6d3b77a44e61dfc75abff29d11a777dc
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Refactor Follower#handleAppendEntries

This method is large - refactor it a bit.

Change-Id: Idae1883accdd7c73b57471501e66398306cf6e91
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Remove <version> from org.apache.aries.blueprint.core dep.

Because in c/58365a the latest version was added to odlparent
dependencyManagement, and it seems wiser to declare this in only a
single place, to avoid possible version discrepancy problems re. this
artifact in the future.

Change-Id: If4370f3c7e80123fe6c225a11c1224145b3ad2b9
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

BUG-8676: add UnsignedLongRangeSet

This patch adds the wrapper class and updates users to use it directly.
The implementation itself is not changed, that will be done in a follow-up
patch.

Change-Id: Ie240ca5c3c9fc1448629bb5db6ecfa1029f66b8f
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Cleanup warnings

- pom.xml groupId duplicate
- Futures.addCallback()
- Throwables.propagate*()
- potentially-static methods
- remove 'throws Exception' where it is not really needed

Change-Id: Ib47e6255e0f510ab7dd0dcd08f71f2dd124df7b7
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Fix Verify/Preconditions string format

These methods take a String.format() string, not a logging one, hence
we are not getting the information we want.
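
The mismatch can be demonstrated with plain String.format(), used here as a stand-in: a logging-style {} placeholder is not a format specifier, so the argument never lands in the message. Templates is a made-up class, and Guava's own template handling differs in some details.

```java
final class Templates {
    private Templates() {
    }

    // Logging-style placeholder: left in the output as literal text.
    static String loggingStyle(Object value) {
        return String.format("Invalid value {}", value);
    }

    // Format-style placeholder: substituted as intended.
    static String formatStyle(Object value) {
        return String.format("Invalid value %s", value);
    }
}
```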

Change-Id: I46de0d64c85594e3d7b8be97951f1cf5249bca8f
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 3e9ac68fea1aef0c7fedec346e50882efdde8acc)

Bug 6794: Remove threadpool config modules

With the removal of the CSS netconf connector, the threadpool config modules
are no longer used so remove them. The *Wrapper classes are used via blueprint
so they remain. Also removed the eventbus config modules which were deprecated
in Carbon.

Change-Id: Ic528e5817a9f5ccdb67ef41987128ead4db51cbd
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Upgrade to odlparent 2.0.2

Change-Id: I748830e39c108056ecd81809a0556e8c43d251f4
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Speed up slow tell-based Distributed*IntegrationTest cases

Some tell-based test cases take up to 2 minutes due to the default
2 minute request timeout and 30 sec backend aliveness timer. Speed up
the tests by setting much lower values.

Change-Id: If8dba80d625ded8753178e937f14b435675ef0e4
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Make AbstractClientConnection timeouts configurable

So we can tweak them in production and unit tests.

Change-Id: I39ce8cdf3cd5397a71f52c42357943dfe5eccb7c
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Fix incorrect spelling of fileBackedStreamFactory

filedBackedStreamFactory should be fileBackedStreamFactory.

Change-Id: Ib0b65d68d37c5b0ded4f1739d4ddc578973fe6ec
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-4513 UT for Change event is empty when homogeneous composite key is used

The original DCL test passes now so whatever the issue was before has been
fixed. I also added a DTCL test.

Change-Id: I5c5037f49a77835dbfea1ce9db8b22d03b6191ec
Signed-off-by: Valentin Mayamsin <vmayamsi@cisco.com>

Transfer leadership in PreLeader state

Rather than dying when requested to shutdown in pre-leader state,
follow the same code path we perform in normal leader mode, i.e.
transfer leadership.

Change-Id: I2ca30d44626df05c5f8b5ff6984eea20c7bf0949
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Explicitly load the real DataBroker with component-name

It seems that karaf4 has "better" wiring so the
TracingBroker was being wired to itself, resulting
in stack overflows.

Change-Id: Iedb2e9dcfd53acf384ed3130cfcd78f313d76e1e
Signed-off-by: Josh <jhershbe@redhat.com>

Re-enable karaf distribution

This re-enables opendaylight distribution build to get us back
on par.

Change-Id: I11e5ee4d1f9f9de716f5636ac9afbad0137c93fc
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bump odlparent dependency to 2.0.1

Bumps odlparent to latest release.

Change-Id: Ifaf36c6539206ec5c35663717b691a0d962d1744
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bug 7449: Add slicer Id to MessageSliceIdentifier

Both Shard and RaftActor (via AbstractLeader) (will) have separate
MessageSlicer instances and we need to determine to which instance
MessageSliceReply messages should be forwarded otherwise the first
MessageSlicer will drop messages destined for the second MessageSlicer.
Therefore add a slicerId field to MessageSliceIdentifier which is
checked by MessageSlicer#handleMessage.
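
The routing idea can be sketched as follows. This is a simplified
model, not the real classes: SliceIdentifier, Slicer and handleReply
are hypothetical names, and the real identifier carries more state.

```java
final class SliceIdentifier {
    private final String slicerId; // identifies the slicer that sent the slice
    private final long messageId;

    SliceIdentifier(String slicerId, long messageId) {
        this.slicerId = slicerId;
        this.messageId = messageId;
    }

    String slicerId() {
        return slicerId;
    }

    long messageId() {
        return messageId;
    }
}

final class Slicer {
    private final String id;
    private int handled;

    Slicer(String id) {
        this.id = id;
    }

    // Returns true if the reply belongs to this slicer and was consumed,
    // false if the caller should offer it to another slicer instead.
    boolean handleReply(SliceIdentifier identifier) {
        if (!id.equals(identifier.slicerId())) {
            return false;
        }
        handled++;
        return true;
    }

    int handledCount() {
        return handled;
    }
}
```

A caller holding two slicers can offer a reply to each in turn and stop
at the first one that returns true, so neither drops the other's
messages.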

Change-Id: Ib39ede29789d5bfaf1fdaea66a8d2994fe6ebcd6
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8704: rework seal mechanics to not wait during replay

AbstractProxyTransaction.seal() and most notably internalSeal()
can end up pushing messages down the connection, hence they
can end up slowing down the replay process.

The replay paths end up enqueuing subsequent requests anyway, so
rework the structure to split the 'seal only' and 'seal and flush'
codepaths.

Change-Id: Ie75c1ef8aa0d3d5d7ca482d383fd516077ca50b4
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 1e07329c0d800b8fea43ae0c4060aded5fd18739)

Bug 8768: Close itemProducer for every code path

Change-Id: Ib87de13e2a0e6f128f74a05b80ffb4331e345d2c
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
(cherry picked from commit 35b7e595945a1386047c1af73c94b70fbdaf9a59)

BUG-8494: rework AbstractTransactionHandler

If we have a transaction failure while we are producing transactions,
we could end up adding a delay until the failure is detected as we
would continue jamming in transactions.

Rework internal logic to halt processing as soon as a failure is seen,
speeding up detection and simplifying code.
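
A minimal sketch of the halt-on-failure idea, assuming a synchronous
stand-in for what is really asynchronous completion. HaltingProducer
and its methods are hypothetical names chosen for this example.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntPredicate;

final class HaltingProducer {
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    // Records the first failure only; later ones are ignored.
    void fail(Throwable cause) {
        failure.compareAndSet(null, cause);
    }

    Throwable failure() {
        return failure.get();
    }

    // Submits up to 'total' transactions and returns how many were
    // submitted. Production stops as soon as a failure has been recorded.
    int produce(int total, IntPredicate txSucceeds) {
        int submitted = 0;
        for (int i = 0; i < total; i++) {
            if (failure.get() != null) {
                break; // do not keep jamming in transactions
            }
            submitted++;
            if (!txSucceeds.test(i)) {
                fail(new IllegalStateException("transaction " + i + " failed"));
            }
        }
        return submitted;
    }
}
```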

Change-Id: I19d13c78d94bb39481abde477ec4e3df03a6aa57
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit b7657c3ac7b4697372674b75e820581a6d59e2ba)

BUG-8494: fix failure path thinko

The check should be to see if the failure has *not* been set,
hence invert the check.

Change-Id: I2c3893924f1c985687beedbfae0889388fad15c7
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 5e986f5320c561953759a7beffb11db7e296817c)

BUG-8445: check sessionId before propagating failures

When we have leader movement occurring, based on timing details we
can re-establish a connection to the new leader and then start
receiving responses from the old leader telling us it no longer
is the leader.

To stop this from happening we need to check connection session ID
against the incoming failure.

Change-Id: If9a891016c7f213f2552283e3ec13485e598f5a4
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 1c495bceb8d9c203f5ce53ea1ab9d907efb4d7b3)

BUG-8494: Cleanup clustering-it-provider

Fixes various warnings and refactors MdsalLowLevelTestProvider
to be slightly cleaner in terms of number of classes.

It also eliminates synchronous thread blocking on future collection
and instead schedules a task which performs the cleanup if the system
gets stuck.
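
One way to replace a blocking wait with a scheduled cleanup is
sketched below. It is an assumption about the general shape, not the
code in this patch; StuckWatchdog and watch are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StuckWatchdog {
    private StuckWatchdog() {
    }

    // Schedules a task that fails 'work' if it has not completed within
    // the timeout. The caller is never blocked; the task is cancelled as
    // soon as 'work' completes on its own.
    static <T> ScheduledFuture<?> watch(CompletableFuture<T> work,
            ScheduledExecutorService scheduler, long timeout, TimeUnit unit) {
        ScheduledFuture<?> task = scheduler.schedule(() -> {
            work.completeExceptionally(new TimeoutException("work appears stuck"));
        }, timeout, unit);
        work.whenComplete((result, error) -> task.cancel(false));
        return task;
    }
}
```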

Change-Id: I657f3df60c620284538bdf39ab1536eac8448801
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit d97061af6814ad7b085af10797a252aa4aa5cda6)

Cleanup ProduceTransactionsHandler

Shuffle invariants around to reduce overheads. Also adds better debugs
around futures completing.

Change-Id: I01f940de08e9e0b7fc0e95b48b2d5fecdfd78f86
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 9797fc8e587a51395342586bc44de9750fb67af3)

BUG 8604 set proper tag when producer creation times out

Change-Id: I405f4d546a32b2d0f5b56fb03907a63334fabd6c
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit ec734245413c94cdd758f4c22ad3f3b63cfae5e6)

BUG 8494 log possibly hung futures in tx handlers

Change-Id: Iccc90e575033c6770a3a499853f31e0684a712e4
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 0723037074588cb901212e9b3ad9bf437e754f89)

Catch all exceptions when submitting in tx handlers

Change-Id: I5b9a2ec26b1b6001423f2cf5cf57285ce6c7e340
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 31a52c56cb4e8398403f299d0c3d3830084e260e)

BUG-8620: handle direct commit and disconnect correctly

Transactions committed directly can complete in a disconnected
fashion as we are skipping the back-and-forth communication of the
three-phase commit. This period may involve shard leadership changes
and so we may end up in a situation where we are replaying a direct
commit request to a transaction which already completed -- which
raises a RequestFailure to make sure we do not do anything untoward.

In the specific case of direct commit, though, this is perfectly fine
and so update the callback to account for this case happening.

Change-Id: Ic60e69f0f58cc7c5a3ac869386dc12f856aa1f74
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit da42d2ffc8904b8dd24596cf6d918a0d30c8c521)

BUG 8602: Skip initial fill of idints

Change-Id: If197c9b2318a52b3608f6065bea44af860a09849
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 09630b9ae171a976301a795e745044ae58812df7)

Bug 2890: Chunk AppendEntries when single payload size exceeds threshold

Utilizes the MessageSlicer in AbstractLeader to slice/chunk AppendEntries
messages whose single log entry payload exceeds the max size threshold.
The MessageAssembler is used in the Follower to re-assemble.

For efficiency, with multiple followers, the AbstractLeader reuses the
FileBackedOutputStream containing the serialized AppendEntries data.
However, since the MessageSlicer takes ownership of the FileBackedOutputStream
and cleans it up when slicing is complete, I added a
SharedFileBackedOutputStream class that maintains a usage count and
performs cleanup when the usage count reaches 0. The AbstractLeader maintains
a Map of SharedFileBackedOutputStream instances keyed by log index.
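
The usage-count mechanism can be sketched roughly as below. This is a
simplified model under stated assumptions: SharedResource is a
hypothetical stand-in that takes the cleanup action as a Runnable, and
it does not model the underlying stream.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SharedResource {
    private final AtomicInteger usageCount = new AtomicInteger();
    private final Runnable cleanup;

    SharedResource(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    // Called once for every user that will consume the resource.
    void incrementUsageCount() {
        usageCount.incrementAndGet();
    }

    // Called by each user when it is done. Only the last release
    // triggers the actual cleanup.
    void release() {
        int remaining = usageCount.decrementAndGet();
        if (remaining == 0) {
            cleanup.run();
        } else if (remaining < 0) {
            throw new IllegalStateException("released more often than acquired");
        }
    }
}
```

With two followers, the resource is acquired twice and the cleanup runs
only after the second release.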

The FollowerLogInformation keeps track of whether or not slicing is in
progress for the follower. As with install snapshot, while slicing is in
progress we only want to send empty AppendEntries as heartbeats.

Change-Id: Id163944b9989f6cb39a6aaaa98d1f3c4b0026bbe
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Improve ShardBackendInfo.toString()

Slight update to eliminate a space from the property name and
an explicit present/absent string.

Change-Id: I9cb3a57049737c8ea25d22263140ff9974e23502
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 741013a2d48a4d08f83082c4e3cff79f59d17dde)

BUG-8445: ignore responses from mismatched sessions

We have to check the session ID of the response in order not to
wreck transmit consistency in the face of leader changes and reconnects.

If we reconnect the connection to the new leader before we saw all
responses from the old leader, we end up in a situation where the
old leader completes some of the replayed messages before we either
send them to the new leader or receive (the correct) reply.

Guard against this by checking the session ID before attempting to
pair a response to a request.
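
The guard can be sketched as follows, as a simplified model rather
than the actual connection code. SessionGuard, its String requests and
the sequence-keyed map are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class SessionGuard {
    // In-flight requests keyed by sequence number.
    private final Map<Long, String> inflight = new HashMap<>();
    private long sessionId;

    // Called when a connection to a (new) leader is established.
    void reconnect(long newSessionId) {
        sessionId = newSessionId;
    }

    void send(long sequence, String request) {
        inflight.put(sequence, request);
    }

    // Pairs a response with its request only if the response carries the
    // current session ID. Responses from an old session are ignored and
    // leave the in-flight request untouched.
    Optional<String> receive(long responseSessionId, long sequence) {
        if (responseSessionId != sessionId) {
            return Optional.empty();
        }
        return Optional.ofNullable(inflight.remove(sequence));
    }
}
```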

Change-Id: I28fa98b89c679715c3a0c546962d00533e76aa5d
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 0ea09c71a5902f1ebf27ad683be634ded773e2c7)