git.opendaylight Code Review - controller.git/log

Transfer leadership in PreLeader state

Rather than dying when requested to shut down in the PreLeader state,
follow the same code path we use in normal leader mode, i.e.
transfer leadership.

Change-Id: I2ca30d44626df05c5f8b5ff6984eea20c7bf0949
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Explicitly load the real DataBroker with component-name

It seems that karaf4 has "better" wiring so the
TracingBroker was being wired to itself, resulting
in stack overflows.

Change-Id: Iedb2e9dcfd53acf384ed3130cfcd78f313d76e1e
Signed-off-by: Josh <jhershbe@redhat.com>

Re-enable karaf distribution

This re-enables the OpenDaylight distribution build to get us back
on par.

Change-Id: I11e5ee4d1f9f9de716f5636ac9afbad0137c93fc
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bump odlparent dependency to 2.0.1

Bumps odlparent to latest release.

Change-Id: Ifaf36c6539206ec5c35663717b691a0d962d1744
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bug 7449: Add slicer Id to MessageSliceIdentifier

Both Shard and RaftActor (via AbstractLeader) have (or will have) separate
MessageSlicer instances, and we need to determine to which instance
MessageSliceReply messages should be forwarded; otherwise the first
MessageSlicer will drop messages destined for the second MessageSlicer.
Therefore, add a slicerId field to MessageSliceIdentifier, which is
checked by MessageSlicer#handleMessage.
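
A minimal sketch of the routing rule, assuming hypothetical SliceId and SketchSlicer classes in place of the real MessageSliceIdentifier and MessageSlicer (whose actual APIs differ): each slicer stamps its own id into the identifiers it hands out, and a reply carrying another slicer's id is left alone rather than consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for MessageSliceIdentifier: owning slicer id plus message id.
final class SliceId {
    final long slicerId;
    final long messageId;

    SliceId(long slicerId, long messageId) {
        this.slicerId = slicerId;
        this.messageId = messageId;
    }
}

// Hypothetical stand-in for MessageSlicer, reduced to reply routing.
final class SketchSlicer {
    private static final AtomicLong SLICER_ID_COUNTER = new AtomicLong();

    private final long slicerId = SLICER_ID_COUNTER.incrementAndGet();
    private final List<SliceId> handledReplies = new ArrayList<>();

    // Every identifier this slicer issues carries its own slicer id.
    SliceId newSliceId(long messageId) {
        return new SliceId(slicerId, messageId);
    }

    // Returns true if the reply was consumed; false leaves it for another slicer.
    boolean handleReply(SliceId replyId) {
        if (replyId.slicerId != slicerId) {
            return false;
        }
        handledReplies.add(replyId);
        return true;
    }

    int handledCount() {
        return handledReplies.size();
    }
}
```

An actor owning two slicers could offer each reply to one slicer and, if it returns false, to the other.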

Change-Id: Ib39ede29789d5bfaf1fdaea66a8d2994fe6ebcd6
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8704: rework seal mechanics to not wait during replay

AbstractProxyTransaction.seal(), and most notably internalSeal(),
can end up pushing messages down the connection, which slows
down the replay process.

The replay paths enqueue subsequent requests anyway, so
rework the structure to split the 'seal only' and 'seal and flush'
code paths.

Change-Id: Ie75c1ef8aa0d3d5d7ca482d383fd516077ca50b4
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 1e07329c0d800b8fea43ae0c4060aded5fd18739)

Bug 8768: Close itemProducer for every code path

Change-Id: Ib87de13e2a0e6f128f74a05b80ffb4331e345d2c
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
(cherry picked from commit 35b7e595945a1386047c1af73c94b70fbdaf9a59)

BUG-8494: rework AbstractTransactionHandler

If a transaction fails while we are producing transactions, detection
of the failure could be delayed, as we would keep jamming in more
transactions.

Rework the internal logic to halt processing as soon as a failure is seen,
speeding up detection and simplifying the code.
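
A minimal fail-fast sketch, assuming a hypothetical FailFastProducer class rather than the actual AbstractTransactionHandler: the first failure is recorded once via compareAndSet(null, cause), which only succeeds while no failure has been set, and the loop checks it before every submission. The failAt parameter only simulates a failing transaction for the demo.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified producer loop; not the actual AbstractTransactionHandler.
final class FailFastProducer {
    private final AtomicReference<Throwable> failure = new AtomicReference<>();
    private int submitted;

    // Called from completion callbacks; only the first failure is kept.
    void transactionFailed(Throwable cause) {
        // Succeeds only while the failure has *not* been set yet.
        failure.compareAndSet(null, cause);
    }

    // Submits up to 'count' transactions, stopping as soon as a failure is seen.
    int produce(int count, int failAt) {
        for (int i = 0; i < count; i++) {
            if (failure.get() != null) {
                break; // halt immediately instead of jamming in more transactions
            }
            submitted++;
            if (i == failAt) {
                // Simulated failure of transaction i (demo only).
                transactionFailed(new IllegalStateException("tx " + i + " failed"));
            }
        }
        return submitted;
    }

    Throwable firstFailure() {
        return failure.get();
    }
}
```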

Change-Id: I19d13c78d94bb39481abde477ec4e3df03a6aa57
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit b7657c3ac7b4697372674b75e820581a6d59e2ba)

BUG-8494: fix failure path thinko

The check should be to see if the failure has *not* been set,
hence invert the check.

Change-Id: I2c3893924f1c985687beedbfae0889388fad15c7
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 5e986f5320c561953759a7beffb11db7e296817c)

BUG-8445: check sessionId before propagating failures

When leader movement occurs, depending on timing we can
re-establish a connection to the new leader and then start
receiving responses from the old leader telling us it is no
longer the leader.

To prevent this from happening, we need to check the connection
session ID against the incoming failure.

Change-Id: If9a891016c7f213f2552283e3ec13485e598f5a4
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 1c495bceb8d9c203f5ce53ea1ab9d907efb4d7b3)

BUG-8494: Cleanup clustering-it-provider

Fixes various warnings and refactors MdsalLowLevelTestProvider
to be slightly cleaner in terms of number of classes.

It also eliminates synchronous thread blocking on future collection
and instead schedules a task which performs the cleanup if the system
gets stuck.

Change-Id: I657f3df60c620284538bdf39ab1536eac8448801
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit d97061af6814ad7b085af10797a252aa4aa5cda6)

Cleanup ProduceTransactionsHandler

Shuffle invariants around to reduce overheads. Also add better debug
logging around future completion.

Change-Id: I01f940de08e9e0b7fc0e95b48b2d5fecdfd78f86
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 9797fc8e587a51395342586bc44de9750fb67af3)

BUG 8604 set proper tag when producer creation times out

Change-Id: I405f4d546a32b2d0f5b56fb03907a63334fabd6c
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit ec734245413c94cdd758f4c22ad3f3b63cfae5e6)

BUG 8494 log possibly hung futures in tx handlers

Change-Id: Iccc90e575033c6770a3a499853f31e0684a712e4
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 0723037074588cb901212e9b3ad9bf437e754f89)

Catch all exceptions when submitting in tx handlers

Change-Id: I5b9a2ec26b1b6001423f2cf5cf57285ce6c7e340
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 31a52c56cb4e8398403f299d0c3d3830084e260e)

BUG-8620: handle direct commit and disconnect correctly

Transactions committed directly can complete in a disconnected
fashion as we are skipping the back-and-forth communication of the
three-phase commit. This period may involve shard leadership changes
and so we may end up in a situation where we are replaying a direct
commit request to a transaction which already completed -- which
raises a RequestFailure to make sure we do not do anything untoward.

In the specific case of direct commit, though, this is perfectly fine,
so update the callback to account for this case.

Change-Id: Ic60e69f0f58cc7c5a3ac869386dc12f856aa1f74
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit da42d2ffc8904b8dd24596cf6d918a0d30c8c521)

BUG 8602: Skip initial fill of idints

Change-Id: If197c9b2318a52b3608f6065bea44af860a09849
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 09630b9ae171a976301a795e745044ae58812df7)

Bug 2890: Chunk AppendEntries when single payload size exceeds threshold

Utilizes the MessageSlicer in AbstractLeader to slice/chunk AppendEntries
messages whose single log entry payload exceeds the max size threshold.
The MessageAssembler is used in the Follower to re-assemble.

For efficiency, with multiple followers, the AbstractLeader reuses the
FileBackedOutputStream containing the serialized AppendEntries data.
However, since the MessageSlicer takes ownership of the FileBackedOutputStream
and cleans it up when slicing is complete, I added a
SharedFileBackedOutputStream class that maintains a usage count and
performs cleanup when the usage count reaches 0. The AbstractLeader maintains
a Map of SharedFileBackedOutputStream instances keyed by log index.
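
A sketch of the usage-count idea, assuming a hypothetical SharedResource wrapper in place of SharedFileBackedOutputStream (whose real method names may differ): each follower slicing from the shared data takes a reference, and the cleanup callback runs only when the last reference is released.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted wrapper illustrating the SharedFileBackedOutputStream idea.
final class SharedResource {
    private final AtomicInteger usageCount = new AtomicInteger();
    private final Runnable cleanup;
    private volatile boolean cleanedUp;

    SharedResource(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    // Each follower that starts slicing from this data takes a reference.
    void incrementUsageCount() {
        usageCount.incrementAndGet();
    }

    // Called when a slicer is done with the data; the last release performs cleanup.
    void release() {
        int remaining = usageCount.decrementAndGet();
        if (remaining == 0) {
            cleanedUp = true;
            cleanup.run();
        } else if (remaining < 0) {
            throw new IllegalStateException("released more times than acquired");
        }
    }

    boolean isCleanedUp() {
        return cleanedUp;
    }
}
```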

The FollowerLogInformation keeps track of whether or not slicing is in
progress for the follower. As with install snapshot, while slicing is in
progress we only want to send empty AppendEntries as heartbeats.

Change-Id: Id163944b9989f6cb39a6aaaa98d1f3c4b0026bbe
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Improve ShardBackendInfo.toString()

Slight update to eliminate a space from the property name and
an explicit present/absent string.

Change-Id: I9cb3a57049737c8ea25d22263140ff9974e23502
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 741013a2d48a4d08f83082c4e3cff79f59d17dde)

BUG-8445: ignore responses from mismatched sessions

We have to check the session ID of the response in order not to
wreck transmit consistency in the face of leader changes and reconnects.

If we reconnect to the new leader before we have seen all
responses from the old leader, we end up in a situation where the
old leader completes some of the replayed messages before we either
send them to the new leader or receive the correct reply.

Guard against this by checking the session ID before attempting to
pair a response to a request.
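
A minimal sketch of the guard, assuming a hypothetical SessionAwareConnection that pairs responses with in-flight requests in FIFO order (the real cds-access-client connection classes are considerably more involved): a response whose session ID differs from the current one is ignored instead of being matched against a request.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simplified connection; not the cds-access-client API.
final class SessionAwareConnection {
    private long currentSessionId;
    private final Deque<String> inflight = new ArrayDeque<>();

    SessionAwareConnection(long sessionId) {
        this.currentSessionId = sessionId;
    }

    void send(String request) {
        inflight.addLast(request);
    }

    // Reconnect to a new leader: new session; in-flight requests stay queued for replay.
    void reconnect(long newSessionId) {
        currentSessionId = newSessionId;
    }

    // Pairs a response with the oldest in-flight request only if it comes from the current session.
    String receive(long responseSessionId) {
        if (responseSessionId != currentSessionId) {
            return null; // stale response from the old leader: ignore it
        }
        return inflight.pollFirst();
    }

    int inflightCount() {
        return inflight.size();
    }
}
```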

Change-Id: I28fa98b89c679715c3a0c546962d00533e76aa5d
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 0ea09c71a5902f1ebf27ad683be634ded773e2c7)

Remove EmptyQueue

This class is not used anywhere, remove it. If this functionality
is needed somewhere, use yangtools.util.EmptyDeque instead.

Change-Id: I12414fd2a2a5b4e7ac8b73fe70e8aa3dc929d025
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bug 7449: Move Dispatchers to sal-clustering-commons

Moved Dispatchers to a common package so it can be used in
cds-access-client.

Change-Id: I4cea4c586dded9e413c1feee698b04d981b19ea2
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 7449: Introduce ClientActorConfig in cds-access-client

Upcoming changes in cds-access-client will need access to some config
params in DatastoreContext. However, DatastoreContext is in
sal-distributed-datastore and thus can't be referenced in cds-access-client,
so factor out a ClientActorConfig interface with the necessary accessors.

Change-Id: I55e7291340e711c585f4fb1236a27396503d1914
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 7449: Add custom dispatcher for message serialization/slicing

This will be used in subsequent patches to parallelize message
serialization/slicing.

Change-Id: I6b89e1a61e10d743b24d834807d5a27bfc6e5c2b
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG 8629: Try to allow notification processing to finish in unsubscribe of listeners.

Change-Id: I8638c6066b86b101484d3d80cd0fed146a478778
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit bc5486e6d9fab8f550be8b72874ce96a9eb52651)

Bug 8740: startup archetype should default to Yang 1.1

Default to yang 1.1 and showcase the awesome work our yangtools
developers put in for 1.1 support.

Change-Id: I5f8320baca375f4e57f036840b27d7270191f530
Signed-off-by: Ryan Goulding <ryandgoulding@gmail.com>

Bug 8739: Autogenerate date for yang in archetype

A couple of lines to autogenerate the date based on the day that
the archetype is run. The date is then formatted appropriately
and used as the yang revision. This helps keep the yang data
model writer honest.

Change-Id: Ic885b71a777119702b3ce78a21623298c44ad9c1
Signed-off-by: Ryan Goulding <ryandgoulding@gmail.com>

Bug 8735: Remove dlux deps from startup archetype

Since dlux is in maintenance mode, remove the dlux dependency from
the opendaylight-startup-archetype and remove the corresponding
-ui feature.

Change-Id: I78f43b90d718a10e47de99d54cd4952acc699470
Signed-off-by: Ryan Goulding <ryandgoulding@gmail.com>

Bug 7449: Add maximum-message-slice-size config param

Added a new maximum-message-slice-size config param that will be used
when fragmenting messages through the Akka remoting framework. This is
a generalized version of the shard-snapshot-chunk-size param and
replaces it.

Change-Id: I4dc4cc0de92d6f876e5587cd8cb3ade2abb59285
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 8621 - Add shutdown-prefix-shard-replica rpc to MdsalLowLevelTestProvider

CSIT test scenarios require the ability to cleanly shut down a shard's
local replica. This introduces the shutdown-prefix-shard-replica rpc to
MdsalLowLevelTestProvider. Invoking this rpc gracefully stops the local
replica of the specified prefix-based shard.

Change-Id: I620b7ae2dbc9978dd155c64f703d421d46108e3d
Signed-off-by: Jakub Morvay <jmorvay@cisco.com>
(cherry picked from commit bdf02e09c13b6c8d170202054d44877707642cd9)

Bug 8621 - Add shutdown-shard-replica rpc to MdsalLowLevelTestProvider

CSIT test scenarios require the ability to cleanly shut down a shard's
local replica. This introduces the shutdown-shard-replica rpc to
MdsalLowLevelTestProvider. Invoking this rpc gracefully stops the local
replica of the specified module-based shard.

Change-Id: Ia8e0be65ecc99f9e208ff4ffd737b210437a9f51
Signed-off-by: Jakub Morvay <jmorvay@cisco.com>
(cherry picked from commit d5fcf5d66568519595b533cc20651634d66d34fb)

Bump to odlparent 2.0.0

This takes odlparent 2.0.0 and adjusts for the Guava update and feature
movement. Since Jenkins is failing on the distribution run, that run is
disabled; a follow-up patch will re-enable it to get us going
again.

Change-Id: If3e1289ed7f73a79a5a47428c634bda9702e824d
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

BUG-8494: propagate submit failure immediately

Rather than waiting for abort to complete, which cannot happen
during isolation for example, propagate timeout immediately.

Change-Id: I90333938cb951f3b478320c682c65be219660fdf
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit b6a43d9e31f6300fe35a27ecd1830a044b7cceb9)

Retrofit DOMSchemaService into SchemaService

For migration purposes we need to retrofit SchemaService so that
it extends MDSAL's DOMSchemaService. Also allow datastores to
be instantiated with DOMSchemaService.

Change-Id: Ie71732fb09f4da6dbc2d0819931d5ade2356d6f2
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bug 4290 related: Remove <version> from org.osgi.service.event dep.

Because c/58362 added it to odlparent dependencyManagement, and it
seems wiser to declare this in only a single place, to avoid possible
version discrepancy problems with this artifact in the future.

Change-Id: Ic1062efa211c6c5be4ece11a16c0191489d0bf42
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Fix (and suppress) some static code analysis warnings in blueprint

Change-Id: I39a2c07176406d469f18949b30abb3deb3c21f6c
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

BUG-6709: Remove odl-karaf-empty

Projects should be using the odlparent version instead.

Change-Id: Ief85bf7e35aea5c9e85a07bdae8d47c29219ccdc
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Bug 7449: Add message slicing/re-assembly classes

Added re-usable classes for slicing messages into smaller chunks and
re-assembling them. The process is similar to the current raft
install snapshot chunking.

The 2 main classes are MessageSlicer and MessageAssembler. The basic workflow
is:

- MessageSlicer#slice method is called which serializes the message,
   creates and caches a SlicedMessageState instance and determines the number
   of slices based on the serialized size and the maximum message slice size,
   then sends the first MessageSlice message.
- The MessageAssembler on the other end receives the MessageSlice, creates an
   AssembledMessageState instance if necessary and appends the byte[] chunk to
   the assembled stream. A MessageSliceReply is returned.
- The MessageSlicer receives the MessageSliceReply and sends the next
   MessageSlice chunk, if necessary.
- Once the last MessageSlice chunk is received by the MessageAssembler, it
   re-assembles the original message by de-serializing the assembled stream
   and notifies the user-supplied callback (of type Consumer<Object>) to handle
   the message.

Both MessageSlicer and MessageAssembler use a Guava Cache and can be configured
to evict state that has been inactive for a period of time, i.e. if a message hasn't
been received by the other end.

The MessageSliceReply can propagate a MessageSliceException. If the
MessageSliceException indicates it's retriable, the MessageSlicer will restart
slicing from the beginning. Otherwise slicing is aborted and the user-supplied
failure callback is notified.
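
The workflow above can be sketched with plain byte arrays, assuming a hypothetical SliceDemo class (not the real MessageSlicer/MessageAssembler API) and leaving out serialization, caching, eviction and retries: the sender splits the serialized bytes into chunks of at most maxSliceSize, and the assembler appends each chunk and hands the complete message to a Consumer after the last one. The index returned by onSlice stands in for a MessageSliceReply.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the slice / reply / reassemble loop using byte arrays only.
final class SliceDemo {
    // Splits serialized bytes into chunks of at most maxSliceSize bytes.
    static List<byte[]> slice(byte[] serialized, int maxSliceSize) {
        if (maxSliceSize <= 0) {
            throw new IllegalArgumentException("maxSliceSize must be positive");
        }
        List<byte[]> slices = new ArrayList<>();
        if (serialized.length == 0) {
            slices.add(new byte[0]); // still send one (empty) slice
            return slices;
        }
        for (int offset = 0; offset < serialized.length; offset += maxSliceSize) {
            int end = Math.min(serialized.length, offset + maxSliceSize);
            slices.add(Arrays.copyOfRange(serialized, offset, end));
        }
        return slices;
    }

    // Receiving side: appends each chunk, delivers the whole message after the last one.
    static final class Assembler {
        private final ByteArrayOutputStream assembled = new ByteArrayOutputStream();
        private final Consumer<byte[]> callback;

        Assembler(Consumer<byte[]> callback) {
            this.callback = callback;
        }

        // Returns the acknowledged slice index, standing in for a MessageSliceReply.
        int onSlice(int index, int total, byte[] chunk) {
            assembled.write(chunk, 0, chunk.length);
            if (index == total - 1) {
                callback.accept(assembled.toByteArray());
            }
            return index;
        }
    }
}
```

In the real protocol the sender only sends slice i + 1 after receiving the reply for slice i; a plain loop over the slices plays that role here.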

Change-Id: Iceea212b12f49c3944bade50afded92244e4b31a
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8445: Guard against NPE

We have observed this NPE:

[...]
Caused by: java.lang.NullPointerException
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.startCanCommit(ShardDataTree.java:810)
        at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:105)
        at org.opendaylight.controller.cluster.datastore.ChainedCommitCohort.canCommit(ChainedCommitCohort.java:58)
        at org.opendaylight.controller.cluster.datastore.FrontendReadWriteTransaction.directCommit(FrontendReadWriteTransaction.java:384)
        at org.opendaylight.controller.cluster.datastore.FrontendReadWriteTransaction.handleModifyTransaction(FrontendReadWriteTransaction.java:527)
        at org.opendaylight.controller.cluster.datastore.FrontendReadWriteTransaction.doHandleRequest(FrontendReadWriteTransaction.java:174)
        at org.opendaylight.controller.cluster.datastore.FrontendTransaction.handleRequest(FrontendTransaction.java:141)

This is quite odd, as the FrontendReadWriteTransaction state seems
to indicate the transaction is ready to be committed, yet ShardDataTree
does not seem to have a record of it.

While we are investigating the root cause, this patch adds an explicit
warning when this happens.

Change-Id: I2ddff76357c33d7df2b3f25a2703c69715fbd871
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit da09aa70325ef2c6f34de73b7cf10fc63410ace2)

Lower UnboundedDequeBasedControlAwareMailbox logging

Using debug logging seems excessive, leading to a lot of messages
at debug level. I think we can downgrade to trace instead.

Change-Id: I2a7f87760a1eefe9794eac3b4025b6a3891c30a3
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 27193873ccddbdc8126a24cdb9b0536c5e98ae5f)

Enforce non-null entries field in AppendEntries

The List<ReplicatedLogEntry> entries field has to be non-null,
so enforce it with Preconditions. Same with leaderId. Callers
of getEntries do not need to check for null.

Change-Id: I5cd404d54e453e43456952ea2e11ea7f8f1c626c
Signed-off-by: Tcm Pantelis <tompantelis@gmail.com>

Convert message-bus to blueprint

Change-Id: I2dcabedf8a5fa05ca7433573c4d1957884154161
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Optimize Follower.isOutOfSync()

This is a fast-path method which does a few duplicate checks
and calculations that may end up being unnecessary.

Restructure it so we check each partial condition just once
and compute required inputs only when we are going to need them.

Change-Id: I67a0089693a2ba1cd8c06c43504266534090545b
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit d3c5dc3b0f6bea3fa1c2f964353b87d1a9fcaef8)

BUG-8618: update sync status only after processing

Since the commitIndex may move in chunks we really want to update
our sync status after we have gone through the AppendEntries message
so our commitIndex reflects the state after processing.

Change-Id: I49c72a21f8d9c3efb7ae9cc1b64276220057f2e2
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit bdb818fbfc5f015ab14883348f170cca8ce79128)

BUG-8618: make sync threshold tuneable

We are observing quite a few sync status transitions, which may be coming
from batching scenarios. Introduce a sync-index-threshold config knob
to expose control over it.

Change-Id: Ief4c89c2fe5b95cebaf3fb83cbcdda37cac126b6
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 890e4bbf40aee318a2174bd4130cf34437e5617b)

BUG-8618: improve debug logs

Prepend a reasonable ID to the debug messages. Also improve the range
of the threshold parameter, as we are addressing journal entries here.

Change-Id: I86aac1be04df8b72bfa6ffaa2b7a7e3b4cbfad6e
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 22e817f688bb73420ddcac1f20cf71379ff3a508)

Log data after in IdIntsDOMDataTreeLIstener

At TRACE level.

Change-Id: Ic71aeec4c121d5cfb53a09762c9845e3e94f4f04
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
(cherry picked from commit 8e1dd830bcc8f0b5d34192ebf9a8d45a165a90b1)

BUG-8618: refactor SyncStatusTracker state

Introducing a leader target encapsulation allows us to
enforce state transitions (i.e. state is guaranteed to be
non-null when we need its bits).

This enables us to eliminate the need for a magic constant.

Change-Id: Iab7178694edc3c62032e32c4386c371630f67b6f
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 2c42c1d35a45fb7eced5a3ffd07b52bdc26f7e40)

BUG-8618: make sure we refresh backend info

When we are performing a reconnection attempt we must never use
previous backend info, but rather have to refresh it.

Fix this by removing state when resolution fails.

Change-Id: I65592f2101547a606a15d9c8030c7d8c58afe8a5
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit a816f13940862759cefcd6e69330c6a70b512ee2)

BUG-8618: add threshold crossing debugs

We are observing messages about sync status changing on the order
of tens of times a second (14ms between messages). This looks awfully
like inter-node latency, hence it needs to be tuneable.

We do not understand what sort of jumps we are talking about, so
add debug logging at the source of these events, so they can be
diagnosed.

Change-Id: I9e2d78629f8808914cdb664cb28afcd47a55ee80
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 97875ef2635a536c0663750503ea6e64486d2fe6)

Improve timeout message

Rather than reporting nanoseconds, convert them to fractional seconds.
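
A sketch of the conversion, assuming a hypothetical TimeoutFormat helper and output wording (the actual message text in the patch may differ): dividing by the number of nanoseconds in a second as a double keeps the fractional part, and Locale.ROOT keeps the decimal separator stable regardless of the JVM's default locale.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not the controller's actual timeout message code.
final class TimeoutFormat {
    private static final double NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);

    // Formats a nanosecond duration as seconds with millisecond precision, e.g. "1.500 s".
    static String seconds(long nanos) {
        return String.format(Locale.ROOT, "%.3f s", nanos / NANOS_PER_SECOND);
    }
}
```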

Change-Id: I9052462990f8c6b99349ed123f682ce3f0e23461
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 715bf60ac1899a3c01690d244d26b12c9212ecc7)

Pull in pax-exam-features

BGPCEP downstream is using mdsal-it-base and it is failing to bring
up the container, as the features are not present in local repo.

Add the appropriate declaration, which fixes the breakage downstream.

Change-Id: Id3c21addb628ff37fb50e705773b9e4db83d17bc
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Migrate to odlparent 1.9.0

Change-Id: I7d4af74e7713d48dd6ad8431229c4963423abbf6
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Remove filter-valve

Change-Id: I9d6449a1e6337626373896e4766ea5a71ccee8ff
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8665: fix memory leak around RangeSets

This is a thinko on my part, where I was thinking in terms of a
discrete set (UnsignedLong) and assumed RangeSets would coalesce
individual items.

Unfortunately TreeRangeSet has no way of knowing that the
domain it operates on is discrete and hence will not merge individual
ranges.

This patch fixes the problem by using [N,N+1) ranges. A follow-up
patch should address this in a more efficient manner.
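
To illustrate why [N,N+1) ranges help: in a continuous domain the closed ranges [1,1] and [2,2] do not touch, so nothing merges them, whereas [1,2) and [2,3) share the endpoint 2 and coalesce into [1,3). The sketch below uses a hypothetical TreeMap-based set of half-open long ranges rather than Guava's TreeRangeSet, so it runs with the JDK alone.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal range set storing half-open [lower, upper) ranges over longs.
final class HalfOpenLongRangeSet {
    // key = inclusive lower bound, value = exclusive upper bound
    private final TreeMap<Long, Long> ranges = new TreeMap<>();

    // Adds the single value n as [n, n + 1) (n == Long.MAX_VALUE is not handled).
    void addValue(long n) {
        add(n, n + 1);
    }

    // Adds [lower, upper), merging with every range it overlaps or touches.
    void add(long lower, long upper) {
        if (lower >= upper) {
            throw new IllegalArgumentException("empty range");
        }
        long newLower = lower;
        long newUpper = upper;
        // A range starting at or before 'lower' touches it if its upper bound >= lower.
        Map.Entry<Long, Long> floor = ranges.floorEntry(lower);
        if (floor != null && floor.getValue() >= lower) {
            newLower = floor.getKey();
            newUpper = Math.max(newUpper, floor.getValue());
            ranges.remove(floor.getKey());
        }
        // Absorb ranges that start inside (or right at the end of) the merged range.
        Map.Entry<Long, Long> next = ranges.ceilingEntry(newLower);
        while (next != null && next.getKey() <= newUpper) {
            newUpper = Math.max(newUpper, next.getValue());
            ranges.remove(next.getKey());
            next = ranges.ceilingEntry(newLower);
        }
        ranges.put(newLower, newUpper);
    }

    boolean contains(long n) {
        Map.Entry<Long, Long> floor = ranges.floorEntry(n);
        return floor != null && n < floor.getValue();
    }

    int rangeCount() {
        return ranges.size();
    }
}
```

Adding the values 0 through 999 one at a time leaves a single [0,1000) entry instead of a thousand separate ones, which is the memory behavior the fix is after.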

Change-Id: Iecc313e09ae0cdd51a42f7d39281f7634f0358a7
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

BUG 8649: remove bounded mailbox from ShardManager and notification actors

Change-Id: I52975d969a81cc3ccdc7b963e0f43f9958ba6a10
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 8a298158f0164f581f6d47ae505c76a2cb7e3771)

Fix intermittent PreLeaderScenarioTest failure

java.lang.AssertionError: AppendEntries - # entries expected:<1> but was:<0>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:743)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:555)
  at org.opendaylight.controller.cluster.raft.PreLeaderScenarioTest.testUnComittedEntryOnLeaderChange(PreLeaderScenarioTest.java:57)

AppendEntries appendEntries = expectFirstMatching(follower1CollectorActor,
                AppendEntries.class);
assertEquals("AppendEntries - # entries", 1, appendEntries.getEntries().size());

After the payload is sent to the leader, the test expects an AppendEntries sent to follower1
with a single ReplicatedLogEntry. From the test output this did occur correctly, but
the MessageCollectorActor still had the initial empty AppendEntries sent on leader
startup. The test setup waits for the initial AppendEntriesReply messages from both followers
prior to clearing messages in each MessageCollectorActor; however, the AppendEntries may
not have been delivered to follower1's MessageCollectorActor yet and thus doesn't get
cleared. We need to specifically wait for the AppendEntries in follower1's
MessageCollectorActor.

Change-Id: I638a21e75ea135c1fe24970135f564da4fc5738e
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 8606: Continue leadership transfer on pauseLeader timeout

Continue with leadership transfer if pauseLeader times out
instead of aborting. The shard may have a lot of transactions queued up
which it can't finish in time, but there may still be a follower that is
caught up (i.e. whose matchIndex equals the leader's lastIndex) or would be
caught up if leadership transfer continued. The worst case is that no follower
is available and the "catch up" phase of leadership transfer also times out,
which would lengthen shutdown time, but that should be fine.

Change-Id: I1ec1ef43bb556e50416bb7239ce3c267265db9b3
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

yang-test POM simplification

Motivated by review https://git.opendaylight.org/gerrit/#/c/58714/, and by
realizing that the odl-license version in this POM will cause issues
every time we upgrade odlparent (which may be more frequent, soon).

This is effectively a counter-proposal to c/58714: on looking at this,
it seems that instead of bumping that version, we can remove most of what's
causing this (without losing anything; CS still runs, not that we
care), because most of this is inherited in Maven. The point of that
configuration is just to skip the VS validation for the sources which this
project, for some reason, exceptionally generates into src/main instead of
into target/.

The ${lifecycle.mapping.version} is replaced by a fixed 1.0.0.

Change-Id: Ic78c35a940dc7ee5fe25f3527529591e5c5214f7
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

BUG 8629: log inconsistent notifications as warn

Change-Id: I872fced1ecc913e521ddf7c0d4acee7f48b04cb1
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 2c510d40d2813e5f65f755e53e5ef570b2ac5fba)

BUG 8618: Log leader status when rejecting request

Change-Id: Iecd99a74473b68f43b7ad43a1272d679aa09b4e6
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 0849bf398f71419ac124a1dddacf6dbd40775426)

Do not flood logs with modifications

Debugging logs have grown quite a bit for tell-based protocol
mostly due to us dumping modifications as part of the request
message. Log only the number of modifications in the message,
which will make the logs quite a bit more readable.

Change-Id: I35961702b7bdd0e3f93cd03f05a0e443a14bf419
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 11b30d7680da427f78188dd841c5d6509c12ef33)

Bug 8662 RefreshingSCPModuleInfoRegistry synchronized

Looks like (as seen on Bug 8662) there can be race conditions between
RefreshingSCPModuleInfoRegistry's updateService() and close(). The
NPE in that bug on the "osgiReg" happens even though that's guarded
inside an if (this.osgiReg != null), but it's during shutdown, so
presumably it JUST got set to null in close(). A "synchronized" on
both methods should prevent that.
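
A minimal sketch of the check-then-act race and its fix, using a hypothetical RegistryHolder class; only the updateService()/close() names and the osgiReg field come from the description above, and a Runnable stands in for the OSGi service registration. Because both methods lock the same monitor, close() cannot null the field between the null check and its use.

```java
// Hypothetical simplification of RefreshingSCPModuleInfoRegistry's race, not its real code.
final class RegistryHolder {
    // Stand-in for the OSGi service registration; run() models updating it.
    private Runnable osgiReg = () -> { };
    private int updates;

    // Check and use happen under the same lock, so close() cannot interleave between them.
    synchronized void updateService() {
        if (osgiReg != null) {
            osgiReg.run();
            updates++;
        }
    }

    synchronized void close() {
        osgiReg = null;
    }

    synchronized int updateCount() {
        return updates;
    }
}
```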

Change-Id: I607092e3174c2fd0447d8548f4933624acd6a29b
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Switch from config-parent to binding-parent in archetype resource poms

Change-Id: Ibbebaf67e14da2cc958db05a01ca44fdff074bc7
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Switch from config-parent to bundle-parent in mdsal-trace

Change-Id: Ieaca84db2ff205b6717cd2d37443459b75bdc7fa
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Switch from config-parent to bundle-parent in mdsal-it-parent

Change-Id: I4f37d6dfa1141efa0e34cb0b4f2863a7e5938efa
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Switch from config-parent to bundle-parent in mdsal-it-base

Change-Id: I0d3393fae51408b857b896fe3ff9736f67d868aa
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Switch from config-parent to bundle-parent in sal-remoterpc-connector

Change-Id: I42310b193ae3bd5726ad679771b38964d267cbf4
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Switch from config-parent to binding-parent in clustering-it-provider

Change-Id: Ic1a9d83f155ba02687ded8fd3b4ee13aa1ab2201
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Remove protocol-framework config yang

The protocol-framework config yang modules aren't used, ie aren't
instantiated in any config XML file, so remove them.

Change-Id: I8cc8c8c8666ef731a2e7da20a3046300ce5aad45
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Migrate sal-common-testutil odlparent-1.8.0-Carbon

Missed one in our migration work.

Change-Id: I6cfc1669277601887bacdb756ec8f996f32c1b92
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Remove unneeded cast

Once mdsal.dom.broker.ShardedDOMDataTree is fixed up, we can remove
this cast.

Change-Id: I042dbf3bf071b7af33876501e3274fc96724e58c
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Fix format string mismatch

The format string expects two arguments, not one.

Change-Id: I5cc37336236e88c13d569c656910d7fd969bb655
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Remove config yang/module from sal-cluster-admin-impl

The module was a no-op but was kept for backwards compatibility with
the uber controller.currentconfig.xml. Since netconf was converted to
use the DS a couple of releases ago, it should be ok to remove it.

Change-Id: Ib631658bcf5abc77231e8195112964208d3465a6
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Convert rpcbenchmark to blueprint

Change-Id: I1af664396721a899d51d12fe132853d642d98a7e
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Convert ntfbenchmark to blueprint

Change-Id: I6e4fe58575783d94cf9d01dbcd5b45d50d8b0f60
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Convert dsbenchmark to blueprint

Change-Id: I14516ea469ec23aff17bc3d565f859b9edb99d2d
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8494: do not attempt to reconnect ReconnectingClientConnection

If we are in reconnect state, we should never attempt to initiate
reconnection, as that would leave us without the timer running --
which is a problem since we need to be timing out requests which
are queued even as we are attempting to reconnect to the backend.

Change-Id: Ic955a2e5b743617c26cc72815df94d0c4584704c
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 584be7bf6b41f2f8b01dd718aba8d3b6cf7426ef)

BUG-8403: fix DONE state propagation

The DONE state detection logic in replayMessages() was flawed, as
we checked the current state, which is guaranteed to be SuccessorState.

We should be checking the previous state, available from the successor
state. As it turns out we can do this very cleanly by setting the flag
when the successor state gets the previous state assigned.

This also has better performance, as we do not touch the volatile
field multiple times.

Change-Id: Ica2246160bf8fee7aa134bbacb45857235405f6a
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit b135d9ab189dfd7443f895aa96160e22666cb266)

Migrate to odlparent 1.8.0-Carbon

Per request of odlparent project we are downgrading all Nitrogen
projects to use the released odlparent 1.8.0-Carbon to allow for the
odlparent project to start performing semver style releases.

Jira: RELENG-159
RT: 41406
Change-Id: I69e9fab9531b1127286ca3f4467b9a046ce25a51
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Remove deprecated ShardDataTree#commit method

Change-Id: Id9eaa309f1270e555202e7afafa9ce69c63405da
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Remove deprecated NormalizedNodeInputStreamReader ctor

The intent is for NormalizedNodeInputStreamReader to be package-scoped
and to create instances via NormalizedNodeInputOutput#newDataInput.
Thus the deprecated public constructor was removed and the remaining
users were converted to use NormalizedNodeInputOutput#newDataInput.
However, newDataInput has the side-effect of validating the input stream
first, which failed for a couple of users who need to lazily do the
validation, so I added a newDataInputWithoutValidation method.

Change-Id: Ieb97ab77d05d7a4401dd0526cd4df3a5eafc9eda
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

sal-common-testutil TestEntityOwnershipService

Moved from org.opendaylight.genius.interfacemanager.test.infra, where it
already exists and is used, because I now also need it in a new
component test for netvirt fibmanager. Instead of copy/pasting it,
let's have it here.

Change-Id: I4a524594f9d4bf0da6015738ef3b454eba13159d
Signed-off-by: Michael Vorburger <vorburger@redhat.com>

Fix RecoveryIntegrationSingleNodeTest failure

The InMemoryJournal may not have received all the persisted messages
by the time the test checks the expected size of the journal, so I added
a latch to wait for the expected messages.
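
Below is a sketch of the latch-based wait. SketchJournal is a hypothetical stand-in, not the real InMemoryJournal API. The test arms a java.util.concurrent.CountDownLatch for the number of expected writes. It then awaits the latch with a timeout before checking the journal size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an in-memory journal; not the real InMemoryJournal API.
final class SketchJournal {
    private final List<Object> entries = Collections.synchronizedList(new ArrayList<>());
    private volatile CountDownLatch writeLatch = new CountDownLatch(0);

    // Arm the latch before triggering the writes we intend to wait for.
    void expectWrites(final int count) {
        writeLatch = new CountDownLatch(count);
    }

    // Called by the persisting thread for every message.
    void persist(final Object message) {
        entries.add(message);
        writeLatch.countDown();
    }

    // Wait for the expected messages instead of asserting the size immediately.
    boolean awaitWrites(final long timeout, final TimeUnit unit) throws InterruptedException {
        return writeLatch.await(timeout, unit);
    }

    int size() {
        return entries.size();
    }
}
```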

Change-Id: I8f050b9375f5e3e74749c17e831add21d09d1831
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

Bug 5740: Configure control-aware mailbox

Configured unit tests and production to use a control-aware mailbox
for Shard actors. The current code allows for a "shard-dispatcher"
to be defined, so I added a section in the .conf that specifies the
appropriate mailbox-type (i.e. UnboundedDequeBasedControlAwareMailbox).
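
For illustration only, the ordering guarantee of a control-aware mailbox can be sketched in plain Java without Akka. ControlMsg is a hypothetical stand-in for Akka's akka.dispatch.ControlMessage marker. Messages that carry the marker are dequeued ahead of ordinary ones, so they do not wait behind a long backlog. The real UnboundedDequeBasedControlAwareMailbox is selected through the .conf and is not written by hand.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical marker standing in for akka.dispatch.ControlMessage (sketch only).
interface ControlMsg {
}

// Plain-Java illustration of control-aware ordering: control messages live in
// their own queue and are always dequeued before ordinary messages. This shows
// the semantics only; it is not Akka's implementation.
final class ControlAwareQueue {
    private final Deque<Object> control = new ArrayDeque<>();
    private final Deque<Object> normal = new ArrayDeque<>();

    void enqueue(final Object msg) {
        if (msg instanceof ControlMsg) {
            control.addLast(msg);
        } else {
            normal.addLast(msg);
        }
    }

    // Deque-based variant: allows pushing a message back to the front (e.g. unstash).
    void enqueueFirst(final Object msg) {
        if (msg instanceof ControlMsg) {
            control.addFirst(msg);
        } else {
            normal.addFirst(msg);
        }
    }

    // Returns the next message, or null if both queues are empty.
    Object dequeue() {
        final Object msg = control.pollFirst();
        return msg != null ? msg : normal.pollFirst();
    }
}
```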

Change-Id: Ibdb404e1dfcc699471a8e899c491a09500ee04c0
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8494: fix throttling during reconnect

ReconnectForwarder is called from two different code paths: one is
during replay, when we are dealing with late requests (those which have
been waiting while we were replaying); the other is subsequent user requests.

The first should not wait on the queue, as those requests have
already entered it and hence have paid the cost of entry. The latter needs
to pay for entering the queue, as otherwise we do not exert backpressure.

This patch differentiates the two code paths so that each behaves as it
should. It also adds more debug information in timer paths.
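
Below is a hypothetical sketch of the two paths. It assumes a simple linear delay model (delayPerQueuedNanos for each request already queued), which is not the controller's actual backpressure formula. Replayed requests are re-enqueued without throttling. New user requests pay for entry by sleeping. The sleeper is injected so that the throttle can be observed in tests.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.LongConsumer;

// Hypothetical sketch: names do not match the real ReconnectForwarder or
// transmit queue API, and the linear delay model is an assumption.
final class SketchForwarder {
    private final Queue<Object> queue = new ArrayDeque<>();
    private final LongConsumer sleeper;
    private final long delayPerQueuedNanos;

    SketchForwarder(final LongConsumer sleeper, final long delayPerQueuedNanos) {
        this.sleeper = sleeper;
        this.delayPerQueuedNanos = delayPerQueuedNanos;
    }

    // Replay path: the request already paid for queue entry when it was first
    // submitted, so it is re-enqueued without any throttling.
    void forwardReplayed(final Object request) {
        queue.add(request);
    }

    // New user request: pay for entry so that backpressure reaches the caller.
    void forwardNew(final Object request) {
        final long delay = queue.size() * delayPerQueuedNanos;
        queue.add(request);
        if (delay > 0) {
            sleeper.accept(delay);
        }
    }

    int queued() {
        return queue.size();
    }
}
```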

Change-Id: I609be2332b13868ef1b9511399e2827d7f3d5b7d
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 851fb56fba015c9fee3f0f9235c5c631a492ce59)

BUG-8403: propagate DONE state to successor

We need correct accounting for DONE non-standalone local transactions,
as such transactions do not interact with open/closed semantics.

Propagate DONE via a simple flag, which we check in the local ProxyHistory
to create a proxy without a backing modification.

Change-Id: Ie921db8c9e40f30934c119b74c31ca5418b61548
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 59ffaa4e95ffc8f8f04d1ca8d4f45f2ac1ef23b6)

BUG 8318: Add section for remoting transport-failure-detector

Similarly to the separate dispatcher for cluster, we might also
trip a false positive in remoting, so add this section so that we can
modify the parameter in CSIT.

Change-Id: I751fec044e2bf0f0d82badb2ea7d581b3374ac4a
Signed-off-by: Tomas Cere <tcere@cisco.com>
(cherry picked from commit 7ea291d0c0d183755795e33881fd040c368f57a7)

BUG-8403: go through the DONE transition

Third step of the fix: make AbstractProxyTransaction go through
the DONE state before retiring. This also ends up fixing breakage
in local chain transactions, which could end up leaking because we
never went back to just using the base data tree.

Change-Id: I97ac1687eaf3ecd8f46a68c6170891ea06703e95
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 9b4c07ca27b2c3c7ca5d5e790aa1f121ce4857f3)

BUG-8403: add state documentation and DONE state

Second step of the fix: clarify AbstractProxyTransaction states and
their transitions. Introduce a DONE state which we will use to close
the replay state race window.

Change-Id: I82e47103f2cd9b8ec496b72803b5d5e56d33c0f5
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 720292646a403e3fa4f33d12dcecbd059486b871)

BUG-8403: move successor allocation to AbstractProxyTransaction

We still have a tiny race window where we do not correctly handle
reconnection, leading to an ISE splat, this time in the final
stage of transaction completion.

The problem is that a reconnect attempt is happening while we are
waiting for the transaction purge to complete. At this point the
transaction has been completely resolved and the only remaining
task is to inform the backend that we have received all of the
state and hence it can throw it out (we will remove our state once
purge completes).

We are still allocating a live transaction in the local history
and the purge request replay does not logically close it, leading
to the splat.

To fix this, we really need to allocate a non-open tx successor,
which will not trip the successor local history. All the required
knowledge already resides in AbstractProxyHistory, except we do
not have a dedicated 'purging' state.

This is the first step: moving the allocation caller to the
appropriate place.

Change-Id: If82957019b478f4d5132edda4d38e6bc026aa0ab
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit cc5009b8f3ea91f64ee48cda815c6a5e73a8a1af)

BUG-8494: Cap queue sleep time

Inconsistency in transmit queue backpressure has caused an observed
delay of over 44 minutes in CSIT. While this is an artefact of a bug,
we should not delay the application thread for long periods of time, as
that can cause failures in the application (if it has other
time-bound duties, such as BGP keepalive messages).

Restrict the sleep time to 5 seconds, emitting an INFO-level message
when we reach this level of sleeping. This should typically not
occur, as the backpressure should kick in much sooner, smoothing
out the delay curve.
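
Below is a minimal sketch of the cap. SleepCap is a hypothetical helper, and it uses java.util.logging so that it stays self-contained. Any requested delay above 5 seconds is clamped to 5 seconds, and an INFO message is logged when that happens.

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical helper illustrating the 5-second cap; not the actual transmit
// queue implementation.
final class SleepCap {
    private static final Logger LOG = Logger.getLogger(SleepCap.class.getName());
    static final long MAX_SLEEP_NANOS = TimeUnit.SECONDS.toNanos(5);

    private SleepCap() {
        // utility class
    }

    // Returns the delay to actually sleep: never negative, never above 5 seconds.
    static long capDelay(final long requestedNanos) {
        if (requestedNanos > MAX_SLEEP_NANOS) {
            LOG.info(() -> String.format("Requested sleep of %d ms exceeds the cap, sleeping %d ms instead",
                TimeUnit.NANOSECONDS.toMillis(requestedNanos), TimeUnit.NANOSECONDS.toMillis(MAX_SLEEP_NANOS)));
            return MAX_SLEEP_NANOS;
        }
        return Math.max(requestedNanos, 0L);
    }

    // Sleeps for the capped delay.
    static void throttle(final long requestedNanos) throws InterruptedException {
        TimeUnit.NANOSECONDS.sleep(capDelay(requestedNanos));
    }
}
```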

Change-Id: Ie5f148248caa71791bdda71ddd7e33e5733aa7f8
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 15a67bd103c2dc32f28139a7295ac84143c20d0c)

Bug 8446 - Increase timeout in leadership transfer

Change-Id: Iffd66ef2c771b797b236f23c39b1fb87b5a27c89
Signed-off-by: Jakub Morvay <jmorvay@cisco.com>
(cherry picked from commit ea6ba66600c4f2f143cbb13279ded184ce6d3fcc)

Cleanup time access

ShardDataTree does not need to expose the ticker, just a readTime()
method. This makes the users slightly more readable.

Change-Id: I9aa72a2d3625f40a2a44b0838ff344437293e1e3
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit a0f85a19ba36f71288c7b45575befd98d7d77ed4)

BUG-8515: make sure we retry connection on NotLeaderException

There is a race window when we are establishing connection to the
backend:

When we receive the pointer to the shard leader, we send a connect
request, but during that time window the leader may move, resulting
in a NotLeaderException response to ConnectClientRequest. Since
we are in reconnection mode, this results in a hard abort of the
connection.

Fix this by wrapping NotLeaderException and akka failures in a
TimeoutException, so that we retry connecting.
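
Below is a sketch of the failure mapping. SketchNotLeaderException and SketchAkkaFailure are hypothetical stand-ins for the real exception types. During connect, leader movement and transport failures are converted into a TimeoutException that carries the original cause. The (equally hypothetical) reconnect logic treats that exception as retriable instead of aborting the connection.

```java
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the backend's NotLeaderException.
final class SketchNotLeaderException extends Exception {
    SketchNotLeaderException(final String msg) {
        super(msg);
    }
}

// Hypothetical stand-in for an akka-level delivery failure.
final class SketchAkkaFailure extends RuntimeException {
    SketchAkkaFailure(final String msg) {
        super(msg);
    }
}

final class ConnectFailureMapper {
    private ConnectFailureMapper() {
        // utility class
    }

    // Leader movement and transport failures become TimeoutExceptions, which the
    // caller treats as "retry"; anything else is passed through unchanged.
    static Throwable mapConnectFailure(final Throwable failure) {
        if (failure instanceof SketchNotLeaderException || failure instanceof SketchAkkaFailure) {
            final TimeoutException te = new TimeoutException(
                "Connect attempt failed, will retry: " + failure.getMessage());
            te.initCause(failure);
            return te;
        }
        return failure;
    }

    static boolean isRetriable(final Throwable failure) {
        return failure instanceof TimeoutException;
    }
}
```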

Change-Id: Ia5d1915d59e80a70c54302c1790121d0767ff08a
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 51a85b6c8fce1d9808285a6ad81dc7068afbf7c7)

BUG-8403: do not throttle purge requests

It seems we are getting stuck after replay on purge requests,
which are dispatched internally.

Make sure we do not use sendRequest() in obvious replay places,
nor for purge requests. Also add a debug upcall if we happen to
sleep for more than 100msec.

Change-Id: Iec667f2039610f3f036e6b88c7c7e7b773cdfc19
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 20ece8c549211d1c453f1763132bb0a0ca7be0e0)

BUG-8538: rework transaction abort paths

The direct transaction abort path can end up touching the proxy history's
maps, which it should not, as that should happen only after purge. This
inconsistency cropped up when purge was introduced.

Refactor the methods so that cohorts are removed only after purge,
and fix abort request routing such that it always enqueues a purge
request (possibly via successor). This also addresses a FIXME, as
we now have an enqueueAbort() request, which is not waiting on the
queue.

Change-Id: Ie291da70ace772274f33505db376a915b38e37c0
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 8fca604f2312ef365ce05343c2378cf36f2e31af)

BUG-8538: do not invoke read callbacks during replay.

As evidenced by a ConcurrentModificationException happening reliably
in the face of aborted read-only transactions, there are avenues through
which our state can be modified even though we hold the locks.

One such avenue is listeners hanging on read operations, which
can enqueue further requests in the context of the calling thread. That
thread must not be performing replay when this happens, hence delay request
completion into a separate actor message by using executeInActor().
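
The hazard and the fix can be sketched with a hypothetical single-threaded SketchActor whose mailbox is a queue of Runnables. Its executeInActor() only mimics posting work as a separate actor message. A read callback invoked inline during replay would add a follow-up request to the list being iterated, which throws ConcurrentModificationException. Deferring the callback to the mailbox means it runs only after replay has finished.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical single-threaded "actor" illustrating deferred callbacks; this is
// not the real executeInActor() implementation.
final class SketchActor {
    private final Queue<Runnable> mailbox = new ArrayDeque<>();
    final List<String> pending = new ArrayList<>();

    // Post work as a separate "actor message".
    void executeInActor(final Runnable task) {
        mailbox.add(task);
    }

    // Replay iterates over pending requests. Calling onReadComplete inline could
    // enqueue new requests into 'pending' mid-iteration, so defer it instead.
    void replay(final Consumer<String> onReadComplete) {
        for (String request : pending) {
            executeInActor(() -> onReadComplete.accept(request));
        }
    }

    // Process queued actor messages one at a time, after replay has returned.
    void drainMailbox() {
        Runnable task;
        while ((task = mailbox.poll()) != null) {
            task.run();
        }
    }
}
```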

Change-Id: Ibcd0ac788156011ec3a4cc573dc7fb249ebf93a2
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 8123d0fc56a28324fed48e3027edf090e8149b9b)

BUG-8371: Respond to CreateLocalHistoryRequest after replication

CreateLocalHistoryRequest needs to be replicated to followers before
we respond to the frontend, as logically this request has to be
persisted before any subsequent transactions.

While the frontend could replay the request on reconnect, it would
also have to track the implied persistence (via child transactions),
which we do not want because it really is a backend detail and it
would lead to a lot of complexity in the frontend.

Change-Id: Icdfad59d3c2bab3d4125186c6a9b3c901d3934f6
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 8f18717f60e58eebf726fe0611859311fa83df44)

BUG-8540: suppress ConnectingClientConnection backend timeout

While a ClientConnection is in the initial connect state we do not want
the timer to attempt to reconnect it, as we are already trying
hard to connect it. Suppress that attempt by faking the backend silent
ticks to be 0.
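
Below is a hypothetical sketch of the suppression. The class names are illustrative and only loosely follow the commit message. The timer reconnects when the backend has been silent for longer than a timeout. The connecting variant reports zero silent ticks, so the check never fires during the initial connect.

```java
// Hypothetical class hierarchy; names only loosely follow the commit message.
abstract class SketchConnection {
    // Nanoseconds since we last heard from the backend.
    abstract long backendSilentTicks(long now);

    // Timer check: reconnect if the backend has been silent longer than the
    // (assumed non-negative) timeout.
    final boolean shouldReconnect(final long now, final long timeoutNanos) {
        return backendSilentTicks(now) > timeoutNanos;
    }
}

final class SketchConnectedConnection extends SketchConnection {
    private final long lastReceivedTicks;

    SketchConnectedConnection(final long lastReceivedTicks) {
        this.lastReceivedTicks = lastReceivedTicks;
    }

    @Override
    long backendSilentTicks(final long now) {
        return now - lastReceivedTicks;
    }
}

final class SketchConnectingConnection extends SketchConnection {
    // Still performing the initial connect: report zero silence so the timer
    // never triggers a competing reconnect attempt.
    @Override
    long backendSilentTicks(final long now) {
        return 0;
    }
}
```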

Change-Id: Iaf554632a56fd5be1d417d6806462edf3c746526
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit a3b2c1a05f66523561a10ac898723ffdf7e68798)