git.opendaylight Code Review - controller.git/log

Bug 9165: Log config subsystem readiness as INFO

Change-Id: I487760e19ac317f7246ac9b9b47f2a65df100e6b
Signed-off-by: Vratko Polak <vrpolak@cisco.com>

Bump versions by x.y.(z+1)

Change-Id: I2c5e192d649567f8d83d4b92409aa588ecb77285
Signed-off-by: jenkins-releng <jenkins-releng@opendaylight.org>

Bug 8038: Ignore testLeadershipTransferOnShutdown

Intermittently fails in Boron so set to Ignore. The test was fixed
at some point after Boron.

Change-Id: Ia77f95e9d46e2d68b3bfd8c28e10f540c260e72d
Signed-off-by: Tom Pantelis <tompantelis@gmail.com>

BUG-8327: GlobalBundleScanningSchemaServiceImpl should be a proxy

We are currently running two separate services which assemble
the GlobalSchemaContext, which hurts our startup performance and
leads to wasted memory. This is an artefact of the mdsal split,
hence we should be getting the service from the MD-SAL and
just proxy to old interfaces.

This lowers the startup time for

feature:install odl-restconf odl-bgpcep-bgp
odl-bgpcep-data-change-counter odl-netconf-topology

from 86s down to 67s (22%). Final retained heap size is also
lowered from 217MiB to 181MiB (16%).

Change-Id: I549e9512538bd83d86cfd2164d03e34bc9130c1e
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 3127a42ac3a743d58f5d529ef82b779682fa7566)

BUG-7927: stop scanning bundles on framework stop

Monitor framework bundle for STOPPING event and when it triggers
flag us as stopping: all bundles are about to shut down, so there
is no point in trying to update the schema context anymore.

Change-Id: I1a55169fce1705c19a139063cf632674fc256701
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 6a79e55d2b6462cd609ab8cd5766fd4222c18c4f)

Turn off visibility of GlobalBundleScanningSchemaServiceImpl#start()

Since the start() method is only used in createInstance(), it
should be private and not exposed.

Change-Id: I0264d0a66bbfb2536bc4d6c57f27f15584ddfabb
Signed-off-by: Alexis de Talhouët <adetalhouet@inocybe.com>
(cherry picked from commit 6ed4207635b1ac2f4bb9611e82130002602f0d4d)

Remove artifacts entries for long-gone RESTCONF

RESTCONF has been moved to its own project, hence these
artifact entries are duds. Remove them.

Change-Id: I72d918567a04841784b0a8061ec655fe79af6ae4
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

Move sal-remote to sal-rest-connector

This module is only used by the sal-rest-connector bundle;
moreover, it is intended to create notification streams
and/or register data change events, which is done using
RESTCONF, thus it makes sense to move it to the appropriate
project.
I believe this is a leftover of the migration that happened
when the controller was split.

This is the associated patch in Netconf project:
https://git.opendaylight.org/gerrit/#/c/44575/

Change-Id: Ied4bb2b4b04d36298ca14f9f7926e4aa52de7be2
Signed-off-by: Alexis de Talhouët <adetalhouet@inocybe.com>
(cherry picked from commit 65482327ec4e76ebce71b572ad69a648fe3ef014)

BUG-8219: optimize empty CompositeDataTreeCohort case

The common case is when we do not have any user cohorts, in which
case there is nothing we need to do. Address the FIXME by adding
shortcuts which transition state directly without burdening the
global execution context.
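A minimal sketch of the shortcut, assuming a hypothetical `CohortStage` class whose cohorts are plain `Supplier<Boolean>` votes run on a shared `Executor`; it is not the CompositeDataTreeCohort API. With no cohorts registered it completes immediately instead of scheduling work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

/** Hypothetical stand-in for a composite commit-cohort stage. */
final class CohortStage {
    private final List<Supplier<Boolean>> cohorts; // each cohort votes on canCommit
    private final Executor executor;               // shared execution context

    CohortStage(List<Supplier<Boolean>> cohorts, Executor executor) {
        this.cohorts = new ArrayList<>(cohorts);
        this.executor = executor;
    }

    CompletableFuture<Boolean> canCommit() {
        if (cohorts.isEmpty()) {
            // Common case: nobody to ask, so complete immediately on the
            // caller's thread without scheduling anything on the executor.
            return CompletableFuture.completedFuture(Boolean.TRUE);
        }
        // Otherwise poll every cohort asynchronously; all must agree.
        return CompletableFuture.supplyAsync(
            () -> cohorts.stream().allMatch(cohort -> cohort.get()), executor);
    }
}
```

The empty check costs one branch, while the asynchronous path costs a task submission per stage, which is why skipping it pays off in the common case.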

Change-Id: I38e163a879949c3755322ed371db3bff5d28142f
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>

BUG-7783: increase precision of execution times

Document the time units we are using for measuring execution
and make sure they can hold any long.

Change-Id: I859349e27604c75d426ad7c4eec9d6870b081291
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 20cffa6b2251167e428a641d18a49958044fe598)

Bug 7814: Add counter to make tx actor names unique

Appended an incrementing counter value to the actor name which
will guarantee uniqueness.
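A sketch of the idea, assuming a hypothetical `ActorNames.unique()` helper and an underscore separator (the real naming format may differ):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper; the separator and format are illustrative only. */
final class ActorNames {
    // Shared across all callers; each incrementAndGet() call returns a fresh value.
    private static final AtomicLong COUNTER = new AtomicLong();

    private ActorNames() {
    }

    static String unique(String baseName) {
        // Even if baseName repeats, the appended counter keeps names distinct
        // within this JVM.
        return baseName + "_" + COUNTER.incrementAndGet();
    }
}
```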

Change-Id: I5454968ebd549c24293203f112d4541d2aede671
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bumping versions by x.y.(z+1) for next dev cycle

Change-Id: I2fdf8b6e9d9bc0e416e52c665d1aa4fea575efb3
Signed-off-by: Anil Belur <abelur@linuxfoundation.org>

BUG-5222: remove xsql from archetype

XSQL should not be here, kill it.

Change-Id: I68bafa8961598f3407763661c1c3a294c6209774
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit cfc5170cf5d75c5c89deeecd726cbf7fa36f660e)

Bug 7814: Fix InvalidActorNameException

When a read tx actor is created in the Shard, the actor name is generated
as e.g. "shard-member-1:datastore-operational@0:19", where 0 is the instance's
static generation id and 19 is the tx id. If the tx was created from a chain,
the chain's history id is appended. So every part of the name is constant for
the controller instance except the tx id, which is generated via a counter in
AbstractTransactionContextFactory, and the history id which is also generated
via a counter. The counters do make the full tx id unique in the cluster.
However if multiple shards are involved in a single front-end transaction,
they all have the same tx id and thus the actor names generated by each shard
would be the same. It seems that's what occurred in Bug 7814. Therefore I
changed the actor name to include the shard name.
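To illustrate the collision, a hypothetical name builder that reuses the example format above: two shards serving front-end transaction 19 produce the same name unless the shard name is part of it. The helper and the placement of the shard name are assumptions, not the actual Shard code.

```java
/** Hypothetical read-transaction actor name builder; the format is illustrative. */
final class TxActorName {
    private TxActorName() {
    }

    /** Old scheme: every shard serving the same front-end tx builds the same name. */
    static String withoutShard(String prefix, long generation, long txId) {
        return prefix + "@" + generation + ":" + txId;
    }

    /** New scheme: prefixing the shard name tells those shards' actors apart. */
    static String withShard(String shardName, String prefix, long generation, long txId) {
        return shardName + "-" + withoutShard(prefix, generation, txId);
    }
}
```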

Change-Id: Iacb11bb401bd6bded38847690f8009c115ee0637
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix timing issue in testChangeToVotingWithNoLeader

RaftActorServerConfigurationSupportTest#testChangeToVotingWithNoLeader
failed on jenkins:

  RaftActorServerConfigurationSupportTest.testChangeToVotingWithNoLeader:1213 getStatus expected:<OK> but was:<NO_LEADER>

In the following test code:

  MessageCollectorActor.clearMessages(node1Collector);

  long term = ...
  node1RaftActorRef.tell(new AppendEntries(...), ActorRef.noSender());

  // Wait for the ElectionTimeout to clear the leaderId...

  MessageCollectorActor.expectFirstMatching(node1Collector, ElectionTimeout.class);

It expects an ElectionTimeout message to occur after the AppendEntries tell,
but it's possible for an ElectionTimeout message to occur between the
clearMessages and tell calls. That earlier message satisfies the expectation,
which leads to the subsequent NO_LEADER because the leaderId wasn't cleared
yet via a subsequent ElectionTimeout message.

The test expects the leaderId to be cleared after the AppendEntries tell so
I changed it to explicitly check for that. The fact that it's actually cleared
as a side effect of an ElectionTimeout message is an implementation detail
anyway.
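A minimal sketch of asserting on state rather than on a particular message, assuming a hypothetical `Await.until()` polling helper (the actual test utilities may offer something equivalent):

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; not the utility the test actually uses. */
final class Await {
    private Await() {
    }

    /**
     * Re-checks the condition until it holds or the timeout expires. It asserts
     * on the observable state (e.g. "leaderId is cleared") rather than on
     * which message happened to cause it.
     */
    static void until(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline > 0) {
                throw new AssertionError("condition not met within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", e);
            }
        }
    }
}
```

An extra ElectionTimeout arriving early cannot satisfy this check, because only the resulting state counts.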

Change-Id: I66eaad090d0e75fc3731e59f0a345cb04b4f2c4c
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 88852df542314ff2cb6f3669f4a2e1018e664769)

Bug 6856: Rpc definition should implicitly define input/output

The yangtools patch for this bug required a minor change in mdsal's
binding generator v1 to maintain compatibility. The mdsal patch has
already been merged. The binding generator v1 now generates classes
for input and output statements only if they are explicitly declared
in the model. As a consequence, a minor adjustment has to be made
in BindingToNormalizedNodeCodec's findRpcMethod().

This patch needs to be merged to unblock the distribution-check job
in the yangtools patch.

Change-Id: I6ed0e57aae35987ae943c8aa7a4a79861af04f7a
Signed-off-by: Igor Foltin <ifoltin@cisco.com>
(cherry picked from commit e5c67ba252d4e3f5a5c546ba523fefe880afc274)

Bug 7746: Fix intermittent EOS test failure and synchronization

Modified the EntityOwnershipListenerSupport class to be thread-safe
utilizing a ReadWriteLock to guard access to the listenerActorMap
and entityTypeListenerMap. While updates are only done by the
EntityOwnershipShard and thus aren't concurrent, read access occurs
concurrently via the EntityOwnershipChangeListener which runs in its
own actor sandbox. I also factored out an
EntityOwnershipChangePublisher interface with the read-only access
methods used by EntityOwnershipChangeListener to make it clear
that EntityOwnershipShard is the only mutator.
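A sketch of the locking pattern with `ReentrantReadWriteLock`, assuming hypothetical `ListenerSupport` and `ChangePublisher` types with string keys and values; the real maps hold actor references:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Read-only access used by the concurrently running change listener (hypothetical). */
interface ChangePublisher {
    Collection<String> listenersFor(String entityType);
}

/** Hypothetical listener support: one writer, many concurrent readers. */
final class ListenerSupport implements ChangePublisher {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, List<String>> entityTypeListeners = new HashMap<>();

    /** Called only by the owning shard; takes the exclusive write lock. */
    void addListener(String entityType, String listener) {
        lock.writeLock().lock();
        try {
            entityTypeListeners.computeIfAbsent(entityType, type -> new ArrayList<>()).add(listener);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** May run on another thread; readers share the read lock and receive a copy. */
    @Override
    public Collection<String> listenersFor(String entityType) {
        lock.readLock().lock();
        try {
            List<String> found = entityTypeListeners.get(entityType);
            if (found == null) {
                return new ArrayList<>();
            }
            return new ArrayList<>(found);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```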

The testFunctionalityWithThreeNodes case failed intermittently because
it's possible for the follower2MockListener to get notified of a
pre-existing entity on registration twice if the prior owner change
is replicated and the EntityOwnershipChangeListener triggers concurrently
with the registration and the timing is just right. I added verification
of ownership sync to follower2 prior to registration to make it
deterministic and avoid the sporadic timing failures.

Change-Id: Icc197d0c23135ca69b56eac1702f249e8e60e66e
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix intermittent failure in testCloseCandidateRegistrationInQuickSuccession

java.lang.IllegalStateException: Optional.get() cannot be called on an absent value
  at com.google.common.base.Absent.get(Absent.java:47)
  at org.opendaylight.controller.cluster.datastore.entityownership.DistributedEntityOwnershipIntegrationTest.testCloseCandidateRegistrationInQuickSuccession(DistributedEntityOwnershipIntegrationTest.java:512)

Code:

  if (!leaderEntityOwnershipService.getOwnershipState(ENTITY1).isPresent()
      || leaderEntityOwnershipService.getOwnershipState(ENTITY1).get() == EntityOwnershipState.NO_OWNER)

The code inlines calls to getOwnershipState so it's possible the first call
returns a present Optional and the second call returns absent which leads to
the failure. It's safer to capture the Optional in a local var.
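A minimal sketch of the fix using `java.util.Optional` instead of Guava's `Optional` from the trace above; `OwnershipState` and the query supplier are hypothetical stand-ins for the service call:

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical ownership states, mirroring the names used in the test. */
enum OwnershipState { OWNED, NO_OWNER }

final class OwnershipCheck {
    private OwnershipCheck() {
    }

    /**
     * Queries the (possibly changing) state exactly once, so isPresent()
     * and get() always see the same Optional.
     */
    static boolean hasNoOwner(Supplier<Optional<OwnershipState>> stateQuery) {
        Optional<OwnershipState> state = stateQuery.get();
        return !state.isPresent() || state.get() == OwnershipState.NO_OWNER;
    }
}
```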

Change-Id: I9baa120efc9924dc820435dd63217b4598731a13
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Usage of Collections.unmodifiableCollection is unsafe

This is a follow-up to https://git.opendaylight.org/gerrit/#/c/51583/
regarding a review comment questioning why I returned a new ArrayList instead
of returning the Set directly. One reason was to avoid mutation of the
internal set by the caller, but also to capture the state of the set at
that point in time and avoid concurrent modifications which may not be safe.
Where a concurrent set is used, it would be thread-safe to return the
set directly, but the set may be modified as the caller is iterating it,
which may not be desired. For the other class, whose set is a keySet
from a non-threadsafe Map, the caller could get a
ConcurrentModificationException while iterating. Based on the review, I
changed the call sites to return a Collections.unmodifiableCollection, but
this is also incorrect and is susceptible to the same issues, as that
implementation reads through to the underlying collection. Therefore I
changed the call sites back to returning a new ArrayList.
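A small demonstration of the difference, using a hypothetical `ListenerRegistry` backed by a `HashMap`: the `Collections.unmodifiableCollection` view reflects later changes and fails fast when the map is modified mid-iteration, while the `ArrayList` copy stays fixed:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry contrasting a read-through view with a snapshot copy. */
final class ListenerRegistry {
    private final Map<String, Object> listeners = new HashMap<>();

    void register(String name) {
        listeners.put(name, new Object());
    }

    /** Read-through view: later changes to the map show up here. */
    Collection<String> namesView() {
        return Collections.unmodifiableCollection(listeners.keySet());
    }

    /** Snapshot: decoupled from the internal set once returned. */
    Collection<String> namesSnapshot() {
        return new ArrayList<>(listeners.keySet());
    }
}
```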

Change-Id: I504f38c5bfc4c707180ac301eb10acd0ac24f872
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit d98bea5d456be31205b64dc8f0c5f3ae2b1a4cd0)

Add OnDemandShardState to report additional Shard state

Extended the OnDemandRaftState with OnDemandShardState to include
additional Shard state, including DTCL, DCL, and commit cohort actors.
This will enable us to report this info from the JMX bean, as it's useful
for debugging to have visibility into what listeners and cohorts
are registered.

The actors now also store the registered path. Both the instance and path
will be queried for debugging.

Change-Id: Iaa6c27c9aba3b5c0223199e6a3fc21bc54da95ba
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Add DOMDataTreeCommitCohort example for the cars model

Change-Id: If15c748ceb718d9902ee6c0d5d5a7337a4cbd211
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Add more info logging in sal-akka-raft

Added more info logging for abnormal and infrequent code paths in
the raft behaviors to help in troubleshooting as these paths are
sometimes involved when something goes wrong.

Change-Id: I3017c81c2ef7100ca8a9477285ca637355c05e87
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

CDS: updateMinReplicaCount on RemoveServer

When a replica is removed, we need to call updateMinReplicaCount
so the removed replica is no longer counted for consensus. Updated
the testRemoveServer case to cover it.

Change-Id: Id4b71381f7c32155c641e461832cba680c7062a6
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG-7608: activate action-service element

With downstream users fixed up, we can activate the action-service
element to actually require a promise of instantiation of actions.

Change-Id: I3f87acfd713936a4877822b2f62b5a7d2be46107
Signed-off-by: Robert Varga <rovarga@cisco.com>

BUG-7573: add BucketStore source monitoring

Add a BucketData interface, which exposes an optional ActorRef.
If this reference is given for a Bucket's data, the bucket will be tied
to the source actor's lifecycle via DeathWatch.

If such an actor is not provided, only basic cluster-level monitoring
will be done.

Change-Id: I794bbf9b360d0c3bf68b29e6869a4f5c7c0d2470
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 00634259fd13ebc57f16ad63340e6472a2b6c6f2)

BUG-3128: cache ActorSelections

This is a performance optimization. Since the ActorSelection
for a remote node is an invariant, keep a handy cache of
these objects so we do not have to construct them on every
GossipTick.
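A sketch of such a cache, assuming a hypothetical `SelectionCache` where the type parameter `S` stands in for Akka's `ActorSelection` and the factory function for however the selection is constructed:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical per-address cache. S stands in for Akka's ActorSelection,
 * which does not change for a given remote address and can be reused.
 */
final class SelectionCache<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> factory;

    SelectionCache(Function<String, S> factory) {
        this.factory = factory;
    }

    /** Constructs the selection on first use; every later tick reuses it. */
    S forAddress(String address) {
        return cache.computeIfAbsent(address, factory);
    }
}
```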

Change-Id: I820c1d9be5c198a6cac7932b0de0e0776b35b0a5
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 8426e7a67b1235e8ecc67b1a98a5bd096c88e729)

BUG-3128: rework sal-remoterpc-connector

This patch reworks the implementation to take advantage
of the services provided by DOMRpcService, notifying us
of locally-available services.

Previously we registered all routed RPCs known in
the SchemaContext for the global routing context, which
caused lookups for routed RPCs not otherwise bound to
call back to the remote connector, which then performed
a router lookup.

This approach is slow as each RPC invocation incurs
an additional round-trip to the RpcRegistry to look up
the appropriate router before the request is sent to it.
It also does not work for global RPCs, because they
only ever have a global context -- hence the routing
decision needs to be made solely in DOMRpcService.

With this patch we maintain a single higher-cost
implementation registered towards each remote node,
handling RPCs discovered via gossiping RoutingTables.
The implementation dispatches requests directly to that
node's RpcInvoker (formerly known as RpcBroker). That
way DOMRpcService will perform internal routing based
on cost and invoke our service only if there are no
local implementations registered.

RpcRegistry no longer performs delayed router discovery
and instead dispatches RoutingTable bucket updates to
a new actor, RpcRegistrator, whose sole job is to maintain
registrations of RemoteRpcImplementation instances for
each remote node with DOMRpcProviderService.

Because of DOMRpcService's ability to filter registration
notifications, our RpcListener will never be notified
of our registrations, which precludes routing loops. We
can therefore remove RemoteRpcInput, whose sole purpose
was to act as a loop detector.

Further cleanup is done to RpcManager and RemoteRpcProvider
lifecycle, as these now correctly terminate their children
and remove registrations on both restarts and shutdowns.
RpcManager's children startup is also moved from the
constructor to preStart(), so as to correctly plug into
the actor lifecycle.

Gossiper is updated to forward node removals to the BucketStore,
so that buckets from unreachable nodes are removed as soon
as possible.

BucketStore is updated to pass down changed buckets
to the subclass whenever a bucket is removed or updated.
It also requires the subclass to provide the initial bucket
data item -- which makes it obvious that a bucket's data
can never be null. This was previously achieved by updating
the bucket data from the subclass constructor.

BucketStore/Gossiper messages are updated to be immutable,
which simplifies their instantiation and ensures that they
do not contain nulls (which is required anyway).

Change-Id: I4efb3ddd8ea46ae5be1eb59f1d4fe508f2bc5763
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 2418a6052d7eba917d5972f0630cf746d22f690c)

BUG-7608: Clarify DOMRpc routing/invocation/listener interactions

As it happens our DOMRpcProviderService is under-documented around
routing behavior and how it interacts with both
DOMRpcService and DOMRpcAvailabilityListener.

Fix this by defining the interactions the same way they are implemented
in the only implementation, DOMRpcRouter.

The fallout of these clarifications is that blueprint's interpretation
of the API contract covers only the RFC6020 RPC part correctly and falls
short of the RFC7950 action case.

This shortcoming will be addressed in a follow-up patch.

Change-Id: I2572c21b7aa6f24b9e2ed37f446b76a032f1880b
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit aa342f77a044988c1f6a0deaf9f7e94373f2dfb5)

BUG-7697: add defences against nulls

A null listener is invalid; also make sure we never set
prevRpcs to null.

Change-Id: If8cd16e93a2a07c77a26569c8ecacdc35696cea1
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit a3737302942580f13ca9988647873b83985895ed)

BUG-6937: Add ReachableMember case to Gossiper

In case we detect a member down event we remove that member's
address from the whitelist, leading to further gossips being
ignored.

Subscribe to ReachableMember event to receive notifications
when the cluster heals, so we can re-add the member
back to the whitelist.

Change-Id: Id6b366edfa2be89e1a15225d2cad786bbf552129
Signed-off-by: Tomas Cere <tcere@cisco.com>
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit b78ee4d6b08e2cc0cf5edd01af0e54c3bf619ab5)

BUG-3128: do not open-code routed RPC identification

Identification of routed RPCs is available via RpcRoutingStrategy.
Use that code to identify routed RPCs instead of duplicating the same
logic in multiple places.

Change-Id: I5a12b8fd891cb41f805b2a4e7ae465d4004aca39
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit c5a1d2431bc9c13b7537aa75c264035951ac0de0)

Remove DOMRpcIdentifier.GLOBAL_CONTEXT

This is a shorthand for YangInstanceIdentifier.EMPTY, hence
we can inline the definition.

Change-Id: Icb8f025feb48cbc6add7c30d1db863b19c18f546
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 3499bc0e37043073992c437daed87f10ba0a3e82)

BUG-7594: Expand NormalizedNodeData{Input,Output} to handle SchemaPath

These utility classes are already dealing with QNames, so it makes
sense to expand their capabilities to include SchemaPath serialization.

Change-Id: Ibcb931f78959eb57f834cd2892511c4963638caa
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 244ee83be8c1180ea1845b8768503c8013b0dc7f)

BUG-6937: correct format string

This is a fix: the constructor takes a String.format() string,
not a Logger.debug() one.
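A minimal illustration of the difference; the message text is hypothetical:

```java
/** Contrasts SLF4J-style and java.util.Formatter placeholders (hypothetical message). */
final class FormatDemo {
    private FormatDemo() {
    }

    static String loggerStyle(String member) {
        // "{}" is an SLF4J placeholder: String.format() copies it verbatim
        // and silently ignores the unused argument.
        return String.format("Unreachable member {}", member);
    }

    static String formatStyle(String member) {
        // "%s" is what String.format() actually substitutes.
        return String.format("Unreachable member %s", member);
    }
}
```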

Change-Id: I966b02cd0c280b50ec1d0e77fb5a493fa2f5a4fe
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit c0ddac051a1eec4ac2b12191ce61b6fcec265772)

Cleanup RemoteDOMRpcFuture

Add a missing space and make mapException() static.

Change-Id: I4e9c33899bff7e0488dbe8537f4e832e50a3c53e
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit aead44997ca7c9bf31be83ebcdd6b01aed23b8f3)

BUG-7608: Add ActionServiceMetadata and ActionProviderBean

This patch adds the new concepts of action-provider and
action-service.

The implementation does nothing, as we are transitioning from
a run-time logic being coupled with sal-remoterpc-connector.

This allows us to migrate users, while retaining behavior independent
of sal-remoterpc-connector's actions. This will allow us to fix
BUG-3128.

Once it is fixed, and DOMRpcRouter can express the action-provider
advertisement, we are going to activate the commented-out code in
ActionServiceMetadata.acceptableStrategy().

Change-Id: I3f412d092c10b51a198721f288fdefdfc907f0b7
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 88330d2f3ff048ab4e2e6f348ec3ea56e4c02cd4)

BUG-7506: use common DocumentBuilderFactory

Yangtools exports UntrustedXML utility class, which contains pre-configured
document builder factories for dealing with XMLs which are not completely
trusted. Reuse that instead of rolling our own, especially in the XML parse
path.
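For readers outside the controller, a JDK-only sketch of a hardened factory of this general kind. This is not UntrustedXML's source; `SafeXml` is a hypothetical name, and in the controller one would call the yangtools utility instead:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical shared, pre-hardened parser factory (JDK-only). */
final class SafeXml {
    private static final DocumentBuilderFactory FACTORY = newHardenedFactory();

    private SafeXml() {
    }

    private static DocumentBuilderFactory newHardenedFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setExpandEntityReferences(false);
        factory.setXIncludeAware(false);
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Rejecting DOCTYPE outright blocks XXE and entity-expansion attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Parser does not support hardening", e);
        }
        return factory;
    }

    static Document parse(String xml) {
        try {
            DocumentBuilder builder;
            synchronized (FACTORY) {
                // DocumentBuilderFactory is not guaranteed to be thread-safe.
                builder = FACTORY.newDocumentBuilder();
            }
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Failed to parse XML", e);
        }
    }
}
```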

Change-Id: I83d0ea60104f2266669493548e2d40b5ab6e4772
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 1071bff328641dadd4e44d8c0571a069f8747da0)

BUG-7608: OpendaylightNamespaceHandler methods can be static

Eclipse emits warnings about methods being potentially static,
clean that up.

Change-Id: I4b5fb6d12486ea20d4a47429eafebe3f8b559c40
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 26ab6692366c5fd42669bc1bff80b4ec7a394f0d)

BUG-7608: restructure exception throws

This cleans up the try-catch block so that we do not have
to re-throw exceptions.

Change-Id: I8c876f12b7a9e9108eb8dcca7a602927a78bec2c
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit d62041440e4f29ee119e5deeb2c1c2972adf5007)

Bug 7326: Fix ConcurrentModificationException in Blueprint

in AbstractDependentComponentFactoryMetadata.stopServiceRecipes()

tpantelis: "This is an edge case where the container is destroyed
immediately after and while it's starting up. This isn't likely to
happen in production but can happen during feature tests. I had assumed
the container would provide the protection but apparently it doesn't."

Change-Id: Id7532d30cb0a5f67fd907cb15372069d8769b247
Signed-off-by: Michael Vorburger <vorburger@redhat.com>
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 2875772469b2f8ed80f2dd6b539a8482d1f2e0b0)

Fix FindBugs warnings in blueprint and enable enforcement

Warnings fixed:
- OpendaylightNamespaceHandler(line 83): "Usage of GetResource may be unsafe
if class is extended". Made the class final so it can't be extended.

- BlueprintContainerRestartServiceImpl(line 140): "return value of this method
should be checked". Log warning if 'await' returns false.

Change-Id: I1473acabd0a4126f5e5d2745292fcbff9a308462
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit ab1fee874f408a740df16f55190610e1013446ac)

Checkstyle compliant src/main|test/resources

Change-Id: Ic7dc38ddedb3ed642eb8581cc223269c1bf36408
Signed-off-by: Michael Vorburger <vorburger@redhat.com>
(cherry picked from commit c7846405c83f680660852f299d8051b420b3cddd)

Fix CS warnings in blueprint and enable enforcement

Fixed checkstyle warnings and enabled enforcement. Most of the
warnings/changes were for:
- white space before if/for/while/catch
- variable name too short
- line too long
- illegal catching of Exception

Change-Id: I2a9eb1dc47f46a2c56dc2415ee9ebb73ec7d18c4
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 67ff0fc78b2933b8b4f5a8544c7639499824e622)

BUG-3128: Update RPC router concepts

Enrich DOMRpcImplementation with invocation cost, which is taken into account
when deciding which implementation to invoke. This allows local RPCs to be
prioritized over remote ones.

Also add the ability to filter implementations when notifying availability
listener. This allows remote RPC to filter its own registrations, preventing
re-forwarding loops, where a remote implementation would be forwarded as
a local one.
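A minimal sketch of cost-based selection, assuming a hypothetical `RpcImpl` holder with a numeric invocation cost (lower is cheaper, with local implementations registered at a lower cost than remote ones); the filtering of availability notifications is not shown:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical implementation handle: lower cost wins. */
final class RpcImpl {
    final String name;
    final long invocationCost;

    RpcImpl(String name, long invocationCost) {
        this.name = name;
        this.invocationCost = invocationCost;
    }
}

final class CostRouter {
    private CostRouter() {
    }

    /** Picks the cheapest registered implementation, if any. */
    static Optional<RpcImpl> select(List<RpcImpl> candidates) {
        return candidates.stream()
            .min(Comparator.comparingLong((RpcImpl impl) -> impl.invocationCost));
    }
}
```

With a local implementation at cost 0 and a remote one at a higher cost, the remote one is only chosen when no local one is registered.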

Change-Id: Id1d78d5031904e19134c103e12b79d68cf0b98c3
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit dc76c5f86830b541fe9c4f2a011e199486558779)

Update dependendency desc properly in RpcServiceMetadata

When it transitions to waiting for available DOM RPCs, the
dependency desc is now updated appropriately to reflect
the correct context.

Change-Id: Iaf7108dd664c9ed78444b0f3dfa4e14be431f35f
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 47788b7b68045c46994bd3f8a0aecc0df27037ed)

BUG-5222: offload XSQLBluePrint creation to first access

Constructing XSQLBluePrint in onGlobalContextUpdated() slows
down startup and is utterly inefficient (like all of XSQL).

As a stop-gap measure move its instantiation to first use,
when it is constructed from saved SchemaContext reference.

Also remove the unneeded elements field, as it is not used anywhere
and just gets in the way.
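A sketch of the stop-gap, assuming a hypothetical `LazyDerived` holder where `C` stands in for the SchemaContext and `D` for XSQLBluePrint: updates only save the context reference, and the expensive object is built on first `get()`. Discarding the built object on each update is an assumption of this sketch:

```java
import java.util.function.Function;

/** Hypothetical lazy holder: cheap updates, expensive build on first access. */
final class LazyDerived<C, D> {
    private final Function<C, D> builder;
    private C context; // saved on every update, cheap
    private D derived; // built on demand, expensive

    LazyDerived(Function<C, D> builder) {
        this.builder = builder;
    }

    synchronized void onContextUpdated(C newContext) {
        context = newContext;
        derived = null; // assumption: rebuild lazily from the newest context
    }

    synchronized D get() {
        if (derived == null && context != null) {
            derived = builder.apply(context);
        }
        return derived;
    }
}
```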

Change-Id: I954d2217da6ec8b12d0b980d864cf3d776df78cc
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit d0dc66335889ecec5dbc962a8604c3df96eca758)

Bug 7469: Advertise CDS DOMDataTreeCommitCohortRegistry

Advertised the CDS DOMDataTreeCommitCohortRegistry implementation
via the DOMDataBrokerExtension mechanism.

I needed to add a DOMDataTreeCommitCohortRegistry interface version
in the controller DOM API package that extends the controller
DOMDataBrokerExtension since the mdsal DOMDataTreeCommitCohortRegistry
version extends the mdsal DOMDataBrokerExtension.

Change-Id: I71daac1cd7c231d071c376206d85786c333bac68
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 7391: Fix out-of-order LeaderStateChange events

On leader transition, the current leader first sends out
LeaderTransitioning events to each follower to tell them the
current leader is being deposed. The followers send out a
LeaderStateChange event with a null leaderId which is picked
up by the ShardManager to delay subsequent transaction activity
to the shard until a new leader is elected. However, it's possible
the LeaderTransitioning message does not reach a follower until after
the leader transition occurs (e.g. due to dispatching delay in the
caller or the network). This results in a LeaderStateChange event
with a null leaderId being delivered after the LeaderStateChange
with the new leaderId. I wrote a unit test that reproduces it.

We need to handle LeaderTransitioning events in a CAS-like manner,
i.e. include the leaderId with the LeaderTransitioning message and
only issue the LeaderStateChange event with a null leaderId if the
current leaderId matches the leaderId in the LeaderTransitioning
message.
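A minimal sketch of the compare-and-clear check on the follower side, assuming a hypothetical `FollowerLeaderState` class. Actor message processing is single-threaded, so no locking is shown:

```java
/** Hypothetical follower-side state; leaderId == null means "no known leader". */
final class FollowerLeaderState {
    private String leaderId;

    FollowerLeaderState(String initialLeader) {
        this.leaderId = initialLeader;
    }

    /** A new leader has been observed. */
    void onNewLeader(String newLeaderId) {
        leaderId = newLeaderId;
    }

    /**
     * Handles a (possibly late) LeaderTransitioning message. Returns true only
     * if it names the leader we still believe in; only then should a
     * LeaderStateChange with a null leaderId be published.
     */
    boolean onLeaderTransitioning(String fromLeaderId) {
        if (fromLeaderId != null && fromLeaderId.equals(leaderId)) {
            leaderId = null;
            return true;
        }
        return false; // stale message: a newer leader is already known
    }

    String leaderId() {
        return leaderId;
    }
}
```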

Change-Id: I24e8bbf7707858ac4ed62f3a979cc0403daff8ac
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bumping versions by 0.0.1 for next dev cycle

Change-Id: I98af79728b1e25dd84c2b14a1b8563ae53dbbde1
Signed-off-by: Anil Belur <abelur@linuxfoundation.org>

Bug 6003: Create config-filtering-parent

The Bug requests an addition of resource filtering functionality to config-parent.
There are several issues with replacing config-parent functionality:
- It is a top-level non-karaf feature, subject to API freeze.
- maven-resource-plugin operates on directories (as opposed to files).
- Users (ODL or otherwise) may change config.file value.
+ Groovy script can be used to extract directory name...
- but it is not easy to propagate computed values to sibling plugins.
  http://stackoverflow.com/questions/21867095/setting-properties-in-maven-with-gmaven
- Using two profiles in config-parent does not work easily
  http://stackoverflow.com/questions/1504850/maven-activate-child-profile-based-on-property
  (unless creation of temporary files in local filesystem is abused).

Therefore, a descendant parent pom is defined in this Change.
Presence of ${config.file} activates profiles in both config-parent and config-filtering-parent.

Small edits to config-plugin:
+ Add profile ID to distinguish filtering and non-filtering profiles.

Change-Id: Ia9ef770be232e04e4ba73f634a51b50665ee18ab
Signed-off-by: Vratko Polak <vrpolak@cisco.com>

Bumping versions by 0.0.1 for next dev cycle

Change-Id: I77a17e90be85066c439bbd386582519d7ddd0a3f
Signed-off-by: Anil Belur <abelur@linuxfoundation.org>

Do not wrap Guava as a bundle in features' definition

Doing so refreshes the bundle when the feature is loaded,
because it will certainly be installed already, and hence
refreshes all bundles depending on Guava.

Also, this is not necessary.

This patch is not directly tied to BUG-6956, but it is a result of it.

Change-Id: I79a3adac4dd8d21757f8c7756b0239413ee55589
Signed-off-by: Alexis de Talhouët <adetalhouet@inocybe.com>
(cherry picked from commit 5679203b147817962534344db273e4f2109fd949)

Configurable update-strategy for clusteredAppConfig

Any change to application's config data results in restart of the
blueprint container. This change adds an attribute that allows different
applications to disable this restart.

Attribute added: update-strategy
Values: reload, none.
Default: reload

Change-Id: Ie0c7501f8b5c84970a46ca8f02d7f77caf913a0a
Signed-off-by: Vishal Thapar <vishal.thapar@ericsson.com>
(cherry picked from commit cab1d5845cb951fe31a3243653ed567583dc73c1)

Bug 5700 - Backwards compatibility of sharding api's with old api's

Implementation of the controller DOMDataBroker interface which delegates
calls to a shard-aware implementation of the md-sal DOMDataBroker.

Change-Id: I5694da6d660453ed6a0382006df808cc321d4130
Signed-off-by: Jakub Morvay <jmorvay@cisco.com>
(cherry picked from commit cf2cc1b770f6d1b5fc04e5b8e4081f306853b909)

Bug 6910: Fix anyxml node streaming

On output, changed AbstractNormalizedNodeDataOutput to transform the
DOMSource to a result String that is serialized to the stream. On input,
modified NormalizedNodeInputStreamReader to parse the XML string into
an org.w3c.dom.Node and create a DOMSource.
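A JDK-only sketch of both directions, assuming a hypothetical `AnyxmlCodec` class; the actual stream classes write and read the string through their own serialization format:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical codec converting anyxml values to and from XML strings. */
final class AnyxmlCodec {
    private AnyxmlCodec() {
    }

    /** Output side: serialize the DOM tree into a string for the stream. */
    static String toXmlString(DOMSource source) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(source, new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Failed to serialize anyxml", e);
        }
    }

    /** Input side: parse the string back into a Node and wrap it in a DOMSource. */
    static DOMSource fromXmlString(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Node root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            return new DOMSource(root);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Failed to parse anyxml", e);
        }
    }
}
```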

Change-Id: Ib10822c4444331351cf7f25e1f26d981f7d41dc7
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 2cf4749c41aa32c6b77064fc1ae0e231adc4a5f4)

Bug 6540: EOS - handle edge case with pruning pending owner change commits

The previous patch https://git.opendaylight.org/gerrit/#/c/45516/ added
pruning of pending owner change commits on leader change. However there's
one edge case which wouldn't work correctly where the leader successfully
commits a transaction to add a candidate but becomes isolated when it tries
to commit the transaction to set the candidate as the owner, assuming the new
candidate is the only candidate. When the partition is healed, the owner write
transaction will be pruned and dropped, thus no owner will be selected.

We could allow this owner write to be forwarded to the new leader since it
originated from a client candidate add request. However this could still be
problematic if, during isolation, the majority partition gets a candidate add
and commits an owner. After the partition heals the "old" owner write would be
forwarded and overwrite the previous owner. This wouldn't be catastrophic but
would incur an unnecessary owner change. I would rather keep consistent behavior
of dropping pending owner writes to a new leader.

Instead, the new leader can assign the previous leader as owner when the partition
heals. So in onPeerUp and onLeaderChange, I added code to search for all entities
with no owner and select and write an owner. Therefore when onPeerUp occurs for the
previous leader after isolation, if no other candidate was registered and became
owner, then the previous leader will be assigned as owner.

Change-Id: I213bc3ecd3d1f7ebd099702390de2277109f92c2
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 0fab6c716548e89938c1a8493dc25991c006aa10)

DataBrokerTestModule: use AbstractDataBrokerTest without inheritance

While starting to write JUnit tests (for functional components) with the
AbstractDataBrokerTest, I found myself not liking its design which
forces one to use a class SomeTest extends AbstractDataBrokerTest...

Modern JUnit tests should be written by composition, not enforced
inheritance. Thus this new DataBrokerTestModule allows one to obtain a
DataBroker suitable for Tests from anywhere.

The *(Test)Module naming convention is obviously inspired by a DI point
of view (think Spring Framework's @Configuration classes, or Google
Guice or Dagger (yay!) *Module classes), which is exactly what this
really is: a particular way to obtain an instance of an
implementation of some service (here a DataBroker) in a certain
environment (here for tests).

The internal implementation of DataBrokerTestModule could be changed
later; I guess ideally what's in AbstractDataBrokerTest could go into
DataBrokerTestModule and AbstractDataBrokerTest could use it, but for
now I'm too lazy to change that, as this does the trick nicely. Later
implementation changes would be transparent to users of
DataBrokerTestModule.

Change-Id: I9641f527bbc0cb92732f2e513cdd64cc6a837200
Signed-off-by: Michael Vorburger <vorburger@redhat.com>
(cherry picked from commit e0819c56a40458d9eac0ebbbe1d2049f795cfe95)

BUG-5280: expose backing client actor reference

Exposing the actor directly allows us to retain only the context
without having to attach a specific behavior to implementation
utility classes.

Change-Id: I6fd26a23b08f4c7cb9d70d817aaf8deb44d55d88
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit cd2f680988549922b1c928b7c01fc31112eeca4c)

BUG-5280: make EmptyQueue public

Allow EmptyQueue to be used externally and correct its javadoc.

Change-Id: I4ae03095844ea235e735ffbdc48f974343a8e9a1
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 2d6c21fb4871826f8a3da9510a2a10cbc71c7bb9)

BUG-5280: fix a few warnings

This fixes javadoc warnings, adds a tiny bit of documentation
and corrects logging during recovery.

Change-Id: I13076ec052febb801a08c05adb7d8affca1fea82
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 50664aceae387ef6dc9a952f5a6d4105d0d3b4a7)

BUG-5280: add ExplicitAsk utility class

Akka's support for Explicit Ask with Java functions
is broken and requires a workaround. This patch moves
the previous implementation site to a reusable place
and fixes a caller which was broken.

Change-Id: Idc0fc961b1808c23e01a4ae8c4eafdc93b7974f2
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 2ebf9ef718ea7ddd790784a6d241e68ef8d1c564)

BUG-5280: Create AbstractProxyHistory class

Given the connection-oriented nature of SequencedQueue, we
really need to properly encapsulate various aspects of the client,
so we can perform proper state propagation, both during message
transmission and on reconnection.

This is a first step in that direction, which encapsulates client's
sendRequest() and self() methods at proper levels. It furthermore
makes state tracking in proxies consistent with state tracking in
their aggregate counterparts, hence each ProxyTransaction is guaranteed
to have an associated ProxyHistory.

Change-Id: I8c15b234ec813ac427e63a6e077ae17cde443be3
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 1d34f75864ac09d31ef0f7b4ef59f7434167ae15)

BUG-5280: move proxy instantiation to AbstractClientHistory

Histories should be the synchronization point for accessing
per-cookie proxies. Move instantiation code, making cookie/history
mapping internal to AbstractClientHistory.

Change-Id: I512e93d72b682668790a5dd213112d772143f045
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 95208fa5d24f3d7c2362ee619c9a6a294a69f7cd)

BUG-5280: separate request sequence and transmit sequence

Clean up the confusion in sequence numbers. There are actually
two sequences:
- logical request sequence, which indicates the order of requests
  in which they should be applied to the target entity. It is
  assigned by logic emitting the request.
- transmit sequence within a connection to the backend, as
  initiated by ConnectClientRequest. It is assigned by SequencedQueue
  as it is transmitting requests and reset when a new connection
  to the backend is made.

This requires establishing the concept of a session, which is
a single conversation between frontend and backend. It is severed
whenever the frontend times out and re-established when the leader
is found and it responds with ConnectClientSuccess.

The sending of ConnectClientRequest is not done via the queue,
as it is part of backend resolution process. Since this is not
a performance-critical path, we use a simple Patterns.ask() to
send the request and get completion notification -- which we then
translate to ShardBackendInfo.

ConnectClientSuccess gives us the backend-preferred version and a
backend-specified cap on the number of outstanding messages that
it can handle concurrently. This maximum is used to limit the
transmit path of SequencedQueue, so that it does not attempt
to send more requests at any given time.

The internal queue for unsent requests is kept unbounded for now;
it will be subjected to a Semaphore-driven throttle in a follow-up patch.
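The two sequences and the maxMessages cap can be illustrated with a small model. This is a sketch under assumptions: `SequencedQueueSketch` and its methods are hypothetical names, not the real SequencedQueue API, and re-queuing unanswered requests when a new session starts is an assumption made so the model is complete.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simplified model; names do not match the real SequencedQueue API.
final class SequencedQueueSketch {
    // A sent request keeps its logical request sequence (assigned by the emitter)
    // and gets a transmit sequence that is only meaningful within one session.
    static final class Sent {
        final long requestSequence;
        final long txSequence;
        Sent(long requestSequence, long txSequence) {
            this.requestSequence = requestSequence;
            this.txSequence = txSequence;
        }
    }

    private final Deque<Long> pending = new ArrayDeque<>(); // unbounded for now
    private final List<Sent> inFlight = new ArrayList<>();
    private long nextTxSequence;
    private int maxMessages; // 0 until a session exists, so nothing is transmitted

    // ConnectClientSuccess received: a new session starts, the transmit sequence
    // resets and the backend-specified cap on outstanding messages takes effect.
    void newSession(int maxMessages) {
        this.maxMessages = maxMessages;
        this.nextTxSequence = 0;
        // Assumption: unanswered requests are re-sent, in their original order.
        for (int i = inFlight.size() - 1; i >= 0; i--) {
            pending.addFirst(inFlight.get(i).requestSequence);
        }
        inFlight.clear();
        transmit();
    }

    void enqueue(long requestSequence) {
        pending.addLast(requestSequence);
        transmit();
    }

    void complete(long txSequence) {
        inFlight.removeIf(s -> s.txSequence == txSequence);
        transmit();
    }

    List<Sent> inFlight() { return inFlight; }

    int pendingCount() { return pending.size(); }

    // Never exceed maxMessages outstanding requests.
    private void transmit() {
        while (!pending.isEmpty() && inFlight.size() < maxMessages) {
            inFlight.add(new Sent(pending.removeFirst(), nextTxSequence++));
        }
    }
}
```

Note how request 11 keeps its logical sequence across sessions while its transmit sequence restarts from 0.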

Change-Id: I61663073bf6632c1ed8c036dee37f1ac39cf7794
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 9b4f21460c6dcb10c381df631d064d05de16546c)

BUG-5280: split out cds akka client substrate

This patch splits out the baseline frontend client
into a separate package.

Change-Id: I2d8ca8b81f29a45dd8c30f3bef467fcda94d4887
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit b0067e0a4bfa955f15c6259e019f954687264eff)

Move MessageTrackerTest

This moves the test to sal-clustering-commons, so we end
up testing in its home artifact.

Change-Id: Ie001bee6de92381ab6140a3a54e31c854a46ae1d
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 3dffbf36946550b6bf11ac03a80cd4e5c58dbbdf)

BUG-5280: add maxMessages field to ConnectClientSuccess

This field will act as a hint on how many messages may
be queued by the frontend towards the backend at any
given time.

Change-Id: Ibb8bbe2af9595bc0ecee090acea35aa78a9250b7
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit a73c52b4740be611728b4f9c70c67b2b36cf3916)

BUG-5280: add FrontendMetadata

This patch adds the frontend tracking abilities for followers.
It also defines the corresponding ShardDataTreeSnapshotMetadata
for use with persistence.

Change-Id: I7e2c6755c3389dcb5284f17a9c6076fb9e7ac95e
Signed-off-by: Robert Varga <rovarga@cisco.com>
Signed-off-by: Vaclav Demcak <vdemcak@cisco.com>
(cherry picked from commit edd61d79da614388134b0e0a618010c91e9c91bd)

Bug 6540: EOS - Prune pending owner change commits on leader change

When the shard leader is isolated, it attempts to re-assign ownership for down
peers. However, since it's isolated, it can't commit the modifications. If the
majority partition elects a new leader, when the partition is healed, the old
leader tries to forward the pending owner change commits to the new leader.
However, this is problematic as the criteria used to determine the new owner are
stale and owner changes should only be committed by a valid leader. Since the
old leader is no longer the leader, it should not forward pending owner change
commits. However it still should forward local candidate change commits.

So I modified EntityOwnershipShardCommitCoordinator#onStateChange to iterate
the pending Modifications and remove WRITE modifications for the owner leaf
when the shard has transitioned to having a remote leader.
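The pruning step in onStateChange amounts to a filter over the pending modifications. In this sketch the `Modification` record, the `Type` enum and the "/owner" path suffix are hypothetical simplifications; the real coordinator inspects data tree modifications for WRITEs to the owner leaf.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; real pending Modifications are data tree operations.
final class PendingModificationPruner {
    enum Type { WRITE, DELETE, MERGE }

    static final class Modification {
        final Type type;
        final String path;
        Modification(Type type, String path) { this.type = type; this.path = path; }
    }

    private final List<Modification> pending = new ArrayList<>();

    void add(Modification mod) { pending.add(mod); }

    List<Modification> pending() { return pending; }

    // Once the shard has a remote leader, owner WRITEs computed by this former
    // leader are stale and are dropped; local candidate changes stay queued so
    // they can still be forwarded to the new leader.
    void onStateChange(boolean hasRemoteLeader) {
        if (hasRemoteLeader) {
            pending.removeIf(m -> m.type == Type.WRITE && m.path.endsWith("/owner"));
        }
    }
}
```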

I also fixed an issue in EntityOwnershipShard#onCandidateRemoved that was
intermittently revealed by unit tests. Say candidate1 and candidate2 are
removed quickly for an entity and candidate1 is the current owner.
onCandidateRemoved is called for candidate1 and commits an update to write
candidate2 as the owner. If the write commit is still pending when
onCandidateRemoved is called for candidate2, the current owner will still
be candidate1 and the "message.getRemovedCandidate().equals(currentOwner)"
check will fail and thus the owner isn't cleared and candidate2 will remain
as owner. This results in a node being the owner w/o being in the candidate
list. (This patch may fix Bug 6672 as well)

A new testLeaderIsolation case was added to EntityOwnershipShardTest. Also I
reworked the tests and removed the use of the MockFollower and MockLeader
actors for consistency and also so the tests use the real EOS shard.

Change-Id: I5039b07d02f8571ee2d1affb0f364ea278641e91
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 07c96b0fa318b7bf559df4954f705d06a44f1354)

Bug 6540: Fix journal issues on leader changes

Fixed a couple issues with journal syncing on leader changes and isolation.

Consider the scenario where a leader is isolated and the majority partition elects
a new leader and both sides of the partition attempt to commit entries independently.
Say the term was 1 and last journal index was 2 prior to isolation and was replicated
to all followers and applied to state. After isolation, the isolated leader appends a
new entry with index 3 and attempts to replicate but fails to reach consensus.
Meanwhile, the new leader appends its own new entry with index 3, which is successfully
replicated to the remaining follower and applied to state. The commitIndex in the
majority partition is now 3. The new leader attempts to send AppendEntries to the
isolated leader but doesn't get any replies so it marks it as inactive.

When the partition is healed, the isolated leader converts to follower when it hears
from the new leader with the higher term. Since the new leader has marked the isolated
leader as inactive, the initial AppendEntries that the previous leader sees will have
no entries and the leaderCommitIndex will be 3. This is greater than the current
commitIndex 2 so the previous leader will update its commitIndex to 3 and apply its
entry with index 3 to the state. However this entry was from the previous term 1 which
was not replicated to a majority of the nodes and conflicts with the new leader's entry
with index 3 and term 2. This is a violation of raft.

This violation occurs as a result of the new leader not sending any entries until it
knows the follower is active. This is for efficiency to avoid continuously trying to
send entries when a follower is down. This is fine; however, the leader should not send
its current commit index either since it doesn't know the state of the follower. The
intention of the empty AppendEntries in this case is to re-establish connectivity with
the follower and thus should not cause any state change in the follower. Therefore I
changed the code to send leaderCommitIndex as -1 if the follower is inactive.

The other case where the leader purposely sends an empty AppendEntries is when the
leader is in the process of installing a snapshot on a follower, as indicated by the
presence of a LeaderInstallSnapshotState instance in the FollowerLogInformation. The
empty AppendEntries is still sent at the heartbeat interval to prevent an election
timeout in case the snapshot capture/transfer is delayed. Again, the AppendEntries
should not cause any state change in the follower so I also changed the leader to send
-1 for the leaderCommitIndex. As a result, I also changed it so that the leader
immediately records a LeaderInstallSnapshotState instance in the FollowerLogInformation
when it initiates the async snapshot capture. Previously this was done when the capture
completed and the RaftActor sent the SendInstallSnapshot message to the leader
behavior. However, it may take some time to capture the snapshot, and intervening
AppendEntries heartbeats may be sent to the follower in the meantime.
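The leader-side rule and its effect on the follower can be sketched as follows. `FollowerView`, `AppendEntriesCommitIndex` and both method names are hypothetical; the follower-side update is the usual Raft rule, simplified here to cap at the follower's last log index.

```java
// Hypothetical, reduced view of the per-follower state kept by the leader.
final class FollowerView {
    final boolean active;
    final boolean installingSnapshot; // i.e. a LeaderInstallSnapshotState is recorded

    FollowerView(boolean active, boolean installingSnapshot) {
        this.active = active;
        this.installingSnapshot = installingSnapshot;
    }
}

final class AppendEntriesCommitIndex {
    // An empty AppendEntries sent only to re-establish contact (inactive follower)
    // or to prevent an election timeout during a snapshot install must not change
    // follower state, so it advertises -1 instead of the real commit index.
    static long leaderCommitIndexFor(FollowerView follower, long commitIndex) {
        if (!follower.active || follower.installingSnapshot) {
            return -1;
        }
        return commitIndex;
    }

    // Follower side (simplified): only a larger leaderCommitIndex advances the local
    // commit index, capped at the follower's last log index, so -1 is a no-op.
    static long followerCommitIndexAfter(long localCommit, long leaderCommitIndex, long lastLogIndex) {
        return leaderCommitIndex > localCommit ? Math.min(leaderCommitIndex, lastLogIndex) : localCommit;
    }
}
```

In the scenario above, the former leader (commitIndex 2, last log index 3) keeps commitIndex 2 on receiving -1, instead of wrongly applying its stale entry 3.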

The other issue in the above scenario is that the conflict with entry 3 is not
immediately detected. On the first AppendEntries, the previous leader reports back
a successful reply with lastLogIndex 3 and lastLogTerm 1 b/c the previous
index (2) and term (1) didn't conflict. The new leader sets the previous leader's
match index to 3 and thinks index 3 has been replicated to all the followers and
trims its in-memory log at index 2. Eventually when the next entry with index 4 is
replicated, the previous leader will detect the conflict as the leader's previous
log index 3 and term 2 will be sent in the next AppendEntries. The new leader will
backtrack and eventually install a snapshot to sync the previous leader however
it's inefficient and should be unnecessary. The leader should detect the conflict
immediately on the first AppendEntries reply. So I changed handleAppendEntriesReply
to check that the follower's lastLogTerm matches the leader's term for that index.
If not, the leader sets the follower's next index to lastLogIndex - 1. This prevents
the leader from trimming its log and the next AppendEntries will include the
conflicting entry which the follower will remove/replace.
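The reply check can be sketched as below, assuming the follower's next index is backed up to its lastLogIndex - 1 on a term mismatch. `ReplyConflictCheck` and its fields are hypothetical; the leader's in-memory log is modelled as a map from index to term.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical leader-side sketch, not the real handleAppendEntriesReply.
final class ReplyConflictCheck {
    final Map<Long, Long> leaderLogTerms = new HashMap<>(); // index -> term
    long followerNextIndex;
    long followerMatchIndex = -1;

    void onSuccessfulReply(long followerLastLogIndex, long followerLastLogTerm) {
        Long leaderTerm = leaderLogTerms.get(followerLastLogIndex);
        if (leaderTerm != null && leaderTerm.longValue() != followerLastLogTerm) {
            // Same index, different term: the entries conflict. The match index is not
            // advanced (so the leader won't trim its log) and the next AppendEntries
            // will carry the conflicting entry for the follower to replace.
            followerNextIndex = followerLastLogIndex - 1;
            return;
        }
        followerMatchIndex = followerLastLogIndex;
        followerNextIndex = followerLastLogIndex + 1;
    }
}
```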

Change-Id: I7a0282cc4078f33ffd049e4a0eb4feff6230510d
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 74524984b8e8625f6b8e8c791c584844d49ccf45)

Bug 6540: Move LeaderInstallSnapshotState to FollowerLogInformation

AbstractLeader maintains a Map of followerId -> LeaderInstallSnapshotState
in parallel to the Map of followerId -> FollowerLogInformation. It makes
sense to move the LeaderInstallSnapshotState into the FollowerLogInformation
instead of maintaining 2 Maps.

Change-Id: Ia0b58fad9bb2fde42d8c1ba4b0f7aae4eb11abb5
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 95d3c7975a423951dcbdecfbfa4cb6b7a23591cc)

Bug 6540: Refactor FollowerToSnapshot to its own class

Refactored FollowerToSnapshot to its own class and renamed to
LeaderInstallSnapshotState. This will facilitate subsequent patches.

Change-Id: Ie2540ddce1869a9972c8f3d547b0567c3d663aff
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit d3e310b940b60f6590f0e94a576aece95a055942)

Fix relativePaths for mdsal-it-parent under controller

Change-Id: I5c845571a3aadfaac5cf72a83d16b46d695be910
Signed-off-by: Anil Belur <abelur@linuxfoundation.org>

Bumping versions by 0.0.1 for next dev cycle

Change-Id: I1de05ad4c22e7cd8082d9868fbc62d0f7942b347
Signed-off-by: Anil Belur <abelur@linuxfoundation.org>

Bug 6659: Fix intermittent PartitionedCandidateOnStartupElectionScenarioTest failure

The test didn't set up the SimpleReplicatedLog, commitIndex and lastAppliedIndex correctly.

Change-Id: I76fbf98f1a227245ca3a61e399258bd3bd4e743a
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 6540: EOS - Rework behavior of onPeerDown

https://git.opendaylight.org/gerrit/#/c/26808/ modified the behavior of
onPeerDown to remove all the down node's candidates. However this behavior is
problematic in the case when the shard leader is isolated. The majority partition
will elect a new leader which temporarily results in split-brain and 2 leaders
which independently attempt to remove the other side's candidates. When the partition
is healed, all hell breaks loose trying to reconcile their differences. This is
compounded with the singleton service because it uses 2 entities that are related
to one another.

To alleviate this, I reverted back to the behavior of selecting a new owner for
the entities owned by the down node and leaving the down node as a candidate.
In the case where the down node is the only candidate, it leaves it as the owner.
This doesn't hurt anything and avoids complications with having to re-instate the
down node as owner when it re-joins if it was actually isolated. The idea here is
to keep its candidacy to minimize disruption until proven otherwise since we don't
know if the downed node's process is actually still alive. If another node registers
a candidate it will replace the down node as the owner.
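The reworked onPeerDown behavior can be modelled in a few lines. `PeerDownOwnerReselection` and its "first candidate not down" policy are hypothetical simplifications of the real selection logic.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simplified model of the reworked onPeerDown behavior.
final class PeerDownOwnerReselection {
    final Map<String, List<String>> candidates = new LinkedHashMap<>();
    final Map<String, String> owners = new LinkedHashMap<>();

    // The down member stays a candidate everywhere. Only entities it owns get a new
    // owner, chosen from candidates that are not down; if none is available, the
    // down member simply remains the owner.
    void onPeerDown(String downMember, Set<String> downMembers) {
        for (Map.Entry<String, String> e : owners.entrySet()) {
            if (!downMember.equals(e.getValue())) {
                continue;
            }
            for (String candidate : candidates.getOrDefault(e.getKey(), List.of())) {
                if (!downMembers.contains(candidate)) {
                    e.setValue(candidate); // not a structural change, safe while iterating
                    break;
                }
            }
        }
    }
}
```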

To handle the case where the down node actually restarted, after startup when it
first hears from the leader, it sends a RemoveAllCandidates message to the leader to
remove it from all entities. This cleans out stale candidates should no local client
register a candidate in the new incarnation.

The unit tests revealed an orthogonal issue with the PreLeader state. The PreLeader
switches to Leader when the commit index is up to date but before applying the entries
to the state. However, the EOS may commit modifications immediately, before the
ApplyState message for prior entries is received. This can result in the "Store tree X
and candidate base Y differ" exception. So I modified the PreLeader behavior to
switch to Leader when the last applied index is up to date. This makes sense b/c
the PreLeader behavior is intended to protect the state from inconsistencies.

I also fixed a couple bugs where the downPeerMemberNames was accessed with a String
rather than a MemberName instance. This was a remnant of changing downPeerMemberNames
to store MemberName.

Change-Id: I326660c172353539146a2216cc8a70a4b842affe
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 6540: Notify listeners on applySnapshot

When a snapshot is applied from the leader, we weren't notifying data change listeners.
We should.

Change-Id: If721c2ce7e6f27aa01f7babc0a0ad3c4468840c1
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG-5280: persist metadata in snapshots

Note: this patch needed to be cherry-picked before
https://git.opendaylight.org/gerrit/#/c/45028/ could be cherry-picked.

This patch adds the wiring in ShardDataTree to persist
various pieces of metadata in a snapshot. It also includes
metadata recovery from a snapshot.

In order to make this work, this patch centralizes all
actual payload and snapshot handling within the ShardDataTree
by introducing explicit entry points for each avenue through
which data can be introduced.

Change-Id: Ibc15bd152bd44dd583d67bb7fc61bc8f3086df30
Signed-off-by: Robert Varga <rovarga@cisco.com>

Bug 6587: Retain state when transitioning between Leader and IsolatedLeader

If there's a transaction in the COMMIT_PENDING state, ie it has been persisted and
is in the process of being replicated, and the Leader switches to IsolatedLeader, the
ClientRequestTracker state is lost. As a result when the follower(s) come back and
replication consensus is achieved and the tx is applied to state, the tx ID isn't
available and the ShardDataTree applies it as a foreign candidate, leaving the
tx in the pending queue. This prevents subsequent transactions from making progress.

To fix this, we need to retain/copy the internal leader state when transitioning
between Leader and IsolatedLeader.

Change-Id: If06996dccf083fd5d37757fd91fde2eb0eb82ea1
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit a0b8be5ce48c0d1e0b573d1952211913c58d4935)

Bug 6535: Fix feature config pusher warning

The feature config pusher assumes config files installed by features
that have XML extension are CSS files and tries to parse them. On failure
it prints a warning but ignores it and moves on. Up till now we've only
had CSS XML files but we now have XML files related to the
clustered-app-config. To avoid the warning I added a check to see if the
root element is "snapshot" if JAXB parsing fails.
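The fallback check can be sketched with the JDK's DOM parser. The class and method names are hypothetical, and using `DocumentBuilderFactory` for the root-element probe is an assumption; the commit only states that the root element is compared against "snapshot" once JAXB parsing fails.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical sketch: after JAXB parsing fails, only warn if the file's root
// element is "snapshot", i.e. it really looks like a CSS config file.
final class ConfigFileCheck {
    static boolean isCssSnapshot(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so getLocalName() ignores any prefix
            String root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getDocumentElement()
                    .getLocalName();
            return "snapshot".equals(root);
        } catch (Exception e) {
            return false; // not well-formed XML at all
        }
    }
}
```

A clustered-app-config file such as `<cars xmlns="urn:x"/>` then yields false and can be skipped silently.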

Change-Id: I877921afc13564f131f61a1eb8327db71d3638fe
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix issue when AE leader differs from prior install snapshot leader

When a leader snapshot install is in progress, a follower doesn't process
AppendEntries and merely returns a reply with its last term/index. However,
if an install snapshot is initiated by a leader and there is more than one
chunk to send, it's possible for a leader change to occur prior to completing
sending all the chunks. When this happens, the new leader will begin sending
AppendEntries but the follower won't process them and make progress. We need
to clear the follower's snapshot state if the AppendEntries leaderId doesn't
match the prior install snapshot leaderId.
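On the follower side the fix reduces to remembering which leader started the install. `FollowerSnapshotTracker` and its methods are hypothetical names used only for this sketch.

```java
// Hypothetical follower-side sketch; names are illustrative only.
final class FollowerSnapshotTracker {
    private String snapshotLeaderId; // leader that started the in-progress install, or null

    void onInstallSnapshotChunk(String leaderId) {
        snapshotLeaderId = leaderId;
    }

    // Returns true if the AppendEntries should be processed normally.
    boolean onAppendEntries(String leaderId) {
        if (snapshotLeaderId != null && !snapshotLeaderId.equals(leaderId)) {
            // A different leader is sending: the partial snapshot can never complete.
            snapshotLeaderId = null;
        }
        // While a snapshot from the same leader is in progress, only reply with last term/index.
        return snapshotLeaderId == null;
    }
}
```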

Change-Id: I4051bd064b6a20f4bcfe38b50656488fcb09274e
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 95d7b8820236d16cb7e37c4a95fcae6f6d55581e)

Fix issues when persistence enabled

When persistence is dynamically enabled, we start persisting subsequent log
entries which causes issues on restart due to a gap in journal indexes. We need to
persist a snapshot with the current state to avoid this.

Also if persistence is disabled, we still persist ReplicatedLogEntry instances that
have a PersistentPayload, of which ServerConfigurationPayload is currently the only one.
This also can cause gaps in the persisted journal indexes which cause issues if
persistence is later enabled. To avoid this, we really shouldn't persist
ReplicatedLogEntry instances at all if data persistence is disabled since we don't
add them to the in-memory journal on recovery anyway - we just recover and apply the
ServerConfigurationPayload. Instead we should persist just the
ServerConfigurationPayload.
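The resulting persistence decisions can be summarized as a small helper. `PersistenceDecision` and its `Action` values are hypothetical; the real RaftActor wiring is considerably more involved.

```java
// Hypothetical decision helper summarizing the rules described above.
final class PersistenceDecision {
    enum Action { PERSIST_LOG_ENTRY, PERSIST_PAYLOAD_ONLY, SKIP }

    static Action forEntry(boolean persistenceEnabled, boolean isServerConfigPayload) {
        if (persistenceEnabled) {
            return Action.PERSIST_LOG_ENTRY;
        }
        // With persistence disabled, log entries are not recovered into the journal,
        // so only the ServerConfigurationPayload itself is persisted; no indexed
        // journal entry is written that could later leave a gap.
        return isServerConfigPayload ? Action.PERSIST_PAYLOAD_ONLY : Action.SKIP;
    }

    // Enabling persistence at runtime: snapshot the current state first so the
    // journal entries persisted afterwards do not start after a gap.
    static boolean needsSnapshotOnToggle(boolean wasEnabled, boolean nowEnabled) {
        return !wasEnabled && nowEnabled;
    }
}
```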

Change-Id: Ief78d68423b33aac1649220a36d32ff50f493eb7
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit c42a5e91e0dcfc499b33a321ef45c0d310d366cc)

Bug 6278: Move opendaylight-karaf-empty to odlparent

Move opendaylight-karaf-empty from controller to odlparent. As with
moves of other artifacts, the following process will happen:

1) Copy opendaylight-karaf-empty into odlparent and adjust the groupId.
   The patch that handled the copy work is:

   https://git.opendaylight.org/gerrit/#/c/43988/

2) Change opendaylight-karaf-empty in controller to derive from odlparent's
   opendaylight-karaf-empty.  This is handled in this patch.

3) Change references to use the artifact added in #1.

4) Deprecate and remove controller's opendaylight-karaf-empty.

This patch just handles #2, re-parenting the existing controller
opendaylight-karaf-empty such that it derives from odlparent's
opendaylight-karaf-empty (added in #1).

Change-Id: Ifbfedd8a06f5f03900277d005906af83220707cc
Signed-off-by: Ryan Goulding <ryandgoulding@gmail.com>

Take snapshot after recovery on migrated messages

Modified RaftActorRecoverySupport to capture and persist a snapshot
after recovery when there are migrated messages recovered. It utilizes
the new MigratedSerializable interface.

I also created equivalent classes in the persisted packages for
UpdateElectionTerm, DeleteEntries and ApplyJournalEntries that implement
MigratedSerializable and use the Externalizable proxy pattern. The
existing classes were deprecated and readResolve to the new classes.

Change-Id: Ia2e664de9ffd59991c49160424b13bc8ca0bfcbf
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix FinalModifierException due to BindingToNormalizedNodeCodec final

The BindingToNormalizedNodeCodec class is final which prevents blueprint from
being able to create a proxy instance when advertising as a service. This
is OK as it will use the actual instance, but it logs an exception:

2016-08-11 23:03:18,215 | INFO | rint Extender: 2 | ServiceRecipe
| 15 - org.apache.aries.blueprint.core - 1.6.1 | Unable to create a
proxy object for the service .component-2 defined in bundle
org.opendaylight.controller.sal-binding-broker-impl/1.5.0.SNAPSHOT with
id. Returning the original object instead.
org.apache.aries.proxy.FinalModifierException: The class
org.opendaylight.controller.md.sal.binding.impl.BindingToNormalizedNodeCodec
is final.
at
org.apache.aries.proxy.impl.gen.ProxySubclassGenerator.scanForFinalModifiers(ProxySubclassGenerator.java:330)[12:org.apache.aries.proxy.impl:1.0.5]

Although it's logged at INFO level we should avoid it.

Changing BindingToNormalizedNodeCodec to non-final fixes the exception; however, it causes
an error when it tries to create the proxy b/c some methods are final. To avoid this, we
need to avoid the proxy creation altogether, so I added a method to
BindingToNormalizedNodeCodecFactory to advertise the OSGi service instead of using
the blueprint <service> element.

Note: it's not good practice to advertise a service with the actual class but it's
needed with BindingToNormalizedNodeCodec for backwards compatibility with CSS modules
that inject the BindingToNormalizedNodeCodec instance and not its interfaces. The
BindingToNormalizedNodeCodec CSS service identity is deprecated and will/should go away in the future.

Change-Id: I96b6cb8b030de39808de17142d79f8bbd09bf735
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit ed940fc416097e519428eaa3bba9e0d7d126b8cf)

Bug 5450: Query akka cluster state on Follower ElectionTimeout

Added changes to query the akka ClusterState to see if the leader is
actually unreachable or not Up on ElectionTimeout. If not, Follower
reschedules election timer and stays as Follower.

Change-Id: I3a054a82edbe975ad9e27c4d208083b19b392d2d
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit a7740542c8ce1985c0a35767966c781805dfad84)

Change InstallSnapshot and reply to use Externalizable Proxy

This makes InstallSnapshot cleaner with no public no-arg constructor.

I also removed the InstallSnapshot protobuf message. In addition,
SerializableUtils is no longer needed as there are no more protobuf
messages.

Change-Id: I17aa4f7195cf09b798daee5587bbf50ccbc4bff0
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit bad1f8b8f3c1780cd37ec8a817ef4b0f23901654)

Move ServerConfigurationPayload to cluster.raft.persisted

This introduces its mirror copy and modifies the old class
so that it readResolve()s to the new class. It also adjusts
all users to use the new class.

The new class uses the Externalizable proxy pattern to allow the
class itself to be evolved without breaking compatibility. Also
NoOpPayload is retrofitted this way, which makes all subclasses
of Payload not have their serialization format tied to Payload
itself.
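The Externalizable proxy pattern itself can be shown with plain `java.io`. `ExamplePayload` is a hypothetical class, not ServerConfigurationPayload; the point is that only the nested `Proxy` defines the wire format, via `writeReplace()` on the way out and `readResolve()` on the way in, so the public class needs no public no-arg constructor and its fields can evolve.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.Serializable;

// Hypothetical payload illustrating the Externalizable proxy pattern.
final class ExamplePayload implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String data;

    ExamplePayload(String data) { this.data = data; }

    String data() { return data; }

    // Serialization writes the proxy instead of this object.
    private Object writeReplace() {
        return new Proxy(this);
    }

    static final class Proxy implements Externalizable {
        private static final long serialVersionUID = 1L;
        private ExamplePayload payload;

        // Externalizable requires a public no-arg constructor, but only on the proxy.
        public Proxy() { }

        Proxy(ExamplePayload payload) { this.payload = payload; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(payload.data);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            payload = new ExamplePayload(in.readUTF());
        }

        // Deserialization hands back the real object, not the proxy.
        private Object readResolve() {
            return payload;
        }
    }
}
```

A deprecated old class can use the same hook in reverse: its `readResolve()` returns an instance of the new class, which is how old serialized data migrates.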

Change-Id: I26010a9e1438dbc4cb1822e1c4dbb51e2b6e538e
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 9d5ec5cdd146a56bc03e35b6718e9492a5c8410a)

Bug 6348 : car:stop-stress-test RPC to return success & failure counters

The current car:stop-stress-test RPC doesn't return how many
cars were created or failed. Adding success and failure counters
will help the user determine the number of cars created or failed
while creating cars with car:stress-test. This patch enhances
the car:stop-stress-test RPC.
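A minimal sketch of such counters, assuming the car writes complete asynchronously on arbitrary threads; `StressTestCounters` and its methods are hypothetical, not the clustering-it-provider code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters behind a stop-stress-test style RPC.
final class StressTestCounters {
    private final AtomicLong succeeded = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    // Called from write-completion callbacks, possibly on different threads,
    // hence atomic counters rather than plain longs.
    void onSuccess() { succeeded.incrementAndGet(); }

    void onFailure() { failed.incrementAndGet(); }

    // Values the stop RPC would put into its output.
    long successCount() { return succeeded.get(); }

    long failureCount() { return failed.get(); }
}
```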

Change-Id: Iff054c8210ce49f06b4fa96ca5a437d9b82deddb
Signed-off-by: Sai MarapaReddy <sai.marapareddy@gmail.com>
Author: Sai MarapaReddy <sai.marapareddy@gmail.com>
(cherry picked from commit 30cdcd430e09b46e2b9d523492742e009c1dc88e)

BUG-6111: fix a thinko

Failure to initialize isOpen leads to the codepath never being
triggered.

Change-Id: I20f1b76c9ada581edc1c92c61447fd97d0d1b2ea
Signed-off-by: Robert Varga <rovarga@cisco.com>

Update .gitreview for stable/boron

Change-Id: Ia172d038b3240046a9c11e18fc18fad28122bb0e
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

BUG-6111: implement PingPongTransactionChain cancelation

This patch implements transaction cancelation in PingPongDataBroker,
which has slightly different semantics -- if a transaction is canceled
while being in a batch, proper isolation of the batch is maintained
and after preceding batch completes, the transaction chain is aborted.

Since there is no transaction isolation within a batch, this is the
only course of action we can take.

Change-Id: I0058503165dbfba8748a17a9ef9272265f4bc1c9
Signed-off-by: Robert Varga <rovarga@cisco.com>

Add PMD exclusion for config-generated files

Unfortunately PMD does not support wildcard
root exclusions, hence we have to match odlparent
configuration and extend it.

Change-Id: I4bc7a1b8c25b75cb5b348fb2a16f0e5b2c111359
Signed-off-by: Robert Varga <rovarga@cisco.com>

Bug 5504: Add PreLeader raft state

The following scenario can result in a "store tree and candidate base
differ" IllegalStateException on commit:

A follower receives a replicate and adds it to the log, say at index 1,
but the leader transfers or dies before committing and applying it to the
state. The follower becomes leader and when the next tx is applied, log
index 2, it has to first apply all log entries from the previous term that
hadn't been committed yet, in this case index 1. Since we got consensus for
index 2 that means index 1 has also been replicated to a majority. Therefore
ApplyState is sent for index 1 and then index 2. However index 1 is applied
as a "foreign" candidate while index 2 is in the pre-commit state. When
index 2 is applied the commit fails.

To prevent this scenario, we introduce a new raft state, PreLeader,
which is transitioned to from Candidate if there are uncommitted
entries, ie commit index < last log index. The PreLeader state performs all
the duties of Leader with the added behavior of attempting to commit all
uncommitted entries from the previous leader's term. Raft does not allow a
leader to commit entries from a previous term by simply counting replicas -
only entries from the leader's current term can be committed (§5.4.2). Rather
than waiting for a client interaction to commit a new entry, the PreLeader
state immediately appends a no-op entry (NoopPayload) to the log with the
leader's current term. Once the no-op entry is committed, all prior entries
are committed indirectly. Once all entries are committed, ie commitIndex matches
the last log index, it switches to the normal Leader state.
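The commit rule that makes the no-op entry necessary can be sketched as follows. `PreLeaderSketch` is hypothetical: log entries are reduced to their terms, indices are 0-based, and the transition check follows this commit's criterion (commitIndex reaches the last log index).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified leader-side log: each entry only records its term.
final class PreLeaderSketch {
    final List<Long> entryTerms = new ArrayList<>(); // position i holds the term of entry i
    final long currentTerm;
    long commitIndex = -1;

    PreLeaderSketch(long currentTerm) { this.currentTerm = currentTerm; }

    long lastIndex() { return entryTerms.size() - 1; }

    // On becoming PreLeader: append a no-op entry (standing in for NoopPayload) in the current term.
    void appendNoop() { entryTerms.add(currentTerm); }

    // Raft section 5.4.2: an index may be committed by counting replicas only if its
    // entry is from the current term; committing it commits every earlier entry too.
    void onReplicated(long index, int replicaCount, int clusterSize) {
        boolean majority = replicaCount * 2 > clusterSize;
        if (majority && index > commitIndex && entryTerms.get((int) index) == currentTerm) {
            commitIndex = index;
        }
    }

    // Switch to Leader once everything in the log is committed.
    boolean canBecomeLeader() { return commitIndex == lastIndex(); }
}
```

Replicating the previous-term entry to every node does not commit it; committing the no-op does.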

The PreLeader state is considered an inactive leader state and thus
client transactions are delayed until it transitions to Leader.

Change-Id: I20a541de0eba9b0075b9952dc6d5808943b7bb8f
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG-5280: expand ShardDataTree to cover transaction mechanics

A chunk of ShardCommitCoordinator should actually be implemented
by ShardDataTree. This includes transaction queueing, commit timers,
interaction with user cohorts and persistence.

This patch implements the relevant operations in a message-agnostic,
callback-driven way.

Fix: ShardDataTreeTest (missing ShardStat MBean)

Change-Id: I353bacce8245df85c5f4d6b4cc0ce5416f2f0337
Signed-off-by: Robert Varga <rovarga@cisco.com>
Signed-off-by: Vaclav Demcak <vdemcak@cisco.com>
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix relativePath declaration

Since karaf-parent's parent is outside of controller,
relativePath has to be empty.

Change-Id: I491c73f2d42b8d5f3e159625f8ba01fdadd32497
Signed-off-by: Robert Varga <rovarga@cisco.com>

Return shortened string from TransactionIdentifier.toString

For debug logging we need a shortened string for better readability and
grepping. The standard toString is way too long. I changed toString to a
compact form similar to the one we had before, adding in the frontend generation id
and type, e.g.

member-1-datastore-config-fe-1-txn-3
member-1-datastore-operational-fe-1-chn-2-txn-3
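The compact format can be reproduced with a small formatter. `CompactTxId.format` and its parameters are hypothetical; the real TransactionIdentifier is composed of nested identifier objects rather than plain strings and numbers.

```java
// Hypothetical sketch of the compact identifier format shown above.
final class CompactTxId {
    static String format(String memberName, String frontendType, long generation,
                         Long historyId, long txId) {
        StringBuilder sb = new StringBuilder()
                .append(memberName).append('-')
                .append(frontendType).append("-fe-").append(generation);
        if (historyId != null) {
            // Transactions allocated from a transaction chain also carry the chain id.
            sb.append("-chn-").append(historyId);
        }
        return sb.append("-txn-").append(txId).toString();
    }
}
```

For example, `format("member-1", "datastore-config", 1, null, 3)` yields the first string above, and passing `2L` as the chain id with type `datastore-operational` yields the second.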

Change-Id: I942eaaa0e8ceedf42eed964f2a2e3a76d8c09806
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Enable akka WeaklyUp feature

By enabling allow-weakly-up-members, akka will allow new nodes to join a
cluster if there are unreachable nodes. However, this only pertains to
new nodes that weren't previously in the cluster. Unfortunately it
doesn't pertain to node restarts where a node was in the cluster then
attempts to re-join with a new incarnation, which is what we really want.
Despite that, it will at least work for new nodes so I think it's worth
enabling. Akka might be further enhanced to broaden WeaklyUp to include
new incarnations (there are requests for that).

I also changed the ShardManager to handle MemberWeaklyUp events in
the same manner as MemberUp.

Change-Id: I5cf6c1967162b8a9bc6ffb59d34a50560699e4ca
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 6278: Copy karaf-parent from controller to odlparent

As discussed in the MD-SAL call, there is an architectural need to move
karaf-parent from the controller project to the odlparent project.  This
is particularly useful for karaf upgrades, since right now a bump in karaf
version within odlparent requires a rebuild of controller to reflect the
change in karaf-parent, and our build jobs are not set up to support such
a process.

The move process will be handled in multiple steps:

1) Copy karaf-parent, karaf-branding and opendaylight-karaf-resources to
odlparent.  All three of these should belong in odlparent.  All three must
be moved since karaf-parent depends on the latter two artifacts.  Since
controller depends on odlparent (and not the other way around), they must
be moved upstream to odlparent.

2) Have controller's karaf-parent derive from odlparent's karaf-parent.
This preserves the ability for downstream consumers to derive from the
controller karaf-parent in the interim, while allowing changes to odlparent's
karaf-parent to be recognized since controller does not need to be rebuilt.
[THIS PATCH]

This also involves removing karaf-branding and opendaylight-karaf-resources
from the controller project, since they are no longer needed.  There are two
consumers that need to be patched:
lispflowmapping: https://git.opendaylight.org/gerrit/42647
vtn: https://git.opendaylight.org/gerrit/42648

3) Change all downstream projects to utilize odlparent's karaf-parent.  This
is future work and will be done in several patches.

4) Remove controller's karaf-parent once we feel all downstream consumers
are using the odlparent's karaf-parent.

Change-Id: Ib42ff5212bbfb93883346a19855544df4fb06d61
Signed-off-by: Ryan Goulding <ryandgoulding@gmail.com>

Do not use ShardDataTree in PruningDataTreeModificationTest

This test only requires a DataTree, hence use that.

Change-Id: I37697121f6686cdfe6b1d71ca87ff79281619532
Signed-off-by: Robert Varga <rovarga@cisco.com>