git.opendaylight Code Review - controller.git/log

Make methods static

Private methods which do not touch object state can be made static.
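A minimal sketch of the refactoring, using a hypothetical class (not from the controller code): the helper reads only its argument and never `this`, so it can be declared `static`.

```java
import java.util.List;

// Hypothetical class used only to illustrate the refactoring.
final class PathFormatter {
    private final String prefix;

    PathFormatter(String prefix) {
        this.prefix = prefix;
    }

    String format(List<String> segments) {
        // Instance method: reads the 'prefix' field.
        return prefix + joinSegments(segments);
    }

    // Touches only its argument, never 'this', so it can be static.
    private static String joinSegments(List<String> segments) {
        return String.join("/", segments);
    }
}
```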

Change-Id: I4f5a7e6215c7570660ee797f4e694745844f72e7
Signed-off-by: Robert Varga <rovarga@cisco.com>

Bug 4564: Implement restore from snapshot in RaftActor

The restore snapshot is supplied by the derived actor's
RaftActorRecoveryCohort. If one exists, the RaftActorRecoverySupport
deserializes and applies the snapshot.

I also added a Builder to MockRaftActor to make it easier to pass
additional params.

Change-Id: Ib52b24331038ed48221cc27086fa3cceafe39fcf
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 4554 : Ownership is not cleared when all candidates are removed

When all candidates for an entity get unregistered at approximately
the same time it can create a situation where the owner for the
entity is not cleared. Consequently, no entity ownership change with
hasOwner=false is raised, even when there are no owners for
the entity.

This could be a problem for applications that take
some action when there are no candidates for an entity. The
openflow application, for example, relies on the disappearance of
all owners to actually remove a switch from inventory. Without
this event, nodes linger in inventory.

Problem Sequence
----------------

The sequence of events that leads to this problem is as follows.

Let's say member-1 owned entity-1 and there are 3 candidates for
entity-1 - member-1, member-2 and member-3. Now let's say due to
some event all candidates have to unregister. The data
transformations go like this.

delete member-1
delete member-2
delete member-3
delete member-1 succeeds so choose new owner - in this case member-2
make-owner member-2
delete member-2 succeeds - member-2 is not the current owner so do nothing
delete member-3 succeeds - member-3 is not the current owner so do nothing
make-owner member-2 succeeds. Now we have an owner for entity-1 even though we have no candidates

Solution
--------

The solution proposed in this patch is to set the owner to empty ("") when
there are no remaining candidates. This changes the above sequence as follows.

delete member-1
delete member-2
delete member-3
delete member-1 succeeds so choose new owner - in this case member-2
make-owner member-2
delete member-2 succeeds - member-2 is not the current owner so do nothing
delete member-3 succeeds - member-3 is the last candidate so set owner to ""
make-owner ""
make-owner member-2 succeeds. Now we have an owner for entity-1 even though we have no candidates
make-owner "" succeeds. Now the owner for entity-1 is set to no one, as it should be
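The two sequences above can be replayed with a small, single-threaded model. This is a sketch under stated assumptions, not the EntityOwnershipShard code: `OwnershipModel` and its methods are hypothetical, and queued `Runnable`s stand in for asynchronous data-tree commits that complete in submission order.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of the fix; writes are queued and applied later,
// mimicking commits that complete asynchronously but in order.
final class OwnershipModel {
    final Set<String> candidates = new LinkedHashSet<>();
    String owner = "";
    private final Queue<Runnable> pendingWrites = new ArrayDeque<>();

    // Queue removal of a candidate; the "succeeds" handling runs when applied.
    void deleteCandidate(String member) {
        pendingWrites.add(() -> {
            candidates.remove(member);
            onCandidateRemoved(member);
        });
    }

    private void onCandidateRemoved(String member) {
        if (member.equals(owner)) {
            // The owner went away: pick another remaining candidate, if any.
            makeOwner(candidates.isEmpty() ? "" : candidates.iterator().next());
        } else if (candidates.isEmpty()) {
            // The fix: the last candidate is gone, so clear the owner even
            // though the removed member was not the current owner.
            makeOwner("");
        }
    }

    private void makeOwner(String member) {
        pendingWrites.add(() -> owner = member);
    }

    void applyAll() {
        while (!pendingWrites.isEmpty()) {
            pendingWrites.poll().run();
        }
    }
}
```

Without the `else if` branch the model ends with `owner` equal to `member-2`, reproducing the problem sequence.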

Change-Id: I583e8c6991742ada5846e87da35db255eeed144e
Signed-off-by: Moiz Raja <moraja@cisco.com>

BUG 4615 : Add method on EOS to check if a candidate is registered locally

Change-Id: Iedb2e4cf92553910cf5e1bd85978f88e10bf3c25
Signed-off-by: Moiz Raja <moraja@cisco.com>

Implement LeastLoadedCandidateSelectionStrategy

Change-Id: I09035505bcfa0ef5b2ac357217186ad98db7974c
Signed-off-by: Moiz Raja <moraja@cisco.com>

Maintain EntityOwnershipStatistics

Implementing a LoadBalancing entity owner selection
strategy depends on our ability to find the load on
specific candidates. The EntityOwnershipStatistics collects
this information and provides query methods to access
ownership counts for candidates.
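A minimal sketch of how ownership counts could feed a least-loaded choice. `OwnershipStats` and its methods are hypothetical names, not the EntityOwnershipStatistics or LeastLoadedCandidateSelectionStrategy API; an empty owner string is assumed to mean "no owner".

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-candidate ownership counts plus a least-loaded pick.
final class OwnershipStats {
    // entity type -> (member name -> number of entities it currently owns)
    private final Map<String, Map<String, Long>> counts = new HashMap<>();

    void ownerChanged(String entityType, String oldOwner, String newOwner) {
        Map<String, Long> perMember = counts.computeIfAbsent(entityType, k -> new HashMap<>());
        if (oldOwner != null && !oldOwner.isEmpty()) {
            perMember.merge(oldOwner, -1L, Long::sum);
        }
        if (newOwner != null && !newOwner.isEmpty()) {
            perMember.merge(newOwner, 1L, Long::sum);
        }
    }

    long ownershipCount(String entityType, String member) {
        return counts.getOrDefault(entityType, Collections.emptyMap()).getOrDefault(member, 0L);
    }

    // Choose the candidate that currently owns the fewest entities of this type.
    String leastLoaded(String entityType, Collection<String> candidates) {
        String best = null;
        long bestCount = Long.MAX_VALUE;
        for (String candidate : candidates) {
            long count = ownershipCount(entityType, candidate);
            if (count < bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }
}
```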

Change-Id: I7e812b15e8fb21e3be1aed10384600b9acb8bf20
Signed-off-by: Moiz Raja <moraja@cisco.com>

Add a mechanism to read the entity owner selection strategies from a config file

Change-Id: Ie951e4f83aaf38f00e959f4243820a88cb988788
Signed-off-by: Moiz Raja <moraja@cisco.com>
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Pass in EntityOwnerSelectionStrategyConfig when constructing DistributedEntityOwnershipService

Change-Id: Iad1014db726a06de9a89a9987216ca4c96981122
Signed-off-by: Moiz Raja <moraja@cisco.com>

Pass in EntityOwnerSelectionStrategyConfig when constructing EntityOwnershipShard

Change-Id: I56c2f4f87c61e81b662cd0b30c60775389e9b9a3
Signed-off-by: Moiz Raja <moraja@cisco.com>

Allow passing of delay to the EntityOwnerElectionStrategy

Change-Id: If745443585e68a26c10622a7888ec52dbee0059c
Signed-off-by: Moiz Raja <moraja@cisco.com>

Add Delayed Owner selection based on strategy

Change-Id: I04fc216ffc7e5c3fd35b34b6d03a5030c359d77f
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug 2187: AddServer: check if already exists

On AddServer, if the new server already exists as a peer return
ALREADY_EXISTS status reply.
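A minimal sketch of the duplicate check, assuming the leader tracks peers in a map keyed by server id. `AddServerStatus` and `PeerTable` are hypothetical names, not the RaftActor classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical status values and class, for illustration only.
enum AddServerStatus { OK, ALREADY_EXISTS }

final class PeerTable {
    private final Map<String, String> peerAddresses = new HashMap<>();

    AddServerStatus addServer(String serverId, String address) {
        // Reply ALREADY_EXISTS for a known peer instead of re-adding it.
        if (peerAddresses.containsKey(serverId)) {
            return AddServerStatus.ALREADY_EXISTS;
        }
        peerAddresses.put(serverId, address);
        return AddServerStatus.OK;
    }
}
```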

Change-Id: I3b324850e1f05fce72eced3b2ced52f1510973fe
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 2187: Increases test coverage in RaftActorRecoverySupport

This is a follow-up patch to
https://git.opendaylight.org/gerrit/#/c/29112/ to add more unit test
coverage.

Change-Id: I1dcd87c9bed55b75eed03e7736b0165f656f661f
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 2187: Return OK reply after AddServer persist

The AddServer processing was changed to return an OK reply as soon as the
new ServerConfigurationPayload is persisted, without waiting for
consensus. Previously, since the new server config is applied immediately in
the leader, if consensus wasn't reached, this would cause the
ShardManager on the calling side to delete the new follower actor, resulting
in a "zombie" peer in the leader. Even if consensus isn't reached, the
new server config will most likely have been replicated at least to the
new follower, and followers that are down will eventually receive it
when they come back up.

Change-Id: I425fa78d5dd023feda7913ed8d1b5b6c285ccae4
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 4589 : Handle writing and reading large strings

Change-Id: If81926757aef3c1275ba43a7cf8c7adf94d86e08
Signed-off-by: Moiz Raja <moraja@cisco.com>
(cherry picked from commit 28484d59aa626dd4b32cdeb2d10dbc2c47cc051a)

Bug 4564: Add Shard Builder class

Added a Builder class to Shard to replace the props and Creator
classes to make it easier to pass new params to Shard without having
to change a lot of code and unit tests. An upcoming patch will add
a new param.
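A minimal sketch of the builder idea: each optional parameter becomes one setter, so adding a parameter later does not change existing call sites or tests. `ShardParams` and its fields are hypothetical; `restoreFromSnapshot` is only an illustrative later-added parameter.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, trimmed-down stand-in for Shard's constructor parameters.
final class ShardParams {
    private final String id;
    private final Map<String, String> peerAddresses;
    private final byte[] restoreFromSnapshot; // illustrative later-added, optional param

    private ShardParams(Builder builder) {
        this.id = builder.id;
        this.peerAddresses = Collections.unmodifiableMap(new HashMap<>(builder.peerAddresses));
        this.restoreFromSnapshot = builder.restoreFromSnapshot;
    }

    String id() { return id; }
    Map<String, String> peerAddresses() { return peerAddresses; }
    byte[] restoreFromSnapshot() { return restoreFromSnapshot; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private String id;
        private Map<String, String> peerAddresses = Collections.emptyMap();
        private byte[] restoreFromSnapshot;

        Builder id(String newId) { this.id = newId; return this; }
        Builder peerAddresses(Map<String, String> newPeers) { this.peerAddresses = newPeers; return this; }
        // A new parameter means one new setter; existing callers are unaffected.
        Builder restoreFromSnapshot(byte[] snapshot) { this.restoreFromSnapshot = snapshot; return this; }

        ShardParams build() {
            Objects.requireNonNull(id, "id is required");
            return new ShardParams(this);
        }
    }
}
```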

Change-Id: I122747d0cc6c14f090026efe81425e1e1e4edc37
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Remove duplicate junit dependency

odlparent already declares the scope as test, so there is no need to
repeat it. Fixes warnings in autorelease.

Change-Id: Ia0b6550d2ecbce80eefa168d78c8b50e29100698
Signed-off-by: Robert Varga <rovarga@cisco.com>

Introduce EntityOwnerSelectionStrategy

Currently the EntityOwnershipService does not do any load
balancing, in that it allows the first candidate that registers
to become the owner. Load balancing is needed so that applications
that do some *work* based on whether they own an entity can
scale better.

This patch introduces the concept of an EntityOwnerSelectionStrategy
with the intent to provide custom strategies later to choose an owner.

Since custom strategies require intimate knowledge of how the
EntityOwnershipShard chooses a leader, at this time I do not think
a strategy can be passed to the EntityOwnershipService via the API. The
intent therefore is to choose a strategy based on configuration,
wherein a custom strategy can be chosen for each entity type. If
the strategy needs any custom configuration, it can have configuration
files of its own.

Change-Id: Ia53b8edb59fb1d06a426d9d9a95c07ef4ae65cd1
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug 2187: Recover Peer Id's and Update peer map during Journal recovery

Recover ServerConfigurationPayload ReplicatedLogEntry's and immediately apply them to the peer map in RaftActorContext.
Review comments incorporated.

Change-Id: I1b1b3c21e83eb5ea799dd040a4da8f78f1155082
Signed-off-by: Rajesh_Sindagi <Rajesh_Sindagi@dell.com>

Clean up duplicate/unused dependencies and properties

Remove dependencies and properties provided in odlparent (with the
same versions).

org.json.version in features/mdsal/pom.xml is unused.

A few properties are only used once, in controller, so replace them
with the version in-place.

(All this will allow a number of properties to be removed from
odlparent.)

Change-Id: I07e9f2298ebd008d82b22b156dc2ddce50151641
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Cache config QNameModules

Use pre-instantiated and cached QNames, so we do not end up wasting
space unnecessarily.

Change-Id: I7ff7b9a098fbf182770d07ccbd0b9bb60334fb82
Signed-off-by: Robert Varga <rovarga@cisco.com>

BUG-4556: lazy computation of MXBean maps

Further analysis of our feature:install CPU usage shows that we spend
an inordinate amount of time constructing MXBean maps. Make the
construction more asynchronous.

Change-Id: I69450bfe8debb65160c40aed6a75ff3d3bef831d
Signed-off-by: Robert Varga <rovarga@cisco.com>

Set odlparent-lite as parent for benchmark/pom.xml

Change-Id: I80e8c621a909fd4dde0a7d25d887ea4523451ce6
Signed-off-by: Vratko Polak <vrpolak@cisco.com>

Bug 4560: Improve config system logging for debuggability

Manually cherry-picked from
https://git.opendaylight.org/gerrit/#/c/28985 as the files have moved in
master.

Also the code has changed slightly in master, specifically the
ConfigPusherImplTest no longer uses a Thread uncaught exception handler
for verification. However it does rely on exceptions thrown from the
ConfigPusherImpl so, to keep the same behavior, I added a
propagateExceptions flag to ConfigPusherImpl#process. The
ConfigPersisterActivator production code passes false so unchecked
exceptions aren't handled as uncaught exceptions.

Change-Id: Iabc22030abc22cf11a1476986ba3d3366021b4fb
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Set odlparent-lite as artifacts parent

Change-Id: I4ae4994db55739460ca5d326865d7e704a2b8e26
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Bug 4564: Implement clustering backup-datastore RPC

Added a new RPC, backup-datastore, to send the GetSnapshot message to the
ShardManagers and persist the list of DatastoreSnapshots to a file.

I also renamed the cluster-config yang module to cluster-admin to make
it more general as the backup RPC isn't related to configuration.

Change-Id: I18e5d47f7052b890c3547066145e4d5d0fbe1277
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4564: Implement GetSnapshot message in ShardManager

Added a serializable DatastoreSnapshot class that stores the serialized
snapshot for each shard.

On GetSnapshot, the ShardManager sends a GetSnapshot message to each
shard and creates a ShardManagerGetSnapshotReplyActor to compile the
replies and return a DatastoreSnapshot instance to the caller.

Change-Id: I11f872aa701f1e51de9cbccdc1a372a76bc45cff
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4564: Implement GetSnapshot message in RaftActor

Added a new client message, GetSnapshot, to return a serialized Snapshot
instance. The implementation just captures the snapshot for return and does
not persist it. If data persistence isn't enabled, it does not initiate a
capture and returns a serialized Snapshot instance containing just the
persistable state, e.g. election term info.

Change-Id: I9ea7fc8e0e60c4d6874f5eb0188543e1d9b51243
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4149: Implement per-shard DatastoreContext settings

Added the ability to specify shard-specific settings in the .cfg file by
prefixing the shard name to the property name, similar to what we allow
at the datastore level.

I added a DatastoreContextFactory that has methods to get the base
DatastoreContext and a per-shard DatastoreContext. The
DatastoreContextFactory is now passed to the ShardManager instead of the
DatastoreContext. The DatastoreContextFactory uses the
DatastoreContextIntrospector to overlay per-shard settings onto the
base DatastoreContext.
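A minimal sketch of overlaying per-shard settings onto base settings, assuming per-shard keys take the form `<shard-name>.<property>`. That key format and the `ShardSettings` class are assumptions for illustration; the real DatastoreContextIntrospector works on typed DatastoreContext fields rather than raw strings.

```java
import java.util.HashMap;
import java.util.Map;

final class ShardSettings {
    // Hypothetical: keys of the form "<shardName>.<property>" override base settings.
    static Map<String, String> forShard(Map<String, String> base, Map<String, String> cfg, String shardName) {
        Map<String, String> result = new HashMap<>(base); // never mutate the base settings
        String prefix = shardName + ".";
        for (Map.Entry<String, String> entry : cfg.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                // Strip the shard prefix and let the value override the base setting.
                result.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }
}
```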

Change-Id: I329c98c1577a74ebe665052f76e28da3867e2e86
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Added the data store benchmark (dsbenchmark, Bug 4519, https://bugs.opendaylight.org/show_bug.cgi?id=4519)

Change-Id: Ibc6d214b43b6353adbc49ba7b5b4a302ae1fbd95
Signed-off-by: Jan Medved <jmedved@cisco.com>

Speed up YangStoreService

Change-Id: Ibaf972650045b5d85be155f653f7eef36aae6c6e
Signed-off-by: Robert Varga <rovarga@cisco.com>

Bug 4563: Increase akka seed-node-timeout

Change-Id: I8f17872ef30a96d58a666e3499cf42ab59f0491d
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix precondition string

The string has been corrupted, fix it up.

Change-Id: I36312ca4e5ca6365b3003a2ad57ca2734d156578
Signed-off-by: Robert Varga <rovarga@cisco.com>

Improve YangStoreService performance

Simple changes to eliminate synthetic methods and unneeded duplication
of collections.

Change-Id: I370d4ed85720e2b7eb811204afa9f532b716b16d
Signed-off-by: Robert Varga <rovarga@cisco.com>

Do not subclass Hashtable

Rather than subclassing, instantiate a Hashtable and fill it.
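A minimal sketch of the pattern, assuming the original code used an anonymous `Hashtable` subclass (double-brace initialization), which creates an extra class and captures the enclosing instance when used in instance context. `ServiceProps` and the `type` key are hypothetical.

```java
import java.util.Dictionary;
import java.util.Hashtable;

final class ServiceProps {
    // Avoided form: new Hashtable<String, Object>() {{ put("type", type); }}
    // Preferred: instantiate a plain Hashtable and fill it.
    static Dictionary<String, Object> serviceProperties(String type) {
        Hashtable<String, Object> props = new Hashtable<>();
        props.put("type", type);
        return props;
    }
}
```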

Change-Id: Icfd4e812759874a702a2506e9090cd20535bdc50
Signed-off-by: Robert Varga <rovarga@cisco.com>

BUG 3973: Add config option for Java-only leveldb

Add comment in akka.conf on how to use the Java-only
version of leveldb for platforms where native leveldb
is unavailable.

Change-Id: I5693522597152ef7f86bb89d4be32e20f0582806
Signed-off-by: Gary Wu <gary.wu1@huawei.com>

Add leader unit test for non-voting consensus

Added a test case to LeaderTest to verify a non-voting follower
does not influence replication consensus.

Also I saw intermittent test failures (in Jenkins as well, during the first
verify build) due to a message going to dead letters shortly after
actor creation (also reported in Bug 4223). Specifically it was occurring
when the leader sent the initial AppendEntries heartbeat to a follower. This
seems like a timing issue/bug in akka when using an ActorSelection. I
added code in the TestActorFactory to use an actorSelection and call
resolveOne in a retry loop. This seems to alleviate the issue as I ran
LeaderTest over 1000 times successfully.

Change-Id: I65cb87f419c280befe2d82300a981bd8e6f88742
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 2187: Address comments in https://git.opendaylight.org/gerrit/#/c/28596/

Addressed minor comments in https://git.opendaylight.org/gerrit/#/c/28596/.

Unified the response messages and debug messages.

Added persistenceId() format param to the debug messages for additional
context.

Change-Id: Ic1a4e852126425cf7ae67ee5b9ea301b06a3f9a8
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Always persist ServerConfigurationPayload log entries

We need to always persist ServerConfigurationPayload log entries
regardless of whether or not persistence is enabled for the derived
RaftActor's data.

I added a new tagging interface PersistentPayload, implemented by
ServerConfigurationPayload, to indicate a Payload
needs to always be persisted. Since log entries are persisted by
both the RaftActor and Follower behavior via the ReplicatedLog, the
logic to determine persistence based on PersistentPayload needs to be
available to both. The ReplicatedLog uses the persistence provider
contained in the RaftActorContext which is the
DelegatingPersistentDataProvider set by the RaftActor. So to keep
the rest of the code the same and keep it simple, I derived a
RaftActorDelegatingPersistentDataProvider which overrides persist to
handle the PersistentPayload logic utilizing the RaftActor's
existing PersistentDataProvider.
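A minimal sketch of the delegating idea with simplified, hypothetical types (`Payload`, `PersistenceSink`, `RecordingSink`, `PayloadAwareSink`), not the real RaftActor interfaces: payloads tagged with `PersistentPayload` bypass a non-persistent delegate and go to a provider that always writes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified types; the real RaftActor classes differ.
interface Payload {}

// Tagging interface: payloads that must be persisted even when data persistence is off.
interface PersistentPayload extends Payload {}

interface PersistenceSink {
    boolean isPersistent();
    void persist(Payload payload);
}

final class RecordingSink implements PersistenceSink {
    final boolean persistent;
    final List<Payload> written = new ArrayList<>();

    RecordingSink(boolean persistent) { this.persistent = persistent; }

    @Override public boolean isPersistent() { return persistent; }

    @Override public void persist(Payload payload) {
        if (persistent) {
            written.add(payload); // a non-persistent sink drops the write
        }
    }
}

final class PayloadAwareSink implements PersistenceSink {
    private final PersistenceSink current;    // may be non-persistent
    private final PersistenceSink persistent; // always writes to the journal

    PayloadAwareSink(PersistenceSink current, PersistenceSink persistent) {
        this.current = current;
        this.persistent = persistent;
    }

    @Override public boolean isPersistent() { return current.isPersistent(); }

    @Override public void persist(Payload payload) {
        if (payload instanceof PersistentPayload && !current.isPersistent()) {
            persistent.persist(payload);
        } else {
            current.persist(payload);
        }
    }
}
```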

Change-Id: I243026b28ed57461ad92324b6947091ae74a7127
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Derive MockRaftActorContext from RaftActorContextImpl

I changed MockRaftActorContext to derive from RaftActorContextImpl since
it duplicates most of the functionality in RaftActorContextImpl and,
with the addition of PeerInfo, MockRaftActorContext can now provide the
same functionality as RaftActorContextImpl without having to duplicate it in
MockRaftActorContext. Also this will make it easier when the RaftActorContext
interface is changed.

Change-Id: Ief90232fc992a50b3f0fea5ece323a14916760f2
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG-3381: Capture Snapshot on recovery if journal is not empty

Change-Id: Ib1068cb6d4848d151039887b51458399ff421178
Signed-off-by: evvy <dhiraviam.natarajan@gmail.com>

Add wait state for AddServer if snapshot in progress

It is possible a snapshot capture could be in progress when we
attempt to initiate snapshot capture on AddServer. I added a wait
state to the FSM and a new message, SnapshotComplete, that is sent
by the SnapshotManager.

Added more unit test cases.

Change-Id: I119a264e03686ea70f7834e551c2fb45dd39f903
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 2187 - Creating ShardReplica

Creates a local shard replica with a custom RaftPolicy and informs the shard leader of the local shard.
Processes AddServerReply from shard leader.
On successful replication, makes local shard voting capable.
On replication failure, local shard is removed.

Incorporated the comments

Change-Id: Id2b90039c39211b20322bc2d141520723d44c391
Signed-off-by: kalaiselvik <Kalaiselvi_K@Dell.com>

BUG-2187: Non voting and Uninitialized followers are not to be counted towards consensus

Change-Id: I1ba86cf2e2f904847ea8f819e84a3dc54fcc31d2
Signed-off-by: Rajesh_Sindagi <Rajesh_Sindagi@dell.com>

Add voting state to ServerConfigurationPayload

Changed the internal state to a list of ServerInfo instances which
contain the server id and voting state.

Also removed the oldServerConfig field as it won't be needed.

Change-Id: I10b3ca8dc2ffed9b5db0a7d0f6ca74d73a837b8e
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix small bug in startup archetype

Change-Id: I83913ed9f16b38f6e6fd461b76dece1a09f4c8ca
Signed-off-by: Ed Warnicke <hagbard@gmail.com>

BUG-2399: fixup tests

The test model specifies the top-level container as structural, yet the
tests expect it to exist when empty. Mark the container as presence,
restoring behavior expected by tests.

Change-Id: Ided99720468a8bee14d5c66342e524450f5a9050
Signed-off-by: Robert Varga <rovarga@cisco.com>

Introduce PeerInfo and VotingState

We need to store the voting state for each peer, so I created a
PeerInfo class to include the id, address and voting state (represented by a
VotingState enum). The RaftActorContext now stores PeerInfo instances
in its peer map and has new methods to access PeerInfo. As a consequence,
RaftActorContext#getPeerAddresses was no longer needed and was removed.

AbstractLeader and Candidate were modified to utilize the PeerInfo to
calculate the majority vote/min replication count, i.e. ignore non-voting peers.

Previously we had added a FollowerState enum and stored it in the
FollowerLogInformation. Since voting state is now stored in the
RaftActorContext peer info, I removed the FollowerState from
FollowerLogInformation to avoid redundancy and having to keep both
up to date.
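A minimal sketch of majority arithmetic that ignores non-voting peers, assuming the local member is itself voting and that both non-voting and not-yet-initialized peers are excluded. The enum values and the `Consensus` helper are hypothetical, not the AbstractLeader/Candidate code.

```java
import java.util.Collection;

// Hypothetical enum mirroring the voting states named in these commits.
enum VotingState { VOTING, NON_VOTING, VOTING_NOT_INITIALIZED }

final class Consensus {
    // Only fully voting peers count toward elections and replication consensus.
    static int votingPeerCount(Collection<VotingState> peers) {
        int count = 0;
        for (VotingState state : peers) {
            if (state == VotingState.VOTING) {
                count++;
            }
        }
        return count;
    }

    // Votes needed to win an election, counting this (voting) member itself.
    static int majorityVoteCount(int votingPeers) {
        int votingMembers = votingPeers + 1;
        return votingMembers / 2 + 1;
    }

    // Voting followers (excluding the leader) that must store an entry before commit.
    static int minReplicationCount(int votingPeers) {
        return majorityVoteCount(votingPeers) - 1;
    }
}
```

For example, two voting peers plus one non-voting and one uninitialized peer give a three-member voting cluster: two votes win an election and one follower acknowledgement suffices to commit.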

Change-Id: I1394511a8db7f0b9df3ed7879c77c1f44f3b143d
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bump Akka to 2.3.14

Change-Id: Ia6bf3f1a4c025ec1e84662c04ccdc40c04e569a2
Signed-off-by: Gary Wu <gary.wu1@huawei.com>

Remove checks for NormalizedNodeBuilderWrapper

Interface contract already guarantees returned objects are subclasses of
NormalizedNodeBuilderWrapper, so the instanceof check only guards against null.

Switch to explicit assertNotNull() to reduce eclipse warnings.

Change-Id: Ibf0d73752c6e1ebeacbb10677e2f11f185098bd9
Signed-off-by: Robert Varga <rovarga@cisco.com>

Do not use MoreExecutors.sameThreadExecutor()

This method is deprecated, replace it with proper service/executor.

Change-Id: I7257a28f28784313cafc250f2c2fd1c623332dec
Signed-off-by: Robert Varga <rovarga@cisco.com>

Make REUSABLE_*_TL final

Since these are public static fields, they should be final to prevent
possible shenanigans.

Change-Id: I4a360e060ddde57a73118bcf3d053ce397204136
Signed-off-by: Robert Varga <rovarga@cisco.com>

Reduce ShardDataTree#getDataTree() callsites

Many of these callsites perform a specific function; expose those
functions without leaking the DataTree. This is needed to handle
asynchronous persistence and optimistic transaction commit.

Change-Id: I330cb4172349e0d1d8daacc3aafce7dad64cd8b2
Signed-off-by: Robert Varga <rovarga@cisco.com>

Do not declare unneeded Exception throw

Fixes sonar warnings

Change-Id: I31ab95c75cf30b33c9025d6f6e4662ccc5df7a47
Signed-off-by: Robert Varga <rovarga@cisco.com>

Make private methods static

These methods do not reference object state and therefore can be made
static.

Change-Id: I416e415b90647b4f700b7893fe4f64f479271fab
Signed-off-by: Robert Varga <rovarga@cisco.com>

Add getPeerIds to RaftActorContext

For upcoming work to add voting status to the peer info in
RaftActorContext, I added a getPeerIds method to replace calls to
getPeerAddresses as virtually all callers really just want the IDs or want
to check the size. getPeerAddresses will (likely) be removed altogether -
this is a preliminary patch.

Change-Id: I2b6f2c36dfec14ccd4bbfef35e67ed86cf3e3e45
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix resource leaks in TransactionChainProxyTest

Close TransactionChainProxy objects (AutoCloseable)
that were not being closed in the test cases.

Change-Id: I85b1f951545b764007bdb2e808a2438c9bd4b2b2
Signed-off-by: Gary Wu <gary.wu1@huawei.com>

update leveldbjni version to support Solaris

Change-Id: I46de5b3cc9c220a70a408194fb3ff709cdff1937
Signed-off-by: rshoaib <rao.shoaib@oracle.com>

Bug 2187: Code cleanup and refactoring

I addressed remaining comments from a prior patch.

I also refactored RaftActorServerConfigurationSupport to use an FSM
similar to the SnapshotManager with some generic classes. This will
make it easier to implement RemoveServer and reuse code.

Change-Id: Id3cdcede3f9c393c878abd3e9a9d3a5e12c5fb8a
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Remove unused Jersey dependencies from the controller

The code utilizing Jersey was moved to the netconf project. This change
removes some of the deprecated dependencies.

Change-Id: I62b944497c976b1251412d8d047ef833e69dfb0a
Signed-off-by: Ryan Goulding <ryandgoulding@gmail.com>

Remove unnecessary @SuppressWarnings

Change-Id: I2b59e7f29a15298c1135c12b6bd9699205706600
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Fix resource leaks in exception handling

Fix resource leaks when exceptions are
encountered during ConfigManagerActivator.start().

Change-Id: Ic12c756aa5a768add0bc62e71eed94e5b2fa5fea
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Bug 4037: Allow auto-downed node to rejoin cluster

This patch will detect when a node has been
auto-downed/quarantined by another node. When this
happens, the ActorSystem of the datastore will be
restarted to allow the node to rejoin the cluster.

Change-Id: I0913bf455d426b6a0fccb17eac61b74f0911fa5d
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Bug 2187: AddServer unit test and bug fixes

Follow-up patch to https://git.opendaylight.org/gerrit/#/c/28018/.

Got the unit tests working and added more unit tests to cover more code.

Also fixed several bugs in the code that were causing the tests to fail. One bug
was caused by replicating data quickly after install snapshot was
complete. On the final install snapshot chunk the follower sends an
ApplySnapshot message to persist and apply the snapshot. On the reply,
the leader assumes the follower is up-to-date and sets its next index.
However, applying the snapshot, i.e. updating the log and commit index, is
actually done after the async callback from the snapshot persist. In the
meantime, if the leader sends the server config AppendEntries, the follower's
log is still empty, so it deems itself out-of-sync and reports back failure.
This will cause the leader to eventually send a new install snapshot,
which is not desirable. It may also delay consensus for the
server config entry.

To fix this, I delayed the final InstallSnapshotReply until after the
ApplySnapshot is complete. I did this by adding a Callback to the
ApplySnapshot message which the SnapshotManager invokes.
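A minimal sketch of deferring the final reply until the snapshot is persisted and applied. `CompletableFuture` stands in for the Callback carried by ApplySnapshot, and `SnapshotFollower` is a hypothetical class, not the Follower behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical follower-side sketch; CompletableFuture stands in for the
// Callback carried by the ApplySnapshot message.
final class SnapshotFollower {
    final List<String> sentReplies = new ArrayList<>();
    long snapshotIndex = -1;

    // Called on the final InstallSnapshot chunk. The reply is sent only once
    // the persist completes and the snapshot has been applied.
    CompletableFuture<Void> onLastChunk(long lastIncludedIndex, CompletableFuture<Void> persistComplete) {
        return persistComplete.thenRun(() -> {
            snapshotIndex = lastIncludedIndex; // apply: update log/commit index
            sentReplies.add("InstallSnapshotReply(success=true)");
        });
    }
}
```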

Also the new server config was constructed without the leader's ID - it
needs to contain all members.

Also the ServerConfigurationPayload wasn't being applied in the
followers.

Another issue was that, if the leader had no peers initially, the
heartbeat wasn't scheduled so, when the new server was added, heartbeats
weren't occurring. So I changed addFollower to schedule the heartbeat.

I added a test for adding a non-voting server which caused an endless
loop in AbstractLeader#handleAppendEntriesReply where it updates the
commitIndex based on the replicated count. To fix this, I added a break
if the replicatedLogEntry is null.
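One plausible shape of that commit-index loop, as a hypothetical sketch rather than AbstractLeader's code: indexes are 0-based, and the end-of-log check plays the role of the null-entry break. With only non-voting peers, `minReplicationCount` is 0, so without that check the replication test never fails and the loop would not terminate.

```java
import java.util.List;

final class CommitIndexSketch {
    // logTerms.get(i) is the term of entry i; matchIndexes holds the match
    // index of each *voting* follower.
    static long advance(long commitIndex, List<Long> logTerms, long currentTerm,
                        List<Long> matchIndexes, int minReplicationCount) {
        for (long n = commitIndex + 1; ; n++) {
            if (n >= logTerms.size()) {
                // No log entry at n: stop. With minReplicationCount == 0 the
                // check below never breaks, so this is what ends the loop.
                break;
            }
            int replicated = 0;
            for (long matchIndex : matchIndexes) {
                if (matchIndex >= n) {
                    replicated++;
                }
            }
            if (replicated < minReplicationCount) {
                break;
            }
            // Raft only commits entries from the leader's current term directly.
            if (logTerms.get((int) n) == currentTerm) {
                commitIndex = n;
            }
        }
        return commitIndex;
    }
}
```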

Change-Id: I5dff351140c611d58357cd58900bed401606038c
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 2187 - JMX API for create/delete shard replica

Change-Id: I48a4dcb7983f5f231e9ddc04e851950abf7c2d8a
Signed-off-by: kalaiselvik <Kalaiselvi_K@Dell.com>

BUG-2187: Add Server - Leader Implementation

Processes an AddServer request from the follower and forwards the request
to the shard leader if this replica is not the leader.

The follower shard replica data is brought in sync with the leader by installing the snapshot from the shard leader.
On successful application of the snapshot data, this voting-but-not-initialized member is transitioned to a voting member.
The new server configuration is persisted and replicated to a majority of the followers, and an OK message is sent back to the shard follower.

If the leader is unable to sync data to the follower within a configured time period, a TIMEOUT message is sent back to the shard follower without adding/persisting the new server configuration.

Change-Id: I9a3870d14bb6ad532ff64f315b2e2000d8b803e2
Signed-off-by: Rajesh_Sindagi <Rajesh_Sindagi@dell.com>

Add cluster config yang RPCs and provider wiring

Added experimental RPCs, including AddShardReplica, with initial empty
implementations that return unsupported.

Change-Id: Ie8587903920760fc4555bc009c81183e8d7740e4
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix DistributedDataStoreIntegrationTest failure

Fixed a timing issue with a test that had just started failing fairly
regularly on Jenkins builds for some reason.

Change-Id: I40273574376804034fd6f14f56384cb8cae26900
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Add missing dependencies

pax-url-aether provides javax.inject and commons-codec, but they need
to be declared separately for correctness and to allow upgrades to
newer versions of pax-url-aether.

com.google.inject.Inject can be replaced by javax.inject.Inject.

Change-Id: I0a1da43faf0345bd71c2737caaa840c396bc60ab
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Fix ModuleFactory not found errors

https://git.opendaylight.org/gerrit/#/c/27874/ made improvements to the
config system but had the side-effect of introducing timing issues where
a ModuleFactory wasn't found when trying to push a config. The reason is
that yang schemas load earlier and much quicker than ModuleFactory's,
which are scanned from ACTIVE bundles, so the capabilities may resolve
but a ModuleFactory may not be available yet. As a result, that patch
was partially reverted for the time being.

To fix the missing ModuleFactory issue, I added retries in the
ConfigPusherImpl when a ModuleFactory isn't found, similar to the
ConflictingVersionException retries. The backend now throws a new
checked exception, ModuleFactoryNotFoundException, which is caught to
trigger a retry after a delay. Previously, it threw an
InstanceNotFoundException which was wrapped in an
IllegalArgumentException. I didn't keep the InstanceNotFoundException
because it can be thrown for other reasons and I wanted to distinguish
a missing ModuleFactory.

I derived ModuleFactoryNotFoundException from RuntimeException to avoid
having to change signatures in the call chain and thus changing the API.
Previously it threw an unchecked IllegalArgumentException anyway, so it's
consistent plus other areas of the code throw unchecked exceptions along
with checked exceptions.
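A minimal sketch of a retry loop around an unchecked "factory not found" exception. The exception class here is a hypothetical stand-in named after the one in the commit, and `RetryingPusher` with its parameters is illustrative, not the ConfigPusherImpl API.

```java
import java.util.function.Supplier;

// Hypothetical stand-in: unchecked, so no signatures in the call chain change.
class ModuleFactoryNotFoundException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    ModuleFactoryNotFoundException(String factoryName) {
        super("ModuleFactory not found: " + factoryName);
    }
}

final class RetryingPusher {
    // Runs 'push', retrying after a delay while the ModuleFactory is missing.
    static <T> T pushWithRetries(Supplier<T> push, int maxAttempts, long delayMillis) {
        for (int attempt = 1; ; attempt++) {
            try {
                return push.get();
            } catch (ModuleFactoryNotFoundException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last failure
                }
                try {
                    Thread.sleep(delayMillis); // bundles may still be starting up
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```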

Since the missing ModuleFactory issue is fixed, I re-enabled scanning of
RESOLVED bundles in the ModuleInfoBundleTracker.

Change-Id: I89ff346c0a89afdfa76ce402f2cf3211ac68b5c0
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 4151 : Create a shared actor system

This patch adds an ActorSystemProvider interface in clustering commons
with a method to get a shard ActorSystem instance which uses the
clustered data store configuration as it contains more configuration
options than the rpc connector which pretty much uses stock configuration.
I added a config yang to define an actor-system-provider-service.

I added the ActorSystemProvider implementation and actor-system-provider-impl
config yang in the distributed datastore bundle. I tried it in
sal-clustering-commons originally but ran into akka errors re: missing
config properties and it also couldn't find the
ReadyLocalTransactionSerializer class. So to avoid chasing down those
errors I put the implementation in sal-distributed-datastore. I think
this makes sense as it is the prime user of the actor system.

I added a dependency for the ActorSystemProvider service in both
datastore modules so the ActorSystem is now injected in and passed
to the DistributedDataStoreFactory. The dependency was also added to the
RPC module.

Elements for the new actor system provider service and impl were added to
the 05-clustering.xml file along with the wiring changes for the data
stores and RPC modules.

Change-Id: I79c14f84c992a2d5ac9c1f1856efbaeba3cc2b77
Signed-off-by: Moiz Raja <moraja@cisco.com>

Fix Eclipse compilation warnings.

Fix compilation warnings in DistributedDataStoreTest
that DistributedDataStores were never closed. Also fix
NPEs on closing DistributedDataStores when the
MXBeans are uninitialized.

Change-Id: I5dcaa389e1e69f934e9016933b00be3adaf4529f
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Remove peer address cache in ShardInformation

The ShardManager caches the peer addresses in the ShardInformation and
uses it mainly to suppress PeerUp, PeerDown and PeerAddressResolved
messages to the shard for peers that don't have replicas for the shard.

This is fine with static config but, with the upcoming work to dynamically
add replicas, the shard will take ownership of persisting its peers so
the ShardManager will not know about dynamic peers.

I changed the semantics of the peer addresses to initial peer addresses.
They are now only used to pass to the Shard on creation. As a result,
PeerUp, PeerDown and PeerAddressResolved messages are now always sent to
the Shard for all peers. The Shard/RaftActor decides whether or not to
process the peer message. I changed RaftActorContextImpl#setPeerAddress
to ignore a peerId it doesn't know about instead of throwing an exception.
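A minimal sketch of the relaxed `setPeerAddress` semantics, using a hypothetical `PeerMap` class rather than RaftActorContextImpl. `containsKey` is used instead of `computeIfPresent` because a known peer may still have a null (unresolved) address, which `computeIfPresent` would treat as absent.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the peer map kept in the RaftActor context.
final class PeerMap {
    private final Map<String, String> addresses = new HashMap<>();

    void addPeer(String peerId, String address) {
        addresses.put(peerId, address);
    }

    String getPeerAddress(String peerId) {
        return addresses.get(peerId);
    }

    // Peer messages now arrive for every member, so an unknown id is
    // expected and silently ignored rather than treated as an error.
    void setPeerAddress(String peerId, String address) {
        if (addresses.containsKey(peerId)) {
            addresses.put(peerId, address);
        }
    }
}
```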

The other usages of the peerAddresses were to lookup the leader address.
This is now done dynamically via the ShardPeerAddressResolver.

Change-Id: Ida9738916a4a85d23198e7c095d5c73f17e2aa6c
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Prepare for Karaf 3.0.4 upgrade

Pull in karaf.version from odlparent.
Drop import of org.apache.felix.service.command (apparently unused).

Change-Id: I6487ce1a52e6f51bbcdd4e332de18d4684782301
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Refactor to fix unchecked cast warnings.

Change-Id: I0fb6ce59707000f225ffa8d654685fbc89f8f2eb
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Fix Eclipse compilation warnings.

Change-Id: I16921743a8cc4ac8902c1b7fffa2edfd8cba8be6
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Fix Eclipse compilation warnings.

Change-Id: I2caddfded34638002b2e31bf4e99d1770dd03a00
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Reproduce bug 4359

Added a couple of unit tests which demonstrate the problem
described in bug 4359 where, upon recovery, a node which
was previously deleted reappears on reapplying the
candidates.

For some reason the problem is reproducible only when the
car is added and deleted twice and not once. I haven't
investigated why yet.

Change-Id: I5f5a656ef6fdc017a3342c8b409576a8b121b7f1
Signed-off-by: Moiz Raja <moraja@cisco.com>

Partial revert of https://git.opendaylight.org/gerrit/#/c/27874/

Patch https://git.opendaylight.org/gerrit/#/c/27874/ made improvements
that significantly sped up config system boot and helped the SFC project
but a couple of other projects are seeing a timing issue where a
ModuleFactory isn't found and the config pusher fails. This is due to
the speed up: YangModuleInfo's are now scraped from RESOLVED bundles
and thus are available sooner, but ModuleFactory's are scraped from
ACTIVE bundles.

While the ModuleFactory issue is addressed, I'll partially revert the
prior changes to go back to scanning ACTIVE bundles for YangModuleInfo.

Change-Id: Icd3a51a049a940ad60a4bd0071e3c969167275d3
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Add ShardPeerAddressResolver

Added a ShardPeerAddressResolver implementation that is passed to
Shard RaftActors to resolve addresses for shard peer ids. I refactored
ShardManager a bit to move the memberNameToAddress map and related code
to the ShardPeerAddressResolver.
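A minimal sketch of a resolver that owns the member-name-to-address map, assuming shard peer ids have the form "<member-name>-shard-<shard-name>" and that the shard actor path is the member address plus "/user/shardmanager/<peerId>". Both the id format and the path layout, as well as the class name SimpleShardPeerAddressResolver, are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the member-name -> address map moves out of ShardManager
// into a resolver that Shard RaftActors can query by peer id.
class SimpleShardPeerAddressResolver {
    private final Map<String, String> memberNameToAddress = new HashMap<>();

    void addPeerAddress(String memberName, String address) {
        memberNameToAddress.put(memberName, address);
    }

    void removePeerAddress(String memberName) {
        memberNameToAddress.remove(memberName);
    }

    // Returns the actor path for a shard peer, or null if the member's
    // address is not known yet. Id format and path layout are assumptions.
    String resolve(String peerId) {
        int idx = peerId.indexOf("-shard-");
        if (idx < 0) {
            return null;
        }
        String memberName = peerId.substring(0, idx);
        String address = memberNameToAddress.get(memberName);
        return address == null ? null : address + "/user/shardmanager/" + peerId;
    }
}
```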

Change-Id: I5cbef5816d9bf13a339e43008144f44fd55fc606
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Modify ModuleInfoBundleTracker to track RESOLVED bundles (round 2)

My first patch https://git.opendaylight.org/gerrit/#/c/27138/ didn't
do well with the feature tests due to the BundleContext busy wait so it
was reverted.

I went back to my original solution to configure the ModuleInfoBundleTracker
to track RESOLVED and ModuleFactoryBundleTracker to track ACTIVE.
Originally when I tried that I had some failure due to the ModuleFactory
not being loaded yet, but I don't remember exactly what. This patch seems to work
fine - I've restarted karaf several times and also ran the tsdr features tests
several times successfully. Originally I did the first patch in stable/lithium
so maybe something else has changed in master or the way I did it wasn't right.

Since the initial yang module info's are now processed synchronously when the
BundleTracker is opened, I modified the ModuleInfoBundleTracker to
ensure it doesn't propagate runtime exceptions. A propagated exception would
disrupt the BundleTracker and the ConfigManagerActivator - if one module had an
issue the config manager wouldn't start.
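A minimal sketch of per-bundle exception isolation, assuming bundles are modeled as names and the scraping step is a pluggable callback. IsolatingModuleInfoProcessor is hypothetical; a real tracker would log the failure instead of collecting it in a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: each bundle is processed in its own try/catch so a
// RuntimeException from one bundle's yang module info doesn't escape to the
// BundleTracker or the activator.
class IsolatingModuleInfoProcessor {
    private final List<String> failures = new ArrayList<>();

    void processAll(List<String> bundleNames, Consumer<String> scraper) {
        for (String bundle : bundleNames) {
            try {
                scraper.accept(bundle);
            } catch (RuntimeException e) {
                // Record the failure (a real tracker would log it) and move on.
                failures.add(bundle + ": " + e.getMessage());
            }
        }
    }

    List<String> getFailures() {
        return failures;
    }
}
```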

For every YangModuleInfo scraped, it registers it with the
ModuleInfoRegistry. The backing impl is RefreshingSCPModuleInfoRegistry
which causes a new SchemaContext to be created from the current yang
models (via updateService). This isn't efficient - on startup, we'll get all
YangModuleInfo's in quick succession so, optimally, it should build the
SchemaContext once after open is complete. This is what the
GlobalBundleScanningSchemaServiceImpl does.

To accomplish this, I removed the call to updateService from
RefreshingSCPModuleInfoRegistry#registerModuleInfo - it is now
specifically called by ModuleInfoBundleTracker. This means the
ModuleInfoBundleTracker now references RefreshingSCPModuleInfoRegistry
instead of the ModuleInfoRegistry interface which makes it less clean.
Any other way would require changes to the ModuleInfoRegistry interface,
which I didn't want to do, or extending the interface, which I didn't think
was worth the effort. The RefreshingSCPModuleInfoRegistry is only used by
ModuleInfoBundleTracker.
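A minimal sketch of the batching idea, assuming registration only records a module and the tracker triggers a single rebuild after the initial scan. BatchingModuleInfoRegistry and BatchingTracker are hypothetical stand-ins; updateService() here just counts rebuilds, and rebuilding once per bundle added after open is an assumption about later behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry: registerModuleInfo() no longer rebuilds the schema.
class BatchingModuleInfoRegistry {
    private final List<String> moduleInfos = new ArrayList<>();
    private int schemaBuilds = 0;

    void registerModuleInfo(String moduleInfo) {
        moduleInfos.add(moduleInfo); // no SchemaContext rebuild here
    }

    // Stand-in for updateService(): rebuild from all current models.
    void updateService() {
        schemaBuilds++;
    }

    int getSchemaBuilds() { return schemaBuilds; }
    int getModuleCount() { return moduleInfos.size(); }
}

// Hypothetical tracker that calls updateService() explicitly.
class BatchingTracker {
    private final BatchingModuleInfoRegistry registry;
    private boolean opening = false;

    BatchingTracker(BatchingModuleInfoRegistry registry) {
        this.registry = registry;
    }

    // Initial scan: register everything, then build the schema once.
    void open(List<String> initialModules) {
        opening = true;
        for (String m : initialModules) {
            addingBundle(m);
        }
        opening = false;
        registry.updateService();
    }

    // After open completes, each newly added bundle triggers its own rebuild.
    void addingBundle(String module) {
        registry.registerModuleInfo(module);
        if (!opening) {
            registry.updateService();
        }
    }
}
```

Opening with many modules then yields a single schema build instead of one per module.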

Change-Id: I20213ce8bd1dfc5109f3ef223cec8048bec92e12
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix DistributedEntityOwnershipIntegrationTest failure

Fixed intermittent failure due to the follower2MockListener getting
an ownershipChanged with "false, false, true" if the original
ownership change with "member-2" is replicated to follower2 after
the listener is registered. The test ran 100 times successfully.

Change-Id: I1f0333e3bc69cc28521bc7388d64b56d18b55544
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit de587f935016a300cdbeb85926c2eb677f383fc2)

Bug 4105: Fix intermittent failure in DistributedEntityOwnershipIntegrationTest

I saw a test failure on Jenkins. After follower2 is stopped there will be
2 onOwnershipChange calls so the test needs to expect both.

Change-Id: I74dc583c2d40e966197315640eb189702fbabd64
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit a3a0417b0ee75f040fb4436602ed7ecf5585d44f)

Fix NPE in AbstractFeatureWrapper

Added check in ChildAwareFeatureWrapper#getChildFeatures to verify the
feature exists in the FeaturesService to avoid NPE. This is a similar
workaround as was done in FeatureConfigPusher for a bug in karaf where
the FeaturesService may mysteriously return null for an existing feature.
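A minimal sketch of the null guard, assuming features are modeled as strings and a lookup function stands in for the FeaturesService query. ChildFeatureSketch and its signature are hypothetical; the point is only that a null lookup result is skipped rather than dereferenced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: resolve each dependency and skip any that the
// service reports as null instead of dereferencing it.
final class ChildFeatureSketch {
    private ChildFeatureSketch() {
    }

    static List<String> getChildFeatures(List<String> dependencyNames,
            Function<String, String> featureLookup) {
        List<String> children = new ArrayList<>();
        for (String name : dependencyNames) {
            String feature = featureLookup.apply(name);
            if (feature == null) {
                // A real implementation would log a warning here.
                continue;
            }
            children.add(feature);
        }
        return children;
    }
}
```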

Change-Id: I006cd012e919ac206d70bb4ee5754c72f0f01b32
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit e6076f69f57fe5918c66637414175c6229635841)

Initial code for RaftActorServerConfigurationSupport

Added a RaftActorServerConfigurationSupport and unit test class with
mostly initial skeleton code. In RaftActorServerConfigurationSupport,
I implemented the basic checks for leader availability with
corresponding unit tests. If not the leader and there is a leader, it
forwards to the remote leader. If no leader, it returns NO_LEADER
failure.

Also in RaftActorServerConfigurationSupport, I added code for the first
steps: add the serverId/address into the RaftActorContext peer map and
add a FollowerLogInformation entry in the AbstractLeader. I added an
initialized field with getters/setters to FollowerLogInformation. The
entry is added with initialized set to false. I also changed the
followerToLogMap in AbstractLeader to be mutable.

I also modified FollowerLogInformationImpl so it returns false for
isFollowerActive and isOkToReplicate if initialized is false. The idea
is to prevent the leader from sending log entries or a snapshot via
the heartbeat or replication. The leader will send an empty
AppendEntries heartbeat which should be fine. The RaftActorServerConfigurationSupport
will initiate the install snapshot directly.
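A minimal sketch of the "initialized" gating, assuming a follower record reduced to just that flag. SimpleFollowerLogInformation and LeaderSketch are hypothetical; the real FollowerLogInformation checks further conditions (indexes, activity timers) that are omitted here.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified follower record: only the initialized gating is modeled.
class SimpleFollowerLogInformation {
    private boolean initialized;

    SimpleFollowerLogInformation(boolean initialized) {
        this.initialized = initialized;
    }

    void setInitialized(boolean initialized) {
        this.initialized = initialized;
    }

    boolean isInitialized() {
        return initialized;
    }

    // An uninitialized (newly added) server is never considered active.
    boolean isFollowerActive() {
        return initialized; // the real check also considers recent contact
    }

    // Blocks log entry / snapshot replication until initialized.
    boolean isOkToReplicate() {
        return initialized; // the real check has further conditions
    }
}

class LeaderSketch {
    // Entries for the next AppendEntries: empty (a plain heartbeat) while
    // the follower is not ok to replicate to.
    static List<String> entriesToSend(SimpleFollowerLogInformation follower, List<String> pending) {
        return follower.isOkToReplicate() ? pending : Collections.<String>emptyList();
    }
}
```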

I added TODO comments in RaftActorServerConfigurationSupport and the
unit test class which outline the remaining work.

I also added the ServerConfigurationPayload class to be used for the log
entries.

Change-Id: Ic11ddc99a57edb7ef70c2d4f5fa7906d6a95b35e
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Return throwable in NeverReconnectStrategy

NeverReconnectStrategy returned an empty Throwable instead of
the passed Throwable carrying the reason the previous
connection failed.
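A minimal sketch of the fix, assuming a CompletableFuture in place of the future type the real strategy returns. NeverReconnectSketch is hypothetical; it only shows the failed future carrying the caller's cause rather than a fresh, empty Throwable.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: a "never reconnect" policy fails immediately, but
// propagates the cause of the previous connection failure.
class NeverReconnectSketch {
    CompletableFuture<Void> scheduleReconnect(Throwable cause) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        future.completeExceptionally(cause); // keep the original reason
        return future;
    }
}
```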

Change-Id: I5695af09379f06a66c37ccf27293ff85657afeaa
Signed-off-by: Claudio D. Gasparini <cgaspari@cisco.com>

Add PeerAddressResolver for raft actor

For upcoming work to dynamically add peers, the peer address may not be
known and, if the cluster MemberUp has already occurred, no
UpdatePeerAddress message will be sent. We need to be able to
dynamically resolve peer addresses so I added a new PeerAddressResolver
interface whose instance is obtained from the ConfigParams and used by
the RaftActorContextImpl.
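A minimal sketch of a context that falls back to a resolver when it has no address for a peer. PeerAddressResolverSketch and ResolvingRaftContext are hypothetical; the real PeerAddressResolver signature may differ, and caching the resolved address is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver interface modeled on the description above.
interface PeerAddressResolverSketch {
    // Returns the peer's address, or null if it cannot be resolved.
    String resolve(String peerId);
}

class ResolvingRaftContext {
    private final Map<String, String> peerAddresses = new HashMap<>();
    private final PeerAddressResolverSketch resolver;

    ResolvingRaftContext(Map<String, String> initialPeers, PeerAddressResolverSketch resolver) {
        peerAddresses.putAll(initialPeers);
        this.resolver = resolver;
    }

    String getPeerAddress(String peerId) {
        String address = peerAddresses.get(peerId);
        if (address == null && resolver != null) {
            // No UpdatePeerAddress arrived (e.g. MemberUp happened earlier),
            // so ask the resolver.
            address = resolver.resolve(peerId);
            if (address != null) {
                peerAddresses.put(peerId, address); // cache (assumption)
            }
        }
        return address;
    }
}
```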

Change-Id: I38807b4b6a59a7cb1359d85a9550cd6e98cb13a4
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG-4367: Use SchemaSourceProvider to retrieve sources for yang

- Not using schema context to provide the sources anymore.
- Transform the modules into capabilities in YangStoreService instead
of requiring the listeners to do so

Change-Id: I39a144c7472f7944cca01eeff273058aa2fe7d7a
Signed-off-by: Maros Marsalek <mmarsale@cisco.com>

Fixed DatastoreContext not found.

Change-Id: Iec5807fe0e9fb270a87095fe036a2a285d564642
Signed-off-by: Tony Tkacik <ttkacik@cisco.com>

Speed up GlobalBundleScanningSchemaServiceImpl close

On close, the GlobalBundleScanningSchemaServiceImpl closes the
BundleTracker which untracks all the bundles and notifies the listener
of removed bundles. This results in a call to tryToUpdateSchemaContext
which causes the remaining yang files to be re-parsed to build a new
SchemaContext. To prevent this extra processing on shutdown, I added
a "stopping" flag to elide tryToUpdateSchemaContext the same way we do
with the "starting" flag.
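A minimal, single-threaded sketch of the two guard flags. SchemaServiceSketch is hypothetical: the rebuild is just a counter, and close(int) simulates the BundleTracker firing one removed-bundle callback per tracked bundle.

```java
// Hypothetical sketch of the "starting"/"stopping" guards.
class SchemaServiceSketch {
    private boolean starting = true;
    private boolean stopping = false;
    private int rebuilds = 0;

    // Initial bundle scan finished: build the schema context once.
    void startupComplete() {
        starting = false;
        tryToUpdateSchemaContext();
    }

    // Stand-in for the tracker's removed-bundle callback.
    void bundleRemoved() {
        tryToUpdateSchemaContext();
    }

    // Closing the tracker untracks every bundle; setting "stopping" first
    // turns each resulting callback into a no-op.
    void close(int trackedBundles) {
        stopping = true;
        for (int i = 0; i < trackedBundles; i++) {
            bundleRemoved();
        }
    }

    void tryToUpdateSchemaContext() {
        if (starting || stopping) {
            return; // skip re-parsing the yang files
        }
        rebuilds++; // stands in for building a new SchemaContext
    }

    int getRebuilds() {
        return rebuilds;
    }
}
```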

Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
Change-Id: I9f7c05277df9bf1ffaec1c699453020312aab203

Revert ModuleInfoBundleTracker patches

Reverted patch https://git.opendaylight.org/gerrit/#/c/27138/ as it
causes some feature tests to take a long time due to the busy wait.
Also it appears the ModuleFactory OSGi services are needed as the
BlankTransactionServiceTracker listens for them (I'm not clear what this
does). I'll try to figure out another way to accomplish the intent
of the reverted patch.

Change-Id: Ifc91dada86ac7feee1a0a9390a55e68d7f113153
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix ModuleFactoryBundleTracker shutdown hang

If karaf is shut down quickly after starting, the
ModuleFactoryBundleTracker#getModuleFactoryEntries method may
hang for a while trying to obtain BundleContexts. This has
been seen on jenkins with some feature tests. The ModuleFactoryBundleTracker
does a 1 min busy wait trying to obtain the BundleContext. This was done b/c the
tracker listens for RESOLVED bundles and the BundleContext isn't available
until after the bundle is started. So the busy wait was intended for startup
when bundles transition from RESOLVED -> STARTING. Once obtained, the
BundleContext is cached.

This works fine normally when all bundles start up. However, if stopped
quickly, some bundles may not have started, i.e. they remain in the
RESOLVED state with null BundleContext, so on shutdown when someone
calls getModuleFactoryEntries, it will busy wait and eventually time out, which
can take minutes.

What we need to do is remove the ModuleFactory entries when bundles are
stopped. The ModuleFactoryBundleTracker#removedBundle method does this
but it wasn't called on shutdown b/c it tracks RESOLVED, STARTING, ACTIVE
and STOPPING states. The solution is to remove STOPPING from the tracked states
so removedBundle will get called on the ACTIVE -> STOPPING transition.
However, when the STOPPING -> RESOLVED transition occurs the bundle will get
added back to the tracker and we don't want to re-add the ModuleFactory
entries. To prevent this it checks for BundleEvent type STOPPED.
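A minimal simulation of the tracker logic, runnable without an OSGi framework. The constants mirror the values of org.osgi.framework.Bundle (RESOLVED=4, STARTING=8, STOPPING=16, ACTIVE=32) and BundleEvent.STOPPED (4); TrackerStateSketch and its bundleChanged dispatch are hypothetical simplifications of what BundleTracker does.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of the tracked-state mask and STOPPED check.
class TrackerStateSketch {
    static final int RESOLVED = 4, STARTING = 8, STOPPING = 16, ACTIVE = 32; // Bundle states
    static final int EVENT_STOPPED = 4; // BundleEvent.STOPPED

    // STOPPING is left out so a bundle is dropped as soon as it leaves ACTIVE.
    static final int TRACKED_STATES = RESOLVED | STARTING | ACTIVE;

    private final Set<String> moduleFactoryEntries = new HashSet<>();
    private final Set<String> tracked = new HashSet<>();

    // Roughly what BundleTracker does on each bundle state change.
    void bundleChanged(String bundle, int newState, int eventType) {
        boolean shouldTrack = (newState & TRACKED_STATES) != 0;
        if (shouldTrack && !tracked.contains(bundle)) {
            tracked.add(bundle);
            addingBundle(bundle, eventType);
        } else if (!shouldTrack && tracked.contains(bundle)) {
            tracked.remove(bundle);
            removedBundle(bundle);
        }
    }

    private void addingBundle(String bundle, int eventType) {
        // STOPPING -> RESOLVED re-adds the bundle with a STOPPED event: skip it.
        if (eventType == EVENT_STOPPED) {
            return;
        }
        moduleFactoryEntries.add(bundle);
    }

    private void removedBundle(String bundle) {
        moduleFactoryEntries.remove(bundle);
    }

    boolean hasEntries(String bundle) {
        return moduleFactoryEntries.contains(bundle);
    }
}
```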

Change-Id: I82889a682809d4217dc4253eb60c922209ad7242
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Modify ModuleInfoBundleTracker to track RESOLVED bundles

I've seen issues where a yang-generated class that exists in another
bundle isn't found on startup when a module config is pushed.
Specifically I've seen it when registering RPC implementations. The
BindingDOMRpcProviderServiceAdapter uses the
BindingToNormalizedNodeCodec to get the RPC schema and it can fail to
get the RPC input class.

The BindingToNormalizedNodeCodec calls the BindingRuntimeContext to
obtain schema classes which in turn uses the ClassLoadingStrategy OSGi
service to load classes. The backing implementation of
ClassLoadingStrategy is the ModuleInfoBackedContext supplied by the
config manager bundle. This is backed by the ModuleInfoBundleTracker
which scrapes yang files from bundles. However it listens for ACTIVE
bundles. A while ago the GlobalBundleScanningSchemaServiceImpl was
(correctly) changed to listen for RESOLVED bundles which fixed startup
timing issues. It makes sense to also change the
ModuleInfoBundleTracker to listen for RESOLVED bundles so all existing
yang models are loaded on startup prior to use.

The ModuleFactoryBundleTracker piggy-backs the ModuleInfoBundleTracker
to load ModuleFactory instances needed by the config system. It
registers ModuleFactory instances as OSGi services, which are consumed by the
BundleContextBackedModuleFactoriesResolver; however, this fails for a
RESOLVED bundle b/c it doesn't have a BundleContext yet (apparently this
is set when the bundle is started/activated). To fix this, I refactored
BundleContextBackedModuleFactoriesResolver and
ModuleFactoryBundleTracker a bit. The ModuleFactoryBundleTracker no
longer registers ModuleFactory instances as OSGi services. Instead it
maintains a list of ModuleFactory/Bundle entries which the
BundleContextBackedModuleFactoriesResolver directly uses to build the
resulting factories map for the ConfigRegistry. A bundle may still not
have a BundleContext at that point so, to safeguard against that, I
added a busy wait for BundleContext.
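A minimal sketch of a bounded busy wait, assuming a Supplier stands in for Bundle#getBundleContext. BoundedWait and its parameters are hypothetical; returning null on timeout leaves the caller to decide how to handle a bundle that never started.

```java
import java.util.function.Supplier;

// Hypothetical bounded poll for a value that appears later (e.g. a BundleContext).
final class BoundedWait {
    private BoundedWait() {
    }

    // Polls supplier until it yields non-null or timeoutMillis elapses.
    // Returns null on timeout or if the waiting thread is interrupted.
    static <T> T waitFor(Supplier<T> supplier, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T value = supplier.get();
            if (value != null) {
                return value;
            }
            if (System.nanoTime() - deadline >= 0) {
                return null;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return null;
            }
        }
    }
}
```

As the later shutdown-hang fix in this log shows, a bundle that never leaves RESOLVED makes such a wait run to its full timeout, so the timeout value matters.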

Change-Id: Ia7bd39f635e3473e6e84011163a0768865c9a931
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Add election info to Snapshot

When a snapshot is saved we delete all prior applied data entries from
the journal. However, this has the side-effect of also deleting prior
UpdateElectionTerm entries so, on restart, we lose the election term
info. We need to persist the election term with the Snapshot.
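A minimal sketch of a snapshot that carries election term info, and of recovery reading it back. SnapshotSketch and ElectionTermState, including their field and method names, are hypothetical and not the actual Snapshot API.

```java
// Hypothetical, trimmed-down snapshot with election term info.
final class SnapshotSketch {
    private final byte[] state;
    private final long lastAppliedIndex;
    private final long lastAppliedTerm;
    private final long electionTerm;       // current term when the snapshot was taken
    private final String electionVotedFor; // vote cast in that term, may be null

    SnapshotSketch(byte[] state, long lastAppliedIndex, long lastAppliedTerm,
            long electionTerm, String electionVotedFor) {
        this.state = state.clone();
        this.lastAppliedIndex = lastAppliedIndex;
        this.lastAppliedTerm = lastAppliedTerm;
        this.electionTerm = electionTerm;
        this.electionVotedFor = electionVotedFor;
    }

    byte[] getState() { return state.clone(); }
    long getLastAppliedIndex() { return lastAppliedIndex; }
    long getLastAppliedTerm() { return lastAppliedTerm; }
    long getElectionTerm() { return electionTerm; }
    String getElectionVotedFor() { return electionVotedFor; }
}

// On recovery, the term is restored from the snapshot even though the
// UpdateElectionTerm journal entries before it were deleted.
class ElectionTermState {
    private long currentTerm;
    private String votedFor;

    void recoverFrom(SnapshotSketch snapshot) {
        currentTerm = snapshot.getElectionTerm();
        votedFor = snapshot.getElectionVotedFor();
    }

    long getCurrentTerm() { return currentTerm; }
    String getVotedFor() { return votedFor; }
}
```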

Change-Id: I0ed140de1868cc03a28cfbc1d6eb909fe4dbc252
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Add Raft AddServer message and reply

Change-Id: I59499ab0f0b7c202a309af0412a0c0ae38494d8b
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Add getOwnershipState method to EntityOwnershipService

Added a new method to get the current ownership state for an entity.
This was requested for OF clustering.

The DistributedEntityOwnershipService obtains the EntityOwnershipShard's
DataTree via a new message GetShardDataTree and reads the entity's owner
leaf in order to build the resulting EntityOwnershipState. The DataTree
is obtained once and cached.
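A minimal sketch of deriving an ownership state from the entity's owner leaf. The cached DataTree read is modeled as a Map from entity id to owner member name, with an empty string assumed to mean "no owner"; OwnershipStateSketch and the use of java.util.Optional are illustrative assumptions, not the real EntityOwnershipService API.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical ownership state built from the owner leaf.
final class OwnershipStateSketch {
    final boolean isOwner;
    final boolean hasOwner;

    OwnershipStateSketch(boolean isOwner, boolean hasOwner) {
        this.isOwner = isOwner;
        this.hasOwner = hasOwner;
    }

    static Optional<OwnershipStateSketch> getOwnershipState(Map<String, String> ownerLeaves,
            String entityId, String localMemberName) {
        String owner = ownerLeaves.get(entityId);
        if (owner == null) {
            return Optional.empty(); // entity unknown: nothing to report
        }
        boolean hasOwner = !owner.isEmpty(); // empty leaf assumed to mean no owner
        return Optional.of(new OwnershipStateSketch(localMemberName.equals(owner), hasOwner));
    }
}
```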

Change-Id: Ib4aa2f4e5370d8d5183908b836417936a51458f7
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit dd6976c24f12c7cef7bed8fa6bc645dc699dda4f)

YangInstanceIdentifier::toInstance() -> build()

::toInstance() is gone, which breaks the build.

Change-Id: I51662dd351bf5b02441f58291e85e0e1729d7785
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Always persist and recover election term info

Disabling data persistence also disabled persistence/recovery
of election term info. This was an oversight - we need to persist and
recover election term info regardless.
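A minimal sketch of the intended behavior, assuming an in-memory list stands in for the journal. RaftPersistenceSketch is hypothetical: the persistence flag gates only data entries, while election term updates are always journaled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: data persistence is optional, election term is not.
class RaftPersistenceSketch {
    private final boolean dataPersistenceEnabled;
    private final List<String> journal = new ArrayList<>();

    RaftPersistenceSketch(boolean dataPersistenceEnabled) {
        this.dataPersistenceEnabled = dataPersistenceEnabled;
    }

    void persistData(String entry) {
        if (dataPersistenceEnabled) {
            journal.add("data:" + entry);
        }
    }

    // Always persisted so the term and vote survive a restart.
    void persistElectionTerm(long term, String votedFor) {
        journal.add("UpdateElectionTerm:" + term + ":" + votedFor);
    }

    List<String> getJournal() {
        return Collections.unmodifiableList(journal);
    }
}
```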

Change-Id: I48d33ca5d3b7d95e2aeb8ed7f9c8d5f1aa401ece
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Revert "Remove obsolete artifacts from commons.opendaylight"

This reverts commit 0ff87783a0fb2ab7bd60daf4a399bb8933af244a.

Change-Id: Id9151032657453c816b3dab63bb8982e4b2e8030
Signed-off-by: gvrangan <venkatrangang@hcl.com>

Removed properties from parent pom.

Change-Id: I0b5836369b8be33c88abd491a644364d9c92be55
Signed-off-by: Tony Tkacik <ttkacik@cisco.com>