git.opendaylight Code Review - controller.git/log

Refactor MockConfiguration to extend ConfigurationImpl

MockConfiguration is now essentially a wrapper for
ModuleShardConfigProvider whose source is a shard name -> members map.
This will make it easier to add new methods to Configuration.
Unit tests will also now use the production ConfigurationImpl, since this
class is simple enough that we don't really need the functionality mocked.

Change-Id: I88e520b275a658a6d718442ad31c1f1e3603c70c
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 2187: Remove ShardManager mbean replica operations

Remove the add/remove shard replica mbean operations as it was decided to
use RPCs instead.

Change-Id: I419a1ec57dfaa9b1d8d55aae5a995d8050b43d70
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Prevent partial init in DatastoreSnapshotRestore

The config subsystem should only push one config
at a time, but in case it doesn't, synchronize
DatastoreSnapshotRestore.initialize() to prevent
partial initialization in the event of concurrent
calls to getAndRemove().

Change-Id: Ie614e8b2045d86ea46b55609bf5cde9e6597b086
Signed-off-by: Gary Wu <gary.wu1@huawei.com>
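
A minimal sketch of the synchronization described in the commit above. The
class, field and datastore names are hypothetical stand-ins, not the actual
DatastoreSnapshotRestore code; the point is only that a second caller of
getAndRemove() blocks until the first has finished initializing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DatastoreSnapshotRestore; all names are illustrative.
final class SnapshotRestoreSketch {
    private final Map<String, String> snapshots = new HashMap<>();
    private boolean initialized;

    // synchronized: a concurrent caller waits here until loading has finished,
    // so nobody observes a half-filled map.
    private synchronized void initialize() {
        if (initialized) {
            return;
        }
        // Stand-in for reading and deserializing the restore file.
        snapshots.put("config", "config-snapshot");
        snapshots.put("operational", "operational-snapshot");
        initialized = true;
    }

    // Returns the snapshot for the given datastore type once, then null.
    synchronized String getAndRemove(String datastoreType) {
        initialize();
        return snapshots.remove(datastoreType);
    }
}
```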

Bug 2187: Implement add-shard-replica RPC

The unit test creates 3 actor systems each with their own datastores.
Now that the ShardManager persists shard info, and due to the static
nature of the InMemorySnapshotStore, each ShardManager needs to have a
unique persistenceId; otherwise the equivalent ShardManagers' persistence
Ids will clash. Therefore I added a shardManagerPersistenceId field to
the DatastoreContext so the unit test can provide a unique Id based on
member name.

Change-Id: I907cd568d64f43586ffc1ec8581e4208f46db327
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Clean up plugin management

A number of plugins are managed by odlparent, so remove unnecessary
entries (i.e. specified identically in odlparent).

Remove all references to ${exam.version} (the dependencies are
inherited from odlparent).

Change-Id: I43ac4a692b7911321b448e788536d58f916657d1
Signed-off-by: Stephen Kitt <skitt@redhat.com>

BUG 2817 - Basic implementation of RemoveServer in the Raft code

When a RemoveServer is received it may ask for the removal of
the current leader or one of the followers. As a first pass
we do not support removal of the current leader. To correctly
implement removal of the leader we would have to implement
leader transition which I intend to build in a future patch.

When a follower is removed the server configuration is changed
immediately on the leader and the new configuration persisted
to the journal. When other followers receive that
journal entry they also remove the server from their
configuration. This is the same as what was done for the
AddServer implementation.

As soon as the new configuration is persisted we respond with
success to the caller. This is the same as for AddServer.

When the ServerConfiguration is complete we send a ServerRemoved
message to the follower which has been removed.

Change-Id: I2b85d82cbeef13cca830e3cc212aebbbcd95c818
Signed-off-by: Moiz Raja <moraja@cisco.com>

Remove unused ShardCommitCoordinator#CohortEntry constructor

Change-Id: I43b478bd6b5467cc46a65c97a5888ce0ec5ded5c
Signed-off-by: Moiz Raja <moraja@cisco.com>

Fix failure of testCloseCandidateRegistrationInQuickSuccession

Moved checking of whether the ownershipchange event occurred with
hasOwner=false to the loop so that we pass the test only when all
listeners receive that event with hasOwner=false

Change-Id: I463272822e6a39f310fef5996b541e1d06c79548
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug 2187: Don't close over internal state in ShardManager

For AddShardReplica, we use the ask pattern for the FindPrimary and
AddServer messages. However in the OnComplete callbacks we're closing
over internal state which isn't safe since the callback will be notified
outside of the actor's execution context which may result in concurrent
mutation of internal state. Therefore I added internal messages that are
sent to self in the callbacks.

Change-Id: I1f6662a4e473749925046f127cad868e54b761a2
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
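
The pattern described in the commit above can be sketched without Akka. This
is a toy single-threaded "actor" with a hand-rolled mailbox; ReplicaAdded,
the mailbox and the method names are assumptions made for illustration and
are not the ShardManager's actual messages. The callback only enqueues a
message to self, and the state is mutated only in the message loop.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Toy "actor": only processNext() touches the mutable state.
final class SelfMessageSketch {
    // Hypothetical internal message carrying the async result back to the actor.
    static final class ReplicaAdded {
        final String shardName;

        ReplicaAdded(String shardName) {
            this.shardName = shardName;
        }
    }

    private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
    private final Set<String> localShards = new HashSet<>(); // actor-private state

    // The callback may run on any thread, so it only enqueues a message;
    // it never mutates localShards directly.
    void onAddShardReplica(String shardName, CompletableFuture<Boolean> addServerReply) {
        addServerReply.thenAccept(ok -> {
            if (ok) {
                mailbox.add(new ReplicaAdded(shardName));
            }
        });
    }

    // Body of the actor's message loop: state is mutated only here.
    boolean processNext() {
        Object message = mailbox.poll();
        if (message instanceof ReplicaAdded) {
            localShards.add(((ReplicaAdded) message).shardName);
            return true;
        }
        return false;
    }

    boolean hasLocalShard(String shardName) {
        return localShards.contains(shardName);
    }
}
```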

Bug 3231 jolokia access should be controlled by aaa

Due to unfortunate lack of support, we are going to have to just use
basic authentication from config file for now.  I have committed this
patch to upstream jolokia:
https://github.com/rhuss/jolokia/pull/225
which will unlock power for us to use AAA.  However, this won't be
available until a new release is cut on Jolokia's end.

The only options for jolokia-osgi bundle for authentication are basic
file authn (which is implemented in this proposed changeset) and JAAS.
ODL's JAAS is unencrypted and generally disregarded, so basic file
authN was chosen.  By default, the credentials are admin/admin.

Change-Id: I35770bcf13b3cb32e59685e9bbf0ef47d73d132f
Signed-off-by: Ryan Goulding <ryandgoulding@gmail.com>

Bug 2187: Bootstrap EOS shard when no local shards configured

The intended workflow to initially form a cluster dynamically is to
change the role for a second node to say member-2. Since the initial
static shard config is bootstrapped to member-1, no local shards will be
created. However, the entity-ownership shard is special in that it is
intended to exist on every node.

The EOS will be bootstrapped as follows:

For the EOS CreateShard message, all unique members for all shards are
obtained from the static shard config. It assumes the local member is
present in the config however in the above workflow it won't be. So on EOS
CreateShard, if the local member isn’t in the initial member list then
it will create the local shard with an empty peer list and
DisableElectionsRaftPolicy so it stays as follower. Also the shard will
be flagged as inactive in the ShardManager. A subsequent
AddShardReplica will be needed to make it active.

The other option is to not create EOS shard but there may be initial
candidate registrations which would be missed unless we add retry logic
in the service class. But the EOS shard already has retry logic so it
would be ideal to leverage it.

I also made changes to the AddShardReplica logic to handle an existing
local shard as will occur for the EOS shard:

- remove the failure reply if local shard already exists
- if the local shard exists and the primary shard is the local shard,
   do nothing and return AlreadyExistsException failure reply
- otherwise send AddServer to the primary
- on FindPrimary, if the local shard exists but is not active, do a
   remote find as if the local shard doesn't exist
- on AddServer, if the new server is already in the peer list, the
   ALREADY_EXISTS status is returned. Return AlreadyExistsException
   failure reply
- on AddServer failure, if the local shard was pre-existing don't
   remove it.

We still want to prevent an AddShardReplica request while one is already
in progress, so I added a Set to track this.

I added an integration test for bootstrapping the EOS shard. It starts
with an inactive shard and registers a candidate, which gets queued
since there's no leader. It then issues AddShardReplica and verifies the
candidate gets registered with the leader.

Getting this to work required some tweaks in the RaftActor and Follower.
When the ShardManager clears the DisableElectionsRaftPolicy, the
RaftActor creates a new Follower instance; however, it loses the previous
leader Id. If the new server config hasn't been replicated yet then it
has no peers and immediately tries to start an election. Since it has no
peers it goes to Leader with no followers, creating a 2 leader situation.
To alleviate this I transferred the previous leader Id to the new
Follower instance to prevent the immediate election.

Even after that the test still didn't work b/c the leader was still not
in the EOS shard peer list so lookup of the leader address returned
null. So I changed getPeerAddress in the RaftActorContext to lookup in
the resolver if no peer info exists.

I also added more unit tests for AddShardReplica to increase code coverage.

Change-Id: Id2a12ae226af69611d5ca5155f5f018cef82dff4
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug-4636: NotificationSubscriber's exception prevents notifications to other listeners

Catch the exception and log it with enough context.

Change-Id: I23c248c59753008e6d09155513b2dba108fbccbf
Signed-off-by: Kamal Rameshan <kramesha@cisco.com>
Signed-off-by: Robert Varga <rovarga@cisco.com>
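
A sketch of the fix described in the commit above, under the assumption that
listeners are plain callbacks. NotifySketch and deliver are hypothetical
names; java.util.logging stands in for whatever logger the project uses.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

final class NotifySketch {
    private static final Logger LOG = Logger.getLogger(NotifySketch.class.getName());

    // Delivers the notification to every listener. A failing listener is
    // logged with context and does not stop delivery to the remaining ones.
    // Returns the number of listeners that completed without throwing.
    static <T> int deliver(List<Consumer<T>> listeners, T notification) {
        int delivered = 0;
        for (Consumer<T> listener : listeners) {
            try {
                listener.accept(notification);
                delivered++;
            } catch (RuntimeException e) {
                LOG.log(Level.WARNING,
                        "Listener " + listener + " failed on notification " + notification, e);
            }
        }
        return delivered;
    }
}
```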

Specify dsbenchmark's parent POM relativePath

This is required to build controller with no pre-existing controller
artifacts.

Change-Id: I7fe9f6ae015a75ddaa5d53dcdd770a214b4322bb
Signed-off-by: Stephen Kitt <skitt@redhat.com>

BUG 2187 - Persisting shard list in ShardManager

In ShardManager, the local shard list is persisted as a snapshot.
On recovery, persisted shard list is used to create the shards.
During recovery, obtained persisted information is updated to the
configuration so that it is uniformly available to the DatastoreContext.

Incorporated the comments

Also, as localShards are now created after RecoveryCompletion, the
shardManager mbean is associated with the shardManager immediately
after creation. On creating the localShards, the shards addition
is notified to the mbean object.
In the shardManagerTests involving verification of the syncStatus
and CountDownLatch objects, the testcases are made to wait for
localShard creation by waiting for recoveryCompletion message.

Change-Id: I523ed9b14af4b1b6e272f05faac1cf37abfef336
Signed-off-by: kalaiselvik <Kalaiselvi_K@Dell.com>

Remove unused ShardCommitCoordinator constructor parameter

Change-Id: I1c25a18e6f4ed700547f7cc9931d5a44d31c7b93
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug 4564: Implement datastore restore from backup file

Added a singleton DatastoreSnapshotRestore class that looks for and
reads a restore file in a specific directory and deserializes the datastore
snapshots. The restore file is then deleted.

The DatastoreSnapshotRestore instance needs to be injected into both
DistributedDatastore instances which are created via separate config
system Module instances. However the only way to inject the
DatastoreSnapshotRestore instance would be to define a yang module
and service. I didn't want to go thru the overhead of all that and I
didn't want the DatastoreSnapshotRestore advertised as a service. So I made
it a static singleton that is created via a new bundle Activator class.

The DatastoreSnapshot instance is passed to the ShardManager which
passes each ShardSnapshot to the corresponding Shard actor. On
recovery complete, the RaftActor takes care of applying the restored
snapshot.

Change-Id: Ied3db4e49b98320abb34e2acf73b27b29232f8d6
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG-865: specify DataTreeType explicitly

This removes the use of the compatibility create() method and specifies
the requested OPERATIONAL data tree explicitly.

Change-Id: Ib0f84202357cd413b43035450af1ecef0898a0ad
Signed-off-by: Robert Varga <rovarga@cisco.com>

Fix resource leaks in test cases

Close AutoCloseable objects created in
test cases that were not being closed.
Add mock calls for close() methods that
now need to be stubbed.

Change-Id: Iab057a3a1850d024f02656eb1ae82c6fb1486030
Signed-off-by: Gary Wu <gary.wu1@huawei.com>

Bug 2187: Persisting Actor peerIds' in snapshot

Persisting Raft Actor's peer information in a snapshot and recovering the same
from the snapshot.
Incorporated the comments.

Change-Id: I12831f129b2bdeb1c64f473e94be617f8d6ee487
Signed-off-by: kalaiselvik <Kalaiselvi_K@Dell.com>

Make methods static

Private methods which do not touch object state can be made static.

Change-Id: I4f5a7e6215c7570660ee797f4e694745844f72e7
Signed-off-by: Robert Varga <rovarga@cisco.com>

Bug 4564: Implement restore from snapshot in RaftActor

The restore snapshot is supplied by the derived actor's
RaftActorRecoveryCohort. If one exists, the RaftActorRecoverySupport
deserializes and applies the snapshot.

I also added a Builder to MockRaftActor to make it easier to pass
additional params.

Change-Id: Ib52b24331038ed48221cc27086fa3cceafe39fcf
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 4554 : Ownership is not cleared when all candidates are removed

When all candidates for an entity get unregistered at approximately
the same time it can create a situation where the owner for the
entity is not cleared. Consequently no entity ownership change is
raised where hasOwner is false even when there are no owners for
the entity.

This could be a problem for applications which do
some action when there are no candidates for an entity. The
openflow application for example relies on the disappearance of
all owners to actually remove a switch from inventory. Without
this event we have the situation that nodes hang around in inventory.

Problem Sequence
----------------

The sequence of events which leads to this problem is as follows.

Let's say member-1 owned entity-1 and there are 3 candidates for
entity-1 - member-1, member-2 and member-3. Now let's say due to
some event all candidates have to unregister. The data
transformations will go like this.

delete member-1
delete member-2
delete member-3
delete member-1 succeeds so choose new owner - in this case member-2
make-owner member-2
delete member-2 succeeds - member-2 is not the current owner so do nothing
delete member-3 succeeds - member-3 is not the current owner so do nothing
make-owner member-2 succeeds. Now we have an owner for entity-1 even though we have no candidates

Solution
--------

The solution proposed in this patch is to set member to empty when
there are no remaining candidates. This changes the above sequence as follows.

delete member-1
delete member-2
delete member-3
delete member-1 succeeds so choose new owner - in this case member-2
make-owner member-2
delete member-2 succeeds - member-2 is not the current owner so do nothing
delete member-3 succeeds - member-3 is the last candidate so set member to ""
make-owner ""
make-owner member-2 succeeds. Now we have an owner for entity-1 even though we have no candidates
make-owner "" succeeds. Now we have owner for entity-1 set to no one as it should be

Change-Id: I583e8c6991742ada5846e87da35db255eeed144e
Signed-off-by: Moiz Raja <moraja@cisco.com>
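
The two sequences in the commit above can be replayed with a toy model. It
assumes writes commit in submission order and that handling one commit may
submit a further write. The string-encoded operations and the class name are
illustrative only and do not reflect the EntityOwnershipShard's actual data
structures.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

final class OwnershipSequenceSketch {
    // Replays the unregister sequence and returns the final owner of entity-1.
    static String run(boolean clearOwnerWhenNoCandidates) {
        Set<String> candidates =
                new LinkedHashSet<>(Arrays.asList("member-1", "member-2", "member-3"));
        String owner = "member-1";
        Deque<String> pending = new ArrayDeque<>(Arrays.asList(
                "delete member-1", "delete member-2", "delete member-3"));

        while (!pending.isEmpty()) {
            String op = pending.poll();
            if (op.startsWith("delete ")) {
                String member = op.substring("delete ".length());
                candidates.remove(member);
                if (member.equals(owner)) {
                    // The owner went away: pick the first remaining candidate, if any.
                    String next = candidates.isEmpty() ? "" : candidates.iterator().next();
                    pending.add("make-owner " + next);
                } else if (clearOwnerWhenNoCandidates && candidates.isEmpty()) {
                    // The fix: the last candidate is gone, so set the owner to "".
                    pending.add("make-owner ");
                }
            } else {
                // A make-owner write committed.
                owner = op.substring("make-owner ".length());
            }
        }
        return owner;
    }
}
```

In this model run(false) ends with member-2 as owner although no candidates
remain, and run(true) ends with the owner set to "".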

BUG 4615 : Add method on EOS to check if a candidate is registered locally

Change-Id: Iedb2e4cf92553910cf5e1bd85978f88e10bf3c25
Signed-off-by: Moiz Raja <moraja@cisco.com>

Implement LeastLoadedCandidateSelectionStrategy

Change-Id: I09035505bcfa0ef5b2ac357217186ad98db7974c
Signed-off-by: Moiz Raja <moraja@cisco.com>

Maintain EntityOwnershipStatistics

Implementing a LoadBalancing entity owner selection
strategy depends on our ability to find the load on
specific candidates. The EntityOwnershipStatistics collects
this information and provides query methods to access
ownership counts for candidates.

Change-Id: I7e812b15e8fb21e3be1aed10384600b9acb8bf20
Signed-off-by: Moiz Raja <moraja@cisco.com>
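
A minimal sketch of how ownership counts could drive a least-loaded
selection. The method name, the use of a plain Map for the statistics and the
tie-breaking rule are assumptions; the actual
LeastLoadedCandidateSelectionStrategy may differ.

```java
import java.util.Collection;
import java.util.Map;

final class LeastLoadedSketch {
    // Returns the candidate that currently owns the fewest entities. Ties go
    // to the candidate that appears first; no candidates yields "".
    static String selectOwner(Collection<String> candidates, Map<String, Long> ownedCounts) {
        String best = "";
        long bestCount = Long.MAX_VALUE;
        for (String candidate : candidates) {
            // A candidate without an entry owns nothing yet.
            long count = ownedCounts.getOrDefault(candidate, 0L);
            if (count < bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }
}
```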

Add a mechanism to read the entity owner selection strategies from a config file

Change-Id: Ie951e4f83aaf38f00e959f4243820a88cb988788
Signed-off-by: Moiz Raja <moraja@cisco.com>
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Pass in EntityOwnerSelectionStrategyConfig when constructing DistributedEntityOwnershipService

Change-Id: Iad1014db726a06de9a89a9987216ca4c96981122
Signed-off-by: Moiz Raja <moraja@cisco.com>

Pass in EntityOwnerSelectionStrategyConfig when constructing EntityOwnershipShard

Change-Id: I56c2f4f87c61e81b662cd0b30c60775389e9b9a3
Signed-off-by: Moiz Raja <moraja@cisco.com>

Allow passing of delay to the EntityOwnerElectionStrategy

Change-Id: If745443585e68a26c10622a7888ec52dbee0059c
Signed-off-by: Moiz Raja <moraja@cisco.com>

Add Delayed Owner selection based on strategy

Change-Id: I04fc216ffc7e5c3fd35b34b6d03a5030c359d77f
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug 2187: AddServer: check if already exists

On AddServer, if the new server already exists as a peer return
ALREADY_EXISTS status reply.

Change-Id: I3b324850e1f05fce72eced3b2ced52f1510973fe
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 2187: Increases test coverage in RaftActorRecoverySupport

This is a follow-up patch to
https://git.opendaylight.org/gerrit/#/c/29112/ to add more unit test
coverage.

Change-Id: I1dcd87c9bed55b75eed03e7736b0165f656f661f
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 2187: Return OK reply after AddServer persist

The AddServer processing was changed to return an OK reply as soon as the
new ServerConfigurationPayload is persisted, without waiting for
consensus. Previously, since the new server config is applied immediately in
the leader, a failure to reach consensus would cause the
ShardManager on the calling side to delete the new follower actor, resulting
in a "zombie" peer in the leader. Even if consensus isn't reached, the
new server config would most likely have been replicated to at least the
new follower, and other down followers would eventually receive it
when they come back up.

Change-Id: I425fa78d5dd023feda7913ed8d1b5b6c285ccae4
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 4589 : Handle writing and reading large strings

Change-Id: If81926757aef3c1275ba43a7cf8c7adf94d86e08
Signed-off-by: Moiz Raja <moraja@cisco.com>
(cherry picked from commit 28484d59aa626dd4b32cdeb2d10dbc2c47cc051a)

Bug 4564: Add Shard Builder class

Added a Builder class to Shard to replace the props and Creator
classes to make it easier to pass new params to Shard w/o having
to change a lot of code and unit tests. An upcoming patch will add
a new param.

Change-Id: I122747d0cc6c14f090026efe81425e1e1e4edc37
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Remove duplicate junit dependency

odlparent is already declaring the scope as test, no need to repeat
that. Fixes warnings in autorelease.

Change-Id: Ia0b6550d2ecbce80eefa168d78c8b50e29100698
Signed-off-by: Robert Varga <rovarga@cisco.com>

Introduce EntityOwnerSelectionStrategy

Currently the EntityOwnershipService does not do any load
balancing, in that it allows the first candidate that registers
to become an owner. There is a need to do that so that applications
which choose to do some *work* based on if it owns an entity can
scale better.

This patch introduces the concept of an EntityOwnerSelectionStrategy
with the intent to provide custom strategies later to choose an owner.

Since custom strategies require intimate knowledge of how the
EntityOwnershipShard chooses a leader, at this time I do not think
a strategy can be passed to the EntityOwnershipService via API. The
intent therefore is to choose a strategy based on configuration
wherein a custom strategy can be chosen for each entity type. If
the Strategy needs any custom configuration then it can have configuration
files of its own.

Change-Id: Ia53b8edb59fb1d06a426d9d9a95c07ef4ae65cd1
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug 2187: Recover Peer Id's and Update peer map during Journal recovery

Recover ServerConfigurationPayload ReplicatedLogEntry's and immediately apply to the peer map in RaftActorContext.
Review comments incorporated.

Change-Id: I1b1b3c21e83eb5ea799dd040a4da8f78f1155082
Signed-off-by: Rajesh_Sindagi <Rajesh_Sindagi@dell.com>

Clean up duplicate/unused dependencies and properties

Remove dependencies and properties provided in odlparent (with the
same versions).

org.json.version in features/mdsal/pom.xml is unused.

A few properties are only used once, in controller, so replace them
with the version in-place.

(All this will allow a number of properties to be removed from
odlparent.)

Change-Id: I07e9f2298ebd008d82b22b156dc2ddce50151641
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Cache config QNameModules

Use pre-instantiated and cached QNames, so we do not end up wasting
space unnecessarily.

Change-Id: I7ff7b9a098fbf182770d07ccbd0b9bb60334fb82
Signed-off-by: Robert Varga <rovarga@cisco.com>

BUG-4556: lazy computation of MXBean maps

Further analysis of our feature:install CPU usage shows that we spend
an inordinate amount of time constructing MXBean maps. Make the
construction more asynchronous.

Change-Id: I69450bfe8debb65160c40aed6a75ff3d3bef831d
Signed-off-by: Robert Varga <rovarga@cisco.com>

Set odlparent-lite as parent for benchmark/pom.xml

Change-Id: I80e8c621a909fd4dde0a7d25d887ea4523451ce6
Signed-off-by: Vratko Polak <vrpolak@cisco.com>

Bug 4560: Improve config system logging for debuggability

Manually cherry-picked from
https://git.opendaylight.org/gerrit/#/c/28985 as the files have moved in
master.

Also the code has changed slightly in master, specifically the
ConfigPusherImplTest no longer uses a Thread uncaught exception handler
for verification. However it does rely on exceptions thrown from the
ConfigPusherImpl so, to keep the same behavior, I added a
propagateExceptions flag to ConfigPusherImpl#process. The
ConfigPersisterActivator production code passes false so unchecked
exceptions aren't handled as uncaught exceptions.

Change-Id: Iabc22030abc22cf11a1476986ba3d3366021b4fb
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Set odlparent-lite as artifacts parent

Change-Id: I4ae4994db55739460ca5d326865d7e704a2b8e26
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Bug 4564: Implement clustering backup-datastore RPC

Added a new RPC backup-datastore to send the GetSnapshot message to the
ShardManager's and persist the list of DatastoreSnapshots to a file.

I also renamed the cluster-config yang module to cluster-admin to make
it more general as the backup RPC isn't related to configuration.

Change-Id: I18e5d47f7052b890c3547066145e4d5d0fbe1277
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4564: Implement GetSnapshot message in ShardManager

Added a serializable DatastoreSnapshot class that stores the serialized
snapshot for each shard.

On GetSnapshot, the ShardManager sends a GetSnapshot message to each
shard and creates a ShardManagerGetSnapshotReplyActor to compile the
replies and return a DatastoreSnapshot instance to the caller.

Change-Id: I11f872aa701f1e51de9cbccdc1a372a76bc45cff
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
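
A rough sketch of the reply-compiling step described in the commit above,
without Akka. The class and method names are hypothetical, and timeouts and
failure replies, which a real ShardManagerGetSnapshotReplyActor would
presumably have to deal with, are left out.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class SnapshotReplyCollectorSketch {
    private final Set<String> awaiting;
    private final Map<String, byte[]> collected = new LinkedHashMap<>();

    SnapshotReplyCollectorSketch(Set<String> shardNames) {
        this.awaiting = new HashSet<>(shardNames);
    }

    // Records one shard's serialized snapshot. Returns the full per-shard map
    // once every expected shard has answered, and Optional.empty() until then.
    Optional<Map<String, byte[]>> onReply(String shardName, byte[] serializedSnapshot) {
        if (awaiting.remove(shardName)) {
            collected.put(shardName, serializedSnapshot);
        }
        return awaiting.isEmpty() ? Optional.of(collected) : Optional.empty();
    }
}
```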

Bug 4564: Implement GetSnapshot message in RaftActor

Added a new client message, GetSnapshot, to return a serialized Snapshot
instance. The implementation just captures the snapshot for return and does
not persist it. If data persistence isn't enabled, it does not initiate a
capture and returns a serialized Snapshot instance containing just the
persistable state, eg election term info.

Change-Id: I9ea7fc8e0e60c4d6874f5eb0188543e1d9b51243
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4149: Implement per-shard DatastoreContext settings

Added the ability to specify shard-specific settings in the .cfg file by
prefixing the shard name to the property name, similar to what we allow
at the datastore level.

I added a DatastoreContextFactory that has methods to get the base
DatastoreContext and a per-shard DatastoreContext. The
DatastoreContextFactory is now passed to the ShardManager instead of the
DatastoreContext. The DatastoreContextFactory uses the
DatastoreContextIntrospector to overlay per-shard settings onto the
base DatastoreContext.

Change-Id: I329c98c1577a74ebe665052f76e28da3867e2e86
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
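
A sketch of the overlay idea from the commit above. It assumes per-shard
entries take the form "<shard name>.<property>" and that base entries carry
no prefix; the exact .cfg key format, the property names used in the test and
the class name are illustrative assumptions, not the
DatastoreContextIntrospector's real logic.

```java
import java.util.HashMap;
import java.util.Map;

final class ShardSettingsSketch {
    // Builds the effective settings for one shard: start from the un-prefixed
    // base entries, then overlay entries prefixed with "<shardName>.".
    static Map<String, String> forShard(Map<String, String> cfg, String shardName) {
        String prefix = shardName + ".";
        Map<String, String> effective = new HashMap<>();
        for (Map.Entry<String, String> entry : cfg.entrySet()) {
            if (!entry.getKey().contains(".")) {
                effective.put(entry.getKey(), entry.getValue()); // base setting
            }
        }
        for (Map.Entry<String, String> entry : cfg.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                // Shard-specific value replaces the base value.
                effective.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return effective;
    }
}
```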

Added the data store benchmark (dsbenchmark, Bug 4519, https://bugs.opendaylight.org/show_bug.cgi?id=4519)

Change-Id: Ibc6d214b43b6353adbc49ba7b5b4a302ae1fbd95
Signed-off-by: Jan Medved <jmedved@cisco.com>

Speed up YangStoreService

Change-Id: Ibaf972650045b5d85be155f653f7eef36aae6c6e
Signed-off-by: Robert Varga <rovarga@cisco.com>

Bug 4563: Increase akka seed-node-timeout

Change-Id: I8f17872ef30a96d58a666e3499cf42ab59f0491d
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix precondition string

The string has been corrupted, fix it up.

Change-Id: I36312ca4e5ca6365b3003a2ad57ca2734d156578
Signed-off-by: Robert Varga <rovarga@cisco.com>

Improve YangStoreService performance

Simple changes to eliminate synthetic methods and unneeded duplication
of collections.

Change-Id: I370d4ed85720e2b7eb811204afa9f532b716b16d
Signed-off-by: Robert Varga <rovarga@cisco.com>

Do not subclass Hashtable

Rather than subclassing, instantiate a Hashtable and fill it.

Change-Id: Icfd4e812759874a702a2506e9090cd20535bdc50
Signed-off-by: Robert Varga <rovarga@cisco.com>
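
For illustration, one common way a Hashtable ends up subclassed is
"double-brace" initialization, which compiles to an anonymous subclass. The
commit above does not say which form the original code used, so the sketch
below (with a hypothetical method and key) only shows the instantiate-and-fill
alternative.

```java
import java.util.Hashtable;

final class HashtableSketch {
    // Avoided form: new Hashtable<String, String>() {{ put("type", type); }}
    // which creates an anonymous Hashtable subclass for every such call site.
    static Hashtable<String, String> serviceProperties(String type) {
        Hashtable<String, String> props = new Hashtable<>();
        props.put("type", type);
        return props;
    }
}
```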

BUG 3973: Add config option for Java-only leveldb

Add comment in akka.conf on how to use the Java-only
version of leveldb for platforms where native leveldb
is unavailable.

Change-Id: I5693522597152ef7f86bb89d4be32e20f0582806
Signed-off-by: Gary Wu <gary.wu1@huawei.com>

Add leader unit test for non-voting consensus

Added a test case to LeaderTest to verify a non-voting follower
does not influence replication consensus.

Also I saw intermittent test failures (in jenkins as well during first
verify build) due to a message going to dead letters shortly after
actor creation (also reported in Bug 4223). Specifically it was occurring
when the leader sent the initial AppendEntries heartbeat to a follower. This
seems like a timing issue/bug in akka when using an ActorSelection. I
added code in the TestActorFactory to use an actorSelection and call
resolveOne in a retry loop. This seems to alleviate the issue as I ran
LeaderTest over 1000 times successfully.

Change-Id: I65cb87f419c280befe2d82300a981bd8e6f88742
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 2187: Address comments in https://git.opendaylight.org/gerrit/#/c/28596/

Addressed minor comments in https://git.opendaylight.org/gerrit/#/c/28596/.

Unified the response messages and debug messages.

Added persistenceId() format param to the debug messages for additional
context.

Change-Id: Ic1a4e852126425cf7ae67ee5b9ea301b06a3f9a8
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Always persist ServerConfigurationPayload log entries

We need to always persist ServerConfigurationPayload log entries
regardless of whether or not persistence is enabled for the derived
RaftActor's data.

I added a new tagging interface PersistentPayload, implemented by
ServerConfigurationPayload, to indicate a Payload
needs to always be persisted. Since log entries are persisted by
both the RaftActor and Follower behavior via the ReplicatedLog, the
logic to determine persistence based on PersistentPayload needs to be
available to both. The ReplicatedLog uses the persistence provider
contained in the RaftActorContext which is the
DelegatingPersistentDataProvider set by the RaftActor. So to keep
the rest of the code the same and keep it simple, I derived a
RaftActorDelegatingPersistentDataProvider which overrides persist to
handle the PersistentPayload logic utilizing the RaftActor's
existing PersistentDataProvider.

Change-Id: I243026b28ed57461ad92324b6947091ae74a7127
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
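
A simplified sketch of the tagging-interface approach described in the commit
above. All type names here are stand-ins (AlwaysPersisted plays the role of
PersistentPayload) and the provider interface is reduced to a single method;
the real classes have a richer API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative marker interface: payloads implementing it are always persisted.
interface AlwaysPersisted { }

interface DataProviderSketch {
    void persist(Object entry);
}

// Writes every entry to an in-memory "journal".
final class JournalProviderSketch implements DataProviderSketch {
    final List<Object> journal = new ArrayList<>();

    @Override
    public void persist(Object entry) {
        journal.add(entry);
    }
}

// Drops entries, as when persistence is disabled for the actor's data.
final class NoopProviderSketch implements DataProviderSketch {
    @Override
    public void persist(Object entry) {
        // intentionally empty
    }
}

// Routes marked payloads to the persistent provider regardless of which
// delegate is currently configured; everything else goes to the delegate.
final class DelegatingProviderSketch implements DataProviderSketch {
    private final DataProviderSketch delegate;
    private final DataProviderSketch persistentProvider;

    DelegatingProviderSketch(DataProviderSketch delegate, DataProviderSketch persistentProvider) {
        this.delegate = delegate;
        this.persistentProvider = persistentProvider;
    }

    @Override
    public void persist(Object entry) {
        if (entry instanceof AlwaysPersisted) {
            persistentProvider.persist(entry);
        } else {
            delegate.persist(entry);
        }
    }
}
```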

Derive MockRaftActorContext from RaftActorContextImpl

I changed MockRaftActorContext to derive from RaftActorContextImpl since
it duplicates most of the functionality in RaftActorContextImpl and,
with the addition of PeerInfo, MockRaftActorContext can now provide the
same functionality as RaftActorContextImpl w/o having to duplicate it in
MockRaftActorContext. Also this will make it easier when the RaftActorContext
interface is changed.

Change-Id: Ief90232fc992a50b3f0fea5ece323a14916760f2
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG-3381: Capture Snapshot on recovery if journal is not empty

Change-Id: Ib1068cb6d4848d151039887b51458399ff421178
Signed-off-by: evvy <dhiraviam.natarajan@gmail.com>

Add wait state for AddServer if snapshot in progress

It is possible a snapshot capture could be in progress when we
attempt to initiate snapshot capture on AddServer. I added a wait
state to the FSM and a new message, SnapshotComplete, that is sent
by the SnapshotManager.

Added more unit test cases.

Change-Id: I119a264e03686ea70f7834e551c2fb45dd39f903
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 2187 - Creating ShardReplica

Creating local shard replica with a custom Raftpolicy. Informs Shard leader of the local shard.
Processes AddServerReply from shard leader.
On successful replication, makes local shard voting capable.
On replication failure, local shard is removed.

Incorporated the comments

Change-Id: Id2b90039c39211b20322bc2d141520723d44c391
Signed-off-by: kalaiselvik <Kalaiselvi_K@Dell.com>

BUG-2187: Non voting and Uninitialized followers are not to be counted towards consensus

Change-Id: I1ba86cf2e2f904847ea8f819e84a3dc54fcc31d2
Signed-off-by: Rajesh_Sindagi <Rajesh_Sindagi@dell.com>

Add voting state to ServerConfigurationPayload

Changed the internal state to a list of ServerInfo instances which
contain the server id and voting state.

Also removed the oldServerConfig field as it won't be needed.

Change-Id: I10b3ca8dc2ffed9b5db0a7d0f6ca74d73a837b8e
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix small bug in startup archetype

Change-Id: I83913ed9f16b38f6e6fd461b76dece1a09f4c8ca
Signed-off-by: Ed Warnicke <hagbard@gmail.com>

BUG-2399: fixup tests

The test model specifies the top-level container as structural, yet the
tests expect it to exist when empty. Mark the container as presence,
restoring behavior expected by tests.

Change-Id: Ided99720468a8bee14d5c66342e524450f5a9050
Signed-off-by: Robert Varga <rovarga@cisco.com>

Introduce PeerInfo and VotingState

We need to store the voting state for each peer so I created a
PeerInfo class to include id, address and voting state (represented by a
VotingState enum). The RaftActorContext now stores PeerInfo instances
in its peer map and added methods to access PeerInfo. As a consequence,
RaftActorContext#getPeerAddresses was no longer needed and was removed.

AbstractLeader and Candidate were modified to utilize the PeerInfo to
calculate the majority vote/min replication count, ie ignore non-voting peers.

Previously we had added a FollowerState enum and stored it in the
FollowerLogInformation. Since voting state is now stored in the
RaftActorContext peer info, I removed the FollowerState from
FollowerLogInformation to avoid redundancy and having to keep both
up to date.

Change-Id: I1394511a8db7f0b9df3ed7879c77c1f44f3b143d
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
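
A small sketch of a majority calculation that ignores non-voting peers, as
the commit above describes. The Peer class, the two enum constants and the
formula (voting peers plus the local member, halved, plus one) are
assumptions for illustration; the actual AbstractLeader and Candidate code
may compute this differently.

```java
import java.util.Collection;

final class VotingSketch {
    enum VotingState { VOTING, NON_VOTING }

    // Stand-in for PeerInfo: id, address and voting state.
    static final class Peer {
        final String id;
        final String address;
        final VotingState votingState;

        Peer(String id, String address, VotingState votingState) {
            this.id = id;
            this.address = address;
            this.votingState = votingState;
        }
    }

    // Votes needed for a majority, counting the local member and only those
    // peers whose state is VOTING.
    static int majorityVoteCount(Collection<Peer> peers) {
        int votingPeers = 0;
        for (Peer peer : peers) {
            if (peer.votingState == VotingState.VOTING) {
                votingPeers++;
            }
        }
        int votingMembers = votingPeers + 1; // plus the local member
        return votingMembers / 2 + 1;
    }
}
```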

Bump Akka to 2.3.14

Change-Id: Ia6bf3f1a4c025ec1e84662c04ccdc40c04e569a2
Signed-off-by: Gary Wu <gary.wu1@huawei.com>

Remove checks for NormalizedNodeBuilderWrapper

Interface contract already guarantees returned objects are subclasses of
NormalizedNodeBuilderWrapper, the instanceof guards only non-nullness.

Switch to explicit assertNotNull() to reduce eclipse warnings.

Change-Id: Ibf0d73752c6e1ebeacbb10677e2f11f185098bd9
Signed-off-by: Robert Varga <rovarga@cisco.com>

Do not use MoreExecutors.sameThreadExecutor()

This method is deprecated, replace it with proper service/executor.

Change-Id: I7257a28f28784313cafc250f2c2fd1c623332dec
Signed-off-by: Robert Varga <rovarga@cisco.com>

Make REUSABLE_*_TL final

Since these are public static fields, they should be final to prevent
possible shenanigans.

Change-Id: I4a360e060ddde57a73118bcf3d053ce397204136
Signed-off-by: Robert Varga <rovarga@cisco.com>

Reduce ShardDataTree#getDataTree() callsites

A lot of these callsites perform a specific function, expose those
functions without leaking the DataTree. This is needed to handle
asynchronous persistence and optimistic transaction commit.

Change-Id: I330cb4172349e0d1d8daacc3aafce7dad64cd8b2
Signed-off-by: Robert Varga <rovarga@cisco.com>

Do not declare unneeded Exception throw

Fixes sonar warnings

Change-Id: I31ab95c75cf30b33c9025d6f6e4662ccc5df7a47
Signed-off-by: Robert Varga <rovarga@cisco.com>

Make private methods static

These methods do not reference object state and therefore can be made
static.

Change-Id: I416e415b90647b4f700b7893fe4f64f479271fab
Signed-off-by: Robert Varga <rovarga@cisco.com>

Add getPeerIds to RaftActorContext

For upcoming work to add voting status to the peer info in
RaftActorContext, I added a getPeerIds method to replace calls to
getPeerAddresses as virtually all callers really just want the IDs or want
to check the size. getPeerAddresses will (likely) be removed altogether -
this is a preliminary patch.

Change-Id: I2b6f2c36dfec14ccd4bbfef35e67ed86cf3e3e45
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix resource leaks in TransactionChainProxyTest

Close TransactionChainProxy objects (AutoCloseable)
that were not being closed in the test cases.

Change-Id: I85b1f951545b764007bdb2e808a2438c9bd4b2b2
Signed-off-by: Gary Wu <gary.wu1@huawei.com>

Update leveldbjni version to support Solaris

Change-Id: I46de5b3cc9c220a70a408194fb3ff709cdff1937
Signed-off-by: rshoaib <rao.shoaib@oracle.com>

Bug 2187: Code cleanup and refactoring

I addressed remaining comments from a prior patch.

I also refactored RaftActorServerConfigurationSupport to use an FSM
similar to the SnapshotManager with some generic classes. This will
make it easier to implement RemoveServer and reuse code.

Change-Id: Id3cdcede3f9c393c878abd3e9a9d3a5e12c5fb8a
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Remove unused Jersey dependencies from the controller

The code utilizing Jersey was moved to the netconf project. This change
removes some of the deprecated dependencies.

Change-Id: I62b944497c976b1251412d8d047ef833e69dfb0a
Signed-off-by: Ryan Goulding <ryandgoulding@gmail.com>

Remove unnecessary @SuppressWarnings

Change-Id: I2b59e7f29a15298c1135c12b6bd9699205706600
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Fix resource leaks in exception handling

Fix resource leaks when exceptions are
encountered during ConfigManagerActivator.start().

Change-Id: Ic12c756aa5a768add0bc62e71eed94e5b2fa5fea
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Bug 4037: Allow auto-downed node to rejoin cluster

This patch will detect when a node has been
auto-downed/quarantined by another node. When this
happens, the ActorSystem of the datastore will be
restarted to allow the node to rejoin the cluster.

Change-Id: I0913bf455d426b6a0fccb17eac61b74f0911fa5d
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Bug 2187: AddServer unit test and bug fixes

Follow-up patch to https://git.opendaylight.org/gerrit/#/c/28018/.

Got the unit tests working and added more unit tests to cover more code.

Also fixed several bugs in the code that were failing the tests. One bug
was caused by replicating data quickly after install snapshot was
complete. On the final install snapshot chunk the follower sends an
ApplySnapshot message to persist and apply the snapshot. On the reply,
the leader assumes the follower is up-to-date and sets its next index.
However, applying the snapshot, i.e. updating the log and commit index, is
actually done after the async callback from the snapshot persist. In the
meantime, if the leader sends the server config AppendEntries, the follower's
log is still empty and it deems itself out-of-sync and reports back failure.
This will cause the leader to eventually send a new install snapshot,
which is not desirable. Also it may delay consensus for the
server config entry.

To fix this, I delayed the final InstallSnapshotReply until after the
ApplySnapshot is complete. I did this by adding a Callback to the
ApplySnapshot message which the SnapshotManager invokes.
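
The callback approach can be sketched roughly as follows. This is a
simplified illustration and not the actual controller code: the shape of the
Callback interface, the SnapshotManager methods and the string replies are
all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch; all names and signatures here are illustrative assumptions.
class ApplySnapshotSketch {
    /** Hypothetical callback carried by the ApplySnapshot message. */
    interface Callback {
        void onSuccess();

        void onFailure();
    }

    /** Message asking the snapshot manager to persist and apply a snapshot. */
    static final class ApplySnapshot {
        final byte[] snapshot;
        final Callback callback;

        ApplySnapshot(byte[] snapshot, Callback callback) {
            this.snapshot = snapshot;
            this.callback = callback;
        }
    }

    /** Stand-in for the snapshot manager: persistence completes asynchronously. */
    static final class SnapshotManager {
        private ApplySnapshot pending;

        void apply(ApplySnapshot message) {
            // Persistence is only started here; nothing is applied yet.
            pending = message;
        }

        void onPersistComplete(boolean success) {
            ApplySnapshot message = pending;
            pending = null;
            if (message == null) {
                return;
            }
            // Only now is the snapshot applied, so only now is the callback invoked.
            if (success) {
                message.callback.onSuccess();
            } else {
                message.callback.onFailure();
            }
        }
    }

    /** Follower side: the final reply is recorded from the callback, not immediately. */
    static List<String> handleFinalChunk(SnapshotManager manager, byte[] snapshot) {
        final List<String> sentReplies = new ArrayList<>();
        manager.apply(new ApplySnapshot(snapshot, new Callback() {
            @Override
            public void onSuccess() {
                sentReplies.add("InstallSnapshotReply(success=true)");
            }

            @Override
            public void onFailure() {
                sentReplies.add("InstallSnapshotReply(success=false)");
            }
        }));
        return sentReplies;
    }
}
```

In this sketch no reply exists until onPersistComplete runs, which mirrors the
intent that the leader must not advance the follower's next index early.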

Also the new server config was constructed without the leader's ID - it
needs to contain all members.

Also the ServerConfigurationPayload wasn't being applied in the
followers.

Another issue was that, if the leader had no peers initially, the
heartbeat wasn't scheduled so, when the new server was added, heartbeats
weren't occurring. So I changed addFollower to schedule the heartbeat.

I added a test for adding a non-voting server which caused an endless
loop in AbstractLeader#handleAppendEntriesReply where it updates the
commitIndex based on the replicated count. To fix this, I added a break
if the replicatedLogEntry is null.
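
A minimal sketch of why the guard matters, under simplifying assumptions: the
log is modelled as an array of terms, only voting followers are counted, and
the method name and parameters are hypothetical rather than the real
AbstractLeader code. With no voting followers the replication threshold is
always met, so the loop needs another exit once it runs past the end of the log.

```java
import java.util.List;

// Simplified sketch; names and the array-based log are illustrative assumptions.
class CommitIndexSketch {
    static long advanceCommitIndex(long commitIndex, long[] logTerms, long currentTerm,
            List<Long> votingMatchIndexes, int minReplicationCount) {
        for (long n = commitIndex + 1; ; n++) {
            int replicatedCount = 1; // the leader always has the entry
            for (long matchIndex : votingMatchIndexes) {
                if (matchIndex >= n) {
                    replicatedCount++;
                }
            }
            if (replicatedCount < minReplicationCount) {
                break; // not enough replicas yet
            }
            if (n >= logTerms.length) {
                // Equivalent of "replicatedLogEntry is null": no such entry in the
                // log. Without this break the loop never ends when the threshold
                // is satisfied by the leader alone.
                break;
            }
            if (logTerms[(int) n] == currentTerm) {
                commitIndex = n;
            }
        }
        return commitIndex;
    }
}
```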

Change-Id: I5dff351140c611d58357cd58900bed401606038c
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 2187 - JMX API for create/delete shard replica

Change-Id: I48a4dcb7983f5f231e9ddc04e851950abf7c2d8a
Signed-off-by: kalaiselvik <Kalaiselvi_K@Dell.com>

BUG-2187: Add Server - Leader Implementation

Processes addServer request from the follower, forwards the request
to the shard leader, if not the leader.

The follower shard replica data is brought in sync with the leader by installing a snapshot from the shard leader.
On successful application of the snapshot data, this voting-but-not-initialized member is transitioned to a voting member.
The new server configuration is persisted and replicated to a majority of the followers, and the leader responds with an OK message to the shard follower.

If the leader is unable to sync data to the follower within a configured time period, it responds with a TIMEOUT message to the shard follower without adding or persisting the new server configuration.

Change-Id: I9a3870d14bb6ad532ff64f315b2e2000d8b803e2
Signed-off-by: Rajesh_Sindagi <Rajesh_Sindagi@dell.com>

Add cluster config yang RPCs and provider wiring

Added experimental RPCs, including AddShardReplica, with initial empty
implementations that return unsupported.

Change-Id: Ie8587903920760fc4555bc009c81183e8d7740e4
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix DistributedDataStoreIntegrationTest failure

Fixed a timing issue with a test that just started causing failures pretty
regularly on jenkins builds for some reason.

Change-Id: I40273574376804034fd6f14f56384cb8cae26900
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Add missing dependencies

pax-url-aether provides javax.inject and commons-codec, but they need
to be declared separately for correctness and to allow upgrades to
newer versions of pax-url-aether.

com.google.inject.Inject can be replaced by javax.inject.Inject.

Change-Id: I0a1da43faf0345bd71c2737caaa840c396bc60ab
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Fix ModuleFactory not found errors

https://git.opendaylight.org/gerrit/#/c/27874/ made improvements to the
config system but had the side-effect of introducing timing issues where
a ModuleFactory wasn't found when trying to push a config. The reason is
that yang schemas load earlier and much quicker than ModuleFactory's,
which are scanned from ACTIVE bundles, so the capabilities may resolve
but a ModuleFactory may not be available yet. As a result, that patch
was partially reverted for the time being.

To fix the missing ModuleFactory issue, I added retries in the
ConfigPusherImpl when a ModuleFactory isn't found, similar to the
ConflictingVersionException retries. The backend now throws a new
exception, ModuleFactoryNotFoundException, which is caught to
trigger a retry after a delay. Previously, it threw an
InstanceNotFoundException which was wrapped in an
IllegalArgumentException. I didn't keep the InstanceNotFoundException
because it can be thrown for other reasons and I wanted to distinguish
the missing ModuleFactory case.

I derived ModuleFactoryNotFoundException from RuntimeException to avoid
having to change signatures in the call chain and thus changing the API.
Prior it threw an unchecked IllegalArgumentException anyway so it's
consistent plus other areas of the code throw unchecked exceptions along
with checked exceptions.
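
The retry behaviour can be illustrated with a small sketch. It is not the
ConfigPusherImpl code: the pushWithRetries helper, its parameters and the
local exception class are hypothetical stand-ins, chosen only to show an
unchecked exception being caught to trigger a delayed retry.

```java
import java.util.function.Supplier;

// Simplified sketch; the helper and its parameters are illustrative assumptions.
class RetrySketch {
    /** Local stand-in for the unchecked exception described above. */
    static class ModuleFactoryNotFoundException extends RuntimeException {
        ModuleFactoryNotFoundException(String message) {
            super(message);
        }
    }

    /** Runs the push, retrying after a delay while the ModuleFactory is missing. */
    static <T> T pushWithRetries(Supplier<T> push, int maxAttempts, long delayMillis) {
        ModuleFactoryNotFoundException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return push.get();
            } catch (ModuleFactoryNotFoundException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException interrupted) {
                        // Preserve the interrupt status and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted while retrying", interrupted);
                    }
                }
            }
        }
        throw new IllegalStateException("Gave up after " + maxAttempts + " attempts", last);
    }
}
```

Because the exception is unchecked, none of the intermediate method
signatures between the thrower and the retry loop need a throws clause.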

Since the missing ModuleFactory issue is fixed, I re-enabled scanning of
RESOLVED bundles in the ModuleInfoBundleTracker.

Change-Id: I89ff346c0a89afdfa76ce402f2cf3211ac68b5c0
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 4151 : Create a shared actor system

This patch adds an ActorSystemProvider interface in clustering commons
with a method to get a shared ActorSystem instance which uses the
clustered data store configuration as it contains more configuration
options than the rpc connector which pretty much uses stock configuration.
I added a config yang to define an actor-system-provider-service.

I added the ActorSystemProvider implementation and actor-system-provider-impl
config yang in the distributed datastore bundle. I tried it in
sal-clustering-commons originally but ran into akka errors re: missing
config properties and it also couldn't find the
ReadyLocalTransactionSerializer class. So to avoid chasing down those
errors I put the implementation in sal-distributed-datastore. I think
this makes sense as it is the prime user of the actor system.

I added a dependency for the ActorSystemProvider service in both
datastore modules so the ActorSystem is now injected in and passed
to the DistributedDataStoreFactory. The dependency was also added to the
RPC module.

Elements for the new actor system provider service and impl were added to
the 05-clustering.xml file along with the wiring changes for the data
stores and RPC modules.

Change-Id: I79c14f84c992a2d5ac9c1f1856efbaeba3cc2b77
Signed-off-by: Moiz Raja <moraja@cisco.com>

Fix Eclipse compilation warnings.

Fix compilation warnings in DistributedDataStoreTest
that DistributedDataStores were never closed. Also fix
NPEs on closing DistributedDataStores when the
MXBeans are uninitialized.

Change-Id: I5dcaa389e1e69f934e9016933b00be3adaf4529f
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Remove peer address cache in ShardInformation

The ShardManager caches the peer addresses in the ShardInformation and
uses it mainly to suppress PeerUp, PeerDown and PeerAddressResolved
messages to the shard for peers that don't have replicas for the shard.

This is fine with static config but with the upcoming work to dynamically
add replicas, the shard will take ownership of persisting its peers so
the ShardManager will not know about dynamic peers.

I changed the semantics of the peer addresses to initial peer addresses.
They are now only used to pass to the Shard on creation. As a result,
PeerUp, PeerDown and PeerAddressResolved messages are now always sent to
the Shard for all peers. The Shard/RaftActor will decide whether or not to
process the peer message. I changed RaftActorContextImpl#setPeerAddress
to ignore a peerId it doesn't know about instead of throwing an exception.
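
A minimal sketch of the "ignore unknown peer" semantics, assuming a simple
map of peer id to address. The class, the boolean return value and the member
names are hypothetical and only illustrate the behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch; class and method shapes are illustrative assumptions.
class PeerAddressSketch {
    private final Map<String, String> peerAddresses = new HashMap<>();

    PeerAddressSketch(Map<String, String> initialPeerAddresses) {
        // Addresses may be null for peers that have not been resolved yet.
        peerAddresses.putAll(initialPeerAddresses);
    }

    /** Updates the address of a known peer; silently ignores unknown peer ids. */
    boolean setPeerAddress(String peerId, String address) {
        if (!peerAddresses.containsKey(peerId)) {
            return false; // not one of this shard's peers: ignore, don't throw
        }
        peerAddresses.put(peerId, address);
        return true;
    }

    String getPeerAddress(String peerId) {
        return peerAddresses.get(peerId);
    }
}
```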

The other usages of the peerAddresses were to lookup the leader address.
This is now done dynamically via the ShardPeerAddressResolver.

Change-Id: Ida9738916a4a85d23198e7c095d5c73f17e2aa6c
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Prepare for Karaf 3.0.4 upgrade

Pull in karaf.version from odlparent.
Drop import of org.apache.felix.service.command (apparently unused).

Change-Id: I6487ce1a52e6f51bbcdd4e332de18d4684782301
Signed-off-by: Stephen Kitt <skitt@redhat.com>

Refactor to fix unchecked cast warnings.

Change-Id: I0fb6ce59707000f225ffa8d654685fbc89f8f2eb
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Fix Eclipse compilation warnings.

Change-Id: I16921743a8cc4ac8902c1b7fffa2edfd8cba8be6
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Fix Eclipse compilation warnings.

Change-Id: I2caddfded34638002b2e31bf4e99d1770dd03a00
Signed-off-by: Gary Wu <Gary.Wu1@huawei.com>

Reproduce bug 4359

Added a couple of unit tests which demonstrate the problem
described in bug 4359 where, upon recovery, a node which
was previously deleted reappears on reapplying the
candidates.

For some reason the problem is reproducible only when the
car is added and deleted twice and not once. I haven't
investigated why yet.

Change-Id: I5f5a656ef6fdc017a3342c8b409576a8b121b7f1
Signed-off-by: Moiz Raja <moraja@cisco.com>

Partial revert of https://git.opendaylight.org/gerrit/#/c/27874/

Patch https://git.opendaylight.org/gerrit/#/c/27874/ made improvements
that significantly sped up config system boot and helped the SFC project
but a couple other projects are seeing a timing issue where a
ModuleFactory isn't found and the config pusher fails. This is due to
the speed up and that YangModuleInfo's are now scraped from RESOLVED bundles
and thus are available quicker but ModuleFactory's are scraped from
ACTIVE bundles.

While the ModuleFactory issue is being addressed, I'll partially revert the
prior changes to go back to scanning ACTIVE bundles for YangModuleInfo.

Change-Id: Icd3a51a049a940ad60a4bd0071e3c969167275d3
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Add ShardPeerAddressResolver

Added a ShardPeerAddressResolver implementation that is passed to
Shard RaftActors to resolve addresses for shard peer ids. I refactored
ShardManager a bit to move the memberNameToAddress map and related code
to the ShardPeerAddressResolver.

Change-Id: I5cbef5816d9bf13a339e43008144f44fd55fc606
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Modify ModuleInfoBundleTracker to track RESOLVED bundles (round 2)

My first patch https://git.opendaylight.org/gerrit/#/c/27138/ didn't
do well with the feature tests due to the BundleContext busy wait so it
was reverted.

I went back to my original solution to configure the ModuleInfoBundleTracker
to track RESOLVED and ModuleFactoryBundleTracker to track ACTIVE.
Originally when I tried that I had some failure due to the ModuleFactory
not loaded yet but I don't remember exactly what. This patch seems to work
fine - I've restarted karaf several times and also ran the tsdr features tests
several times successfully. Originally I did the first patch in stable/lithium
so maybe something else has changed in master or the way I did it wasn't right.

Since the initial yang module info's are now processed synchronously when the
BundleTracker is opened, I modified the ModuleInfoBundleTracker to
ensure it doesn't propagate runtime exceptions. This would disrupt the
BundleTracker and the ConfigManagerActivator - if one module had an
issue the config manager wouldn't start.

For every YangModuleInfo scraped, it registers it with the
ModuleInfoRegistry. The backing impl is RefreshingSCPModuleInfoRegistry
which causes a new SchemaContext to be created from the current yang
models (via updateService). This isn't efficient - on startup, we'll get all
YangModuleInfo's in quick succession so, optimally, it should build the
SchemaContext once after open is complete. This is what the
GlobalBundleScanningSchemaServiceImpl does.

To accomplish this, I removed the call to updateService from
RefreshingSCPModuleInfoRegistry#registerModuleInfo - it is now
specifically called by ModuleInfoBundleTracker. This means the
ModuleInfoBundleTracker now references RefreshingSCPModuleInfoRegistry
instead of the ModuleInfoRegistry interface which makes it less clean.
Any other way would require changes to the ModuleInfoRegistry interface,
which I didn't want to do, or extending the interface, which I didn't think
was worth the effort. The RefreshingSCPModuleInfoRegistry is only used by
ModuleInfoBundleTracker.
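
The effect of moving the updateService call can be sketched as follows. This
is an illustration under assumptions, not the real registry: module infos are
plain strings, the rebuild is reduced to a counter, and the open helper is a
hypothetical stand-in for the tracker's initial scan.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch; names and types are illustrative assumptions.
class ModuleInfoRegistrySketch {
    private final List<String> moduleInfos = new ArrayList<>();
    private int schemaContextBuilds;

    /** Registration no longer triggers a rebuild by itself. */
    void registerModuleInfo(String moduleName) {
        moduleInfos.add(moduleName);
    }

    /** Stand-in for rebuilding the SchemaContext from the current models. */
    void updateService() {
        schemaContextBuilds++;
    }

    int getModuleCount() {
        return moduleInfos.size();
    }

    int getSchemaContextBuilds() {
        return schemaContextBuilds;
    }

    /** Tracker-side view: register everything found on open, then rebuild once. */
    static ModuleInfoRegistrySketch open(List<String> initialModules) {
        ModuleInfoRegistrySketch registry = new ModuleInfoRegistrySketch();
        for (String moduleName : initialModules) {
            registry.registerModuleInfo(moduleName);
        }
        registry.updateService();
        return registry;
    }
}
```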

Change-Id: I20213ce8bd1dfc5109f3ef223cec8048bec92e12
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>