git.opendaylight Code Review - controller.git/log

Bug 4563: Increase akka seed-node-timeout

Change-Id: I8f17872ef30a96d58a666e3499cf42ab59f0491d
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit fe73fc4dab6cbdb63aac8db607e88ce1f62a9100)

BUG-4521 Support milliseconds in event-time notification format

The current format does not support milliseconds, and the netconf
notification RFC clearly says milliseconds should be supported.
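As an illustration, here is a minimal java.time sketch of formatting and parsing an eventTime value with a three-digit fraction of a second. The class name and the pattern string are assumptions for illustration, not the formatter the netconf code actually uses.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: RFC 3339-style date-time with millisecond precision.
final class EventTimeFormat {
    // "SSS" prints milliseconds; "XXX" prints "Z" for UTC or an offset such as "+02:00".
    static final DateTimeFormatter WITH_MILLIS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    static String format(OffsetDateTime time) {
        return WITH_MILLIS.format(time);
    }

    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, WITH_MILLIS);
    }
}
```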

Change-Id: Ib91b5f08a0ec78721e2b0984b8dc123f3283d2e1
Signed-off-by: Maros Marsalek <mmarsale@cisco.com>

Bug 4560: Improve config system logging for debuggability

When a config push fails, most of the exceptions that are thrown are
unchecked (like IllegalArgumentEx) and they aren't explicitly caught,
so they propagate to the top-level ConfigPersisterActivator thread and
get printed to syserr. So I added a catch that logs them at error level.

I also added context to the logged message which outputs the xml file
name to aid in debugging issues.

I also added info logging when a config push starts and when it
successfully completes to further aid in debugging issues.
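The catch-and-log pattern described above can be sketched roughly as follows. It uses java.util.logging (SEVERE as the closest match to "error") only to stay self-contained; the Push interface and method names are hypothetical, not the ConfigPersisterActivator code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ConfigPushSketch {
    private static final Logger LOG = Logger.getLogger(ConfigPushSketch.class.getName());

    // Hypothetical stand-in for pushing one config snapshot file.
    interface Push {
        void run() throws Exception;
    }

    /** Returns true on success; logs with the file name and returns false on any failure. */
    static boolean pushWithLogging(String fileName, Push push) {
        LOG.info("Pushing configuration snapshot " + fileName);
        try {
            push.run();
            LOG.info("Successfully pushed configuration snapshot " + fileName);
            return true;
        } catch (Exception e) {
            // Also catches unchecked exceptions (e.g. IllegalArgumentException), so they
            // are logged with context instead of escaping to the top-level thread.
            LOG.log(Level.SEVERE, "Failed to push configuration snapshot " + fileName, e);
            return false;
        }
    }
}
```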

Change-Id: I3db9dcad3cba0abd58c045bc1047e08d6f19ccd3
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

update leveldbjni version to support Solaris

Change-Id: I46de5b3cc9c220a70a408194fb3ff709cdff1937
Signed-off-by: rshoaib <rao.shoaib@oracle.com>
(cherry picked from commit 11c04cc6c57250b575d43aa55402c8a780db1423)

Fix NPE in EntityOwnerSelectionStrategyConfigReader

Configuration#getProperties can return null if the config doesn't exist
- need to check for that. Also added more checks and unit tests.
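A minimal sketch of the null guard, assuming the properties arrive as a Dictionary (as with an OSGi Configuration) and that strategy entries use a hypothetical "entity.type." key prefix; the real property names and reader class may differ.

```java
import java.util.Collections;
import java.util.Dictionary;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;

final class StrategyConfigReaderSketch {
    // Hypothetical key prefix, e.g. "entity.type.openflow" -> strategy class name.
    static final String PREFIX = "entity.type.";

    static Map<String, String> readStrategies(Dictionary<String, Object> properties) {
        if (properties == null) {
            // The configuration does not exist: no custom strategies.
            return Collections.emptyMap();
        }
        Map<String, String> strategies = new HashMap<>();
        Enumeration<String> keys = properties.keys();
        while (keys.hasMoreElements()) {
            String key = keys.nextElement();
            Object value = properties.get(key);
            // Extra checks: skip unrelated keys and non-string or blank values.
            if (key.startsWith(PREFIX) && value instanceof String
                    && !((String) value).trim().isEmpty()) {
                strategies.put(key.substring(PREFIX.length()), ((String) value).trim());
            }
        }
        return strategies;
    }
}
```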

Change-Id: If468fb8c3df7ecba664bb00a8f01bdfec7b4ceeb
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Implement LeastLoadedCandidateSelectionStrategy

Change-Id: I09035505bcfa0ef5b2ac357217186ad98db7974c
Signed-off-by: Moiz Raja <moraja@cisco.com>

Maintain EntityOwnershipStatistics

Implementing a LoadBalancing entity owner selection
strategy depends on our ability to find the load on
specific candidates. The EntityOwnershipStatistics collects
this information and provides query methods to access
ownership counts for candidates.
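To illustrate how such ownership counts can drive a least-loaded choice, here is a rough sketch. The map of member name to owned-entity count stands in for what EntityOwnershipStatistics tracks; the class and method names are hypothetical, and the tie-breaking rule is an assumption.

```java
import java.util.Collection;
import java.util.Map;

final class LeastLoadedSketch {
    /**
     * Picks the candidate that currently owns the fewest entities. Members missing
     * from ownershipCounts count as zero; ties go to the earliest candidate.
     */
    static String selectOwner(Collection<String> candidates, Map<String, Long> ownershipCounts) {
        String best = null;
        long bestCount = Long.MAX_VALUE;
        for (String candidate : candidates) {
            long count = ownershipCounts.getOrDefault(candidate, 0L);
            if (count < bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best; // null when there are no candidates
    }
}
```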

Change-Id: I7e812b15e8fb21e3be1aed10384600b9acb8bf20
Signed-off-by: Moiz Raja <moraja@cisco.com>

Add a mechanism to read the entity owner selection strategies from a config file

Change-Id: Ie951e4f83aaf38f00e959f4243820a88cb988788
Signed-off-by: Moiz Raja <moraja@cisco.com>

Pass in EntityOwnerSelectionStrategyConfig when constructing DistributedEntityOwnershipService

Change-Id: Iad1014db726a06de9a89a9987216ca4c96981122
Signed-off-by: Moiz Raja <moraja@cisco.com>

Pass in EntityOwnerSelectionStrategyConfig when constructing EntityOwnershipShard

Change-Id: I56c2f4f87c61e81b662cd0b30c60775389e9b9a3
Signed-off-by: Moiz Raja <moraja@cisco.com>

Allow passing of delay to the EntityOwnerElectionStrategy

Change-Id: If745443585e68a26c10622a7888ec52dbee0059c
Signed-off-by: Moiz Raja <moraja@cisco.com>

Add Delayed Owner selection based on strategy

Change-Id: I04fc216ffc7e5c3fd35b34b6d03a5030c359d77f
Signed-off-by: Moiz Raja <moraja@cisco.com>

Return throwable in NeverReconnectStrategy

NeverReconnectStrategy returned an empty throwable instead of
the passed throwable containing the reasons the previous
connection failed.
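The idea of the fix, sketched with CompletableFuture instead of the Netty futures the real strategy uses; the interface and class names are hypothetical simplifications.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical simplified reconnect-strategy contract.
interface ReconnectStrategySketch {
    CompletableFuture<Void> scheduleReconnect(Throwable cause);
}

final class NeverReconnectSketch implements ReconnectStrategySketch {
    @Override
    public CompletableFuture<Void> scheduleReconnect(Throwable cause) {
        CompletableFuture<Void> failed = new CompletableFuture<>();
        // Fail with the original cause rather than a new, empty Throwable,
        // so callers can see why the previous connection attempt failed.
        failed.completeExceptionally(cause);
        return failed;
    }
}
```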

Change-Id: I5695af09379f06a66c37ccf27293ff85657afeaa
Signed-off-by: Claudio D. Gasparini <cgaspari@cisco.com>
(cherry picked from commit a90b34b052c46e4d405b1c477a3a5a0f47e1bd98)

Introduce EntityOwnerSelectionStrategy

Currently the EntityOwnershipService does not do any load
balancing, in that it allows the first candidate that registers
to become an owner. Load balancing is needed so that applications
which do some *work* based on whether they own an entity can
scale better.

This patch introduces the concept of an EntityOwnerSelectionStrategy
with the intent to provide custom strategies later to choose an owner.

Since custom strategies require intimate knowledge of how the
EntityOwnershipShard chooses a leader, at this time I do not think
a strategy can be passed to the EntityOwnershipService via API. The
intent therefore is to choose a strategy based on configuration
wherein a custom strategy can be chosen for each entity type. If
the strategy needs any custom configuration then it can have configuration
files of its own.
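A rough sketch of choosing a strategy per entity type with a first-come default. The interface, the register/strategyFor methods and the use of plain strings for member names are all assumptions for illustration, not the EntityOwnerSelectionStrategy API.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical strategy contract: pick an owner from the current candidates.
interface OwnerSelectionStrategySketch {
    String newOwner(Collection<String> candidates);
}

final class StrategyConfigSketch {
    // Default matching the existing behaviour: the first registered candidate wins.
    static final OwnerSelectionStrategySketch FIRST_CANDIDATE =
            candidates -> candidates.isEmpty() ? null : candidates.iterator().next();

    private final Map<String, OwnerSelectionStrategySketch> byEntityType = new HashMap<>();

    void register(String entityType, OwnerSelectionStrategySketch strategy) {
        byEntityType.put(entityType, strategy);
    }

    /** The configured strategy for the entity type, or the first-come default. */
    OwnerSelectionStrategySketch strategyFor(String entityType) {
        return byEntityType.getOrDefault(entityType, FIRST_CANDIDATE);
    }
}
```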

Change-Id: Ia53b8edb59fb1d06a426d9d9a95c07ef4ae65cd1
Signed-off-by: Moiz Raja <moraja@cisco.com>

Fix DistributedEntityOwnershipIntegrationTest failure

Fixed an intermittent failure due to the follower2MockListener getting
an ownershipChanged with "false, false, true" if the original
ownership change with "member-2" is replicated to follower2 after
the listener is registered. The test ran 100 times successfully.

Change-Id: Ibe282d138b293980a11ea54cf434a00513f294aa
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Re-enable tests and bump aaa version

This patch is part 2 of 2 patches:

* Increments the version of the dependency on AAA from 0.2.2-Lithium-SR2
to 0.2.3-SNAPSHOT.
* Re-enables the feature tests for the restconf and netconf-connector
features.

Change-Id: Ia3325503113f177814d126bcdb0a0c5acc54b3e6
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Bumping versions by 0.0.1 after the Lithium SR2 release

This patch is part 1 of 2 patches.

* The only version not incremented is aaa.version, which is left at
0.2.2-Lithium-SR2 since they depend on controller and can't update yet.
* To break the cyclic dependency, this patch temporarily stops running
the netconf-connector and restconf feature tests.

A second patch (to be run after AAA increments their versions to
0.2.3-SNAPSHOT) will update aaa.version and re-enable these tests.

Change-Id: I068eee3ed1207f5b13fd9d01b345413aaf7855b6
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Applying the Lithium SR2 release patch

Change-Id: I8c27796aee06b4332ba73ed859e9bb1a395ec2d0
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Revert "Fix DistributedEntityOwnershipIntegrationTest failure"

This reverts commit de587f935016a300cdbeb85926c2eb677f383fc2.

Change-Id: I6d587db51aac176ad0ff0d5078e1c9b7cce802aa
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Fix DistributedEntityOwnershipIntegrationTest failure

Fixed an intermittent failure due to the follower2MockListener getting
an ownershipChanged with "false, false, true" if the original
ownership change with "member-2" is replicated to follower2 after
the listener is registered. The test ran 100 times successfully.

Change-Id: I1f0333e3bc69cc28521bc7388d64b56d18b55544
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix NPE in AbstractFeatureWrapper

Added check in ChildAwareFeatureWrapper#getChildFeatures to verify the
feature exists in the FeaturesService to avoid NPE. This is a similar
workaround as was done in FeatureConfigPusher for a bug in karaf where
the FeaturesService may mysteriously return null for an existing feature.
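The guard can be sketched as below. FeatureLookup is a hypothetical stand-in for FeaturesService that may return null for a feature name; the real ChildAwareFeatureWrapper works with Karaf Feature objects rather than strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ChildFeaturesSketch {
    // Hypothetical lookup: a feature's dependency names, or null if it cannot
    // be found (even when it should exist).
    interface FeatureLookup {
        List<String> dependenciesOf(String featureName);
    }

    static List<String> childFeatures(String featureName, FeatureLookup lookup) {
        List<String> deps = lookup.dependenciesOf(featureName);
        if (deps == null) {
            return Collections.emptyList(); // unknown feature: no children rather than an NPE
        }
        List<String> children = new ArrayList<>();
        for (String dep : deps) {
            // Skip dependencies the lookup cannot resolve instead of dereferencing null.
            if (lookup.dependenciesOf(dep) != null) {
                children.add(dep);
            }
        }
        return children;
    }
}
```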

Change-Id: I006cd012e919ac206d70bb4ee5754c72f0f01b32
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Add getOwnershipState method to EntityOwnershipService

Added a new method to get the current ownership state for an entity.
This was requested for OF clustering.

The DistributedEntityOwnershipService obtains the EntityOwnershipShard's
DataTree via a new message GetShardDataTree and reads the entity's owner
leaf in order to build the resulting EntityOwnershipState. The DataTree
is obtained once and cached.
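A simplified sketch of that flow: fetch the tree once, cache it, and derive the state from the owner leaf. The data tree is modelled as a hypothetical function from entity id to owner leaf value ("" when cleared, null when absent), and State only loosely mirrors EntityOwnershipState.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

final class OwnershipStateSketch {
    // Hypothetical value type standing in for EntityOwnershipState.
    static final class State {
        final boolean isOwner;
        final boolean hasOwner;

        State(boolean isOwner, boolean hasOwner) {
            this.isOwner = isOwner;
            this.hasOwner = hasOwner;
        }
    }

    private final String localMemberName;
    // Stands in for asking the shard for its DataTree (GetShardDataTree).
    private final Supplier<Function<String, String>> treeFetcher;
    // Cached view: entity id -> owner leaf value.
    private volatile Function<String, String> cachedTree;

    OwnershipStateSketch(String localMemberName, Supplier<Function<String, String>> treeFetcher) {
        this.localMemberName = localMemberName;
        this.treeFetcher = treeFetcher;
    }

    Optional<State> getOwnershipState(String entityId) {
        Function<String, String> tree = cachedTree;
        if (tree == null) {
            // Fetched on first use, then reused; two concurrent first calls may both fetch, which is harmless.
            tree = treeFetcher.get();
            cachedTree = tree;
        }
        String owner = tree.apply(entityId);
        if (owner == null) {
            return Optional.empty(); // entity not present in the tree
        }
        return Optional.of(new State(localMemberName.equals(owner), !owner.isEmpty()));
    }
}
```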

Change-Id: Ib4aa2f4e5370d8d5183908b836417936a51458f7
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix typo in thrown exception in RestconfImpl.java

Change-Id: I81df54043732a5c332d8f8b8209f66c15993cbf1
Signed-off-by: adetalhouet <adetalhouet@inocybe.com>

Bug 4327 - Fixed DataTreeChangeListener registration in PingPongDataBroker

- delegate broker was incorrectly queried for DOMDataTreeChangeService
- it must ask for supported extensions instead of using instanceof
- this is the lithium branch fix;
beryllium change: https://git.opendaylight.org/gerrit/#/c/27164/
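The difference between instanceof and an extension lookup can be sketched as follows: a delegate may provide the service through a separate extension object, which an instanceof check on the broker itself would miss. The interfaces here are hypothetical and only loosely modelled on a getSupportedExtensions-style map.

```java
import java.util.Map;

// Hypothetical extension the caller is looking for.
interface TreeChangeServiceSketch {
    String registerListener(String path);
}

// Hypothetical broker contract that advertises its extensions explicitly.
interface BrokerSketch {
    Map<Class<?>, Object> getSupportedExtensions();
}

final class ExtensionLookup {
    /** Returns the extension if the delegate advertises it, otherwise null. */
    static <T> T extension(BrokerSketch delegate, Class<T> type) {
        Object ext = delegate.getSupportedExtensions().get(type);
        return ext == null ? null : type.cast(ext);
    }
}
```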

Change-Id: Ie1757c762018e7188d76a7728f2f8ea52293d73f
Signed-off-by: Michal Polkorab <michal.polkorab@pantheon.sk>

Bug 4105: Fix intermittent failure in DistributedEntityOwnershipIntegrationTest

I saw a test failure on jenkins. After follower2 is stopped there will be
2 onOwnershipChange calls so the test needs to expect both.

Change-Id: I74dc583c2d40e966197315640eb189702fbabd64
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Remove EntityOwnershipCandidate

It was decided that we really don't need to pass an
EntityOwnershipCandidate listener when registering a candidate. Since
apps would most likely create a singleton EntityOwnershipCandidate for
all registerCandidate calls, they might as well register the singleton
listener once via registerListener. This simplifies the interface and
also simplifies OF clustering b/c they need an EntityOwnershipListener
anyway for device node cleanup.

Change-Id: I9fb7d68c1ffbf932c9d0e18efef604c1b05fdf96
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4094: Fix DCNs on initial registration

For DataChangeListener, I modified the code to use a
ResolveDataChangeEventsTask to resolve the initial changed event. It
was noted in Bug 4094 to read the registration path up to the first
wildcard. However this did not work. ResolveDataChangeEventsTask
expects the candidate root path and "after" data to be the tree root
to match the structure of the ListenerTree. When a transaction is
committed, the resulting DataTreeCandidate always points to the root.
So I had to read the root path in ShardDataTree. I could've optimized
for non-wildcarded path registrations but I think wildcarded path
registrations will be the norm anyway.

I added a new method, notifyOfInitialData, in ShardDataTree. Because I
had to create a new ListenerTree with the single registration, I
needed to know the path and scope of the original registration so I
changed several method signatures from the general ListenerRegistration
to the specific DataChangeListenerRegistration which provides access to
the path and scope. However we also have an actor class by that name so
to avoid confusion I renamed the actor class to
DataChangeListenerRegistrationActor.

DataTreeChangeListener was implemented similarly.

Change-Id: I0ab88d0991761c058b6af81d6d26402ff370b78e
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Add hasOwner param to EntityOwnershipListener#ownershipChanged

OF clustering needs to know when the last candidate is removed for an
entity so it can clean up inventory. We decided to add a new param,
hasOwner, passed to EntityOwnershipListener#ownershipChanged to indicate if
there is at least one remaining candidate and current owner when a
controller node loses ownership. So if
wasOwner=true && isOwner=false && hasOwner=false, the OF code can
remove the device node from inventory.

To simplify the EntityOwnershipListener#ownershipChanged interface and
to allow for possible future parameters w/o breaking the interface, the
parameters are now encapsulated in an EntityOwnershipChanged DTO. There
already was the same EntityOwnershipChanged class in
sal-distributed-datastore - this class was removed.
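A minimal sketch of such a DTO and the cleanup condition described above. The class and method names are hypothetical; it only mirrors the idea of encapsulating wasOwner/isOwner/hasOwner.

```java
// Hypothetical DTO modelled on the EntityOwnershipChanged parameters.
final class OwnershipChangeSketch {
    final String entity;
    final boolean wasOwner;
    final boolean isOwner;
    final boolean hasOwner;

    OwnershipChangeSketch(String entity, boolean wasOwner, boolean isOwner, boolean hasOwner) {
        this.entity = entity;
        this.wasOwner = wasOwner;
        this.isOwner = isOwner;
        this.hasOwner = hasOwner;
    }

    /** True when this node lost ownership and no candidate remains to take over. */
    boolean lastCandidateRemoved() {
        return wasOwner && !isOwner && !hasOwner;
    }
}
```

New fields can be added to the DTO later without changing the listener method's signature, which is the design point the commit makes.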

Change-Id: I07375f154ac55d34062380ad6d0b30d970bd28e7
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

CDS: Fix intermittent DistributedDataStoreRemotingIntegrationTest failure

I've seen the testReadyLocalTransactionForwardedToLeader test fail
several times both locally and in jenkins:

DistributedDataStoreRemotingIntegrationTest.testReadyLocalTransactionForwardedToLeader:535
assertion failed: expected class
org.opendaylight.controller.protobuff.messages.cohort3pc.ThreePhaseCommitCohortMessages$CommitTransactionReply, found class akka.actor.Status$Failure

It's a timing issue where the follower may not yet have the leader.
After this patch the test ran 100 times w/o failure.

Change-Id: I542a7e87516e8d1f846cda6e2abc4d473e3de961
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 7fd01f9dc19ef8f02c1b70973fcb091dc0ad8b1e)

Bug 4105: Fixed feature test failure due to missing dependency.

features-mdsal was missing dependency on sal-clustering-config
(type=xml, classifier=entityownershipconfig).

Change-Id: Ifa1380a1071ae3c7b730e79c4e1c8ff09dbe15e2
Signed-off-by: Shigeru Yasuda <s-yasuda@da.jp.nec.com>
(cherry picked from commit 12e9ea641e6cfca47e1c232b788acd6ece5364ba)

BUG 4291 : odl-clustering-test-app feature must depend on odl-mdsal-broker

When odl-clustering-test-app feature depends on odl-mdsal-broker-local the
distributed-entity-ownership service does not get resolved.

Change-Id: I0240ea6b210ec5966f680f786a706365b4f6502c
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug 4105: Remove candidates on PeerDown

Currently on PeerDown, the EntityOwnershipShard selects a new owner for
the entities owned by the down node and leaves the down node as a
candidate. If the down node is the only candidate, the owner is cleared.
On PeerUp, it selects a new owner for those entities whose owner is clear.
This was done to handle network partition so a node's candidates remain
registered and are re-assigned when the partition is healed.

However this has potential issues when a node is actually
stopped/restarted. It's possible, on restart, that the node doesn't
register a candidate for an entity that it had previously registered for.
So it may get ownership of an entity for which it has no registered
candidate.

To alleviate this, I changed it to remove all the down node's candidates
on PeerDown. If the node was stopped/restarted, then it will
re-register candidates based on local client requests. This case will be
the norm. To handle network partition, when healed, the follower node
will get the replicated commits for its candidate removals from the
leader. So on candidate removal, it re-adds its removed candidate if it
has a registered EntityOwnershipCandidate.
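The PeerDown handling can be sketched with in-memory maps; the real shard keeps this state in the entity-owners data tree and commits changes through Raft, so the class and field names here are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PeerDownSketch {
    // entity id -> candidate member names in registration order
    final Map<String, List<String>> candidates = new HashMap<>();
    // entity id -> owner member name; "" means no owner
    final Map<String, String> owners = new HashMap<>();

    void onPeerDown(String downMember) {
        for (Map.Entry<String, List<String>> e : candidates.entrySet()) {
            List<String> list = e.getValue();
            // Remove the down member's candidate for every entity; if it was the
            // owner, pick the next remaining candidate or clear the owner.
            if (list.remove(downMember) && downMember.equals(owners.get(e.getKey()))) {
                owners.put(e.getKey(), list.isEmpty() ? "" : list.get(0));
            }
        }
    }
}
```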

I realized that one can register a DOMDataTreeChangeListener for a leaf
node. So I simplified EntityOwnerChangeListener to listen for the owner
leaf instead of the entity path. This avoids the extra notifications
when candidates are added/removed. I actually did this originally b/c I
thought there was a bug when listening at the entity level; that turned
out not to be the case, but I kept the change as an improvement.

I also added the shard's logId to the listener and support classes for
better debugging of unit tests.

Change-Id: I75d2567ce54b9129eee052ba521c8a71777289b6
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Implement EntityOwnershipListener registration/notification

Change-Id: I49ee7f4b5f48ddde4779d37ba34c88dd776dd47b
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Pass ModuleShardConfiguration with CreateShard

The DistributedEntityOwnershipService first adds the
ModuleShardConfiguration to the Configuration prior to sending the
CreateShard message. However if the ModuleShardConfiguration gets
added before the ShardManager actor is created, the entity-ownership
shard is created via ShardManager.createLocalShards. This results in
the entity-ownership instantiated as Shard instead of
EntityOwnershipShard. I've seen this happen in unit tests - it's not
likely to occur in the production system b/c we wait until the
data store is ready prior to creating the
DistributedEntityOwnershipService. But we should prevent it so I
changed the DistributedEntityOwnershipService to pass the
ModuleShardConfiguration with the CreateShard message. The
ShardManager now adds the ModuleShardConfiguration to the
Configuration.

Change-Id: I9f64a27cdd8c24d31e7eb1389210b57ac7a1f604
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Add entity ownership integration test

Change-Id: I9578a37f86db44a90aa208d6d89374ba4d3cfb89
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Change ownership on member down/up

Added 2 new messages, PeerUp and PeerDown, that the ShardManager sends
in response to cluster member events.

For PeerDown, the EntityOwnershipShard finds the entities owned by the
down member and selects a new owner based on the remaining candidates.
If there are no other candidates, the owner is cleared (set to "") so new
candidates can become owner. The down members are also tracked via a
downPeerMemberNames set.

For PeerUp, if the up member is in the downPeerMemberNames, the
EntityOwnershipShard finds entities that previously had their owner
cleared and attempts to select a new owner. This handles the case where
a previously down member was the only candidate for an entity so, when
that member comes back up, the entity's owner will be re-assigned to
that member.

Reassigning of owners via PeerDown and PeerUp is only done on the
leader. However that may not handle the case where the leader goes down.
When a new leader is elected we need it to select new owners for
entities owned by the down leader. There are 2 cases here. If the old
leader has not yet been detected as down then eventually we expect to
get PeerDown to handle it. The second case is if PeerDown was already
received prior to the leader change (probably the norm), in which case
PeerDown would not have been processed. To handle this case I overrode
onLeaderChanged to select new owners for entities owned by the old leader
that is passed in. The RaftActor sends the old leader's peerId so I
added a peerIdToMemberNames map to translate - this is populated via
PeerUp. Also I changed the RaftActor to track and pass the actual last valid
leader id; previously it passed the leader id from the previous behavior,
which would normally be Candidate, which always has a null leaderId.

The newOwner method was changed to ignore candidates in the
downPeerMemberNames set as there's no point in assigning the owner to a
candidate known to be down.
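The newOwner filtering can be sketched as below, assuming first-come ordering among the remaining candidates and "" to mean a cleared owner; the class name is hypothetical.

```java
import java.util.Collection;
import java.util.Set;

final class NewOwnerSketch {
    /** First candidate not known to be down, or "" to clear the owner. */
    static String newOwner(Collection<String> candidates, Set<String> downPeerMemberNames) {
        for (String candidate : candidates) {
            if (!downPeerMemberNames.contains(candidate)) {
                return candidate;
            }
        }
        return "";
    }
}
```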

Change-Id: I8f0b78460a1a3e2a6418431f8a8a770a789e8f8d
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Add Cars RPC to test entity ownership

Change-Id: I8e23698b64ef408ae157ca0d2e94ed1f272128c7
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Add general-entities yang model

Added a general-entities yang model that can be used to represent an
entity ID when no existing yang schema exists.

Change-Id: Iec815966fe21ec15cb78ff47c68cda0aa7ae8504
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Add dynamic module/shard config for entity-owners shard

Added a new method addModuleShardConfiguration to Configuration.

I simplified the internals of ConfigurationImpl to make it easier to
add a new module/shard config. I combined several of the maps into one
moduleConfigMap and reduced the total # of fields to 3. For
synchronization, I kept the maps/sets immutable and used copy-on-write
semantics to update them as they will seldom change. I also made the
fields volatile.
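The copy-on-write scheme can be sketched like this. Readers use the current immutable snapshot without locking; the writer copies, modifies and republishes. Making the writer synchronized (to avoid lost updates between concurrent writers) is an assumption of this sketch, and the String value is a hypothetical stand-in for the module config.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class CopyOnWriteConfigSketch {
    // Immutable snapshot, published through a volatile field.
    private volatile Map<String, String> moduleConfigMap = Collections.emptyMap();

    String shardStrategyFor(String moduleName) {
        return moduleConfigMap.get(moduleName);
    }

    synchronized void addModuleShardConfiguration(String moduleName, String shardStrategy) {
        Map<String, String> copy = new HashMap<>(moduleConfigMap);
        copy.put(moduleName, shardStrategy);
        moduleConfigMap = Collections.unmodifiableMap(copy);
    }
}
```

Since the maps "will seldom change", paying for a full copy on each write keeps the frequent read path lock-free.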

I also removed the singleton nature of ShardStrategyFactory since each
datastore's Configuration will now be different, ie only the operational
datastore's Configuration will have the entity-owners module. The
datastore's ShardStrategyFactory instance is not instantiated and owned
by the ActorContext.

To make things easier for unit tests, I abstracted the file-reading code
in ConfigurationImpl to a new ModuleShardConfigProvider interface and
FileModuleShardConfigProvider implementation in the config package.
I also moved the inner classes to the config package.

While I was at it I also moved Configuration and ConfigurationImpl to the
config package for consistency.

Change-Id: I1d6858d3ae68869ca6f61d4f5a5f0d319d93c485
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Implement UnregisterCandidateLocal in EntityOwnershipShard

Also added a testOwnershipChanges case to EntityOwnershipShardTest to
run thru various ownership change scenarios with local and remote candidates
and local unregistration. As a result I found a couple bugs that I
fixed.

Change-Id: I4343754fbbc8f471975e6c723ffc0beaedee2860
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Integrate EntityOwnerChangeListener with EntityOwnershipShard

Change-Id: Ia302d503f9ff65aa48faf7d69f1405ebf5267166
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Choose Owner for an Entity based on first come first served basis

Change-Id: If40e19cf40e832c9317611bde2950502f7f4897c
Signed-off-by: Moiz Raja <moraja@cisco.com>
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Change commit retry mechanism in EntityOwnershipShard

Change-Id: Iba640eab1c21672ffe6357531c6d236e65c1cd73
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Add EntityOwnerDataChangeListener

Added EntityOwnerDataChangeListener that responds to changes to the
entity owner leaf and notifies the EntityOwnershipListenerSupport
appropriately.

I also added an EntityOwnersModel class that defines various entity-owners
yang model constants (moved from EntityOwnershipShard) and has utilities
for creating NormalizedNodes and paths.

Change-Id: Iaa567b5cba6cf0f5cfca0dce39f0f43c38fee4bc
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Add EntityOwnershipListenerActor and support

Change-Id: Idbeef3e23ab45a11afe5fce56a55fe5d6945729a
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Move Configuration classes to config package

Change-Id: I863600727f5171eb0db3591a541848aa877a68de
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Implement RegisterCandidate in EntityOwnershipShard

Change-Id: Idab615399d81a8451e22bfabd30aed9a98e4b037
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Implement candidate registration close

Added an UnregisterCandidateLocal message which is sent when
a DistributedEntityOwnershipCandidateRegistration is closed.

Change-Id: I6336e1b83a7764bfb4abc2fc37e196175c008dc3
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Implement DistributedEntityOwnershipService#registerCandidate

Added a RegisterCandidateLocal message and implemented registerCandidate
to send the message to the local EntityOwnershipShard.

Change-Id: If941401d00912ce34f74e54188af0430a5ec6fcc
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Add EntityOwnershipShard

Added the EntityOwnershipShard and modified
DistributedEntityOwnershipService to create it.

Change-Id: Id173b148797e90ff5d38d7f7cde177d303943181
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

entity-owners.yang module

This module describes the data structure used for storing entity
ownership information by the clustered implementation of
EntityOwnershipService

Change-Id: Ib7f8fad74e00b480236b1a2bddb060b093e90ad4
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug 4105: Added DistributedEntityOwnershipService and wiring

Added a skeleton DistributedEntityOwnershipService impl class and config
system wiring.

I initially tried to instantiate it in the operational store module but
it needs to be its own service and AFAIK you can only provide one
service per module as createInstance returns a single AutoCloseable.

So I created a new config yang and xml for distributed-entity-ownership-service.
We also need a separate config yang service identity to allow for
multiple impls - I put this in sal-common-api where the
EntityOwnershipService is defined.

Change-Id: I4883af2e749bca5c9dfdac69cf943017294435a3
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Add public EntityOwnershipService interface

Change-Id: If65f6ef8116b0a8481a3af1aee88e444771d9e3f
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug 4105: Add CreateShard message in ShardManager

Added a new CreateShard message that is processed by the
ShardManager. A new interface, ShardPropsCreator, was added
allowing the caller to instantiate a sub-class of Shard if need be
via the CreateShard message. The DefaultShardPropsCreator creates
Props for the Shard class.

Change-Id: Ieb2c895c85709d963445dc7e15ae9dec9cb3a810
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 4105: Add method to get all unique member names

Added a method to Configuration to get all unique member names
configured for all shards.

Change-Id: I09993541ad7e5963e9eef9cb58b4376daa8f09e8
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 2185: Expand the scope of sync status to cover a slow follower

Previously sync status was used only in the startup scenario
to make the controller appear to the external world as not
synced up unless it had received at least the data up to the commitIndex
which the leader reported when it sent the follower its first
heartbeat.

Now we also track when a new update is sent from the Leader to the
Follower and if the Follower is behind the Leader by a threshold
(hardcoded for now) then we consider the Follower as out-of-sync.
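The two checks can be sketched together: initial sync waits for the commit index reported with the first heartbeat, and afterwards the follower is in sync only while its lag stays within a threshold. The threshold value, class and method names are hypothetical.

```java
final class SyncStatusSketch {
    // Hypothetical threshold; the commit notes the real value is hardcoded.
    static final long SYNC_THRESHOLD = 10;

    private boolean initialSyncDone;
    private long firstHeartbeatCommitIndex = -1;

    /** Returns whether the follower is considered in sync after this update. */
    boolean update(long leaderCommitIndex, long followerCommitIndex) {
        if (firstHeartbeatCommitIndex < 0) {
            firstHeartbeatCommitIndex = leaderCommitIndex; // recorded from the first heartbeat
        }
        if (!initialSyncDone && followerCommitIndex >= firstHeartbeatCommitIndex) {
            initialSyncDone = true;
        }
        return initialSyncDone && leaderCommitIndex - followerCommitIndex <= SYNC_THRESHOLD;
    }
}
```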

Also I added the member name in the ShardManager bean so that is another
place from which we can figure out on which node we are running.

Change-Id: I1ba02575a0a1ac5d601af559f41971f2f5736f9d
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug-4234 - Add count field to cars stress-test RPC.

Added a 'count' field to the cars stress-test RPC. The test will stop
after 'count' cars have been created. If count is zero, then the
stress-test will continue until the stop-stress-test RPC is invoked.

Also added some null checks for missing input fields. If rate is
zero, then an error is returned.
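The input handling and stop condition might look roughly like this. Treating a missing count as zero and rejecting negative rates are assumptions of the sketch, and the class and method names are hypothetical.

```java
final class StressTestSketch {
    /** Returns an error message for an unusable rate, or null if it is acceptable. */
    static String validateRate(Long rate) {
        if (rate == null || rate <= 0) {
            return "rate must be a positive number";
        }
        return null;
    }

    /** Assumption: a missing count is treated like zero (run until stopped). */
    static long effectiveCount(Long count) {
        return count == null ? 0 : count;
    }

    /** Whether to keep creating cars; a count of zero means no limit. */
    static boolean shouldContinue(long created, long count, boolean stopRequested) {
        return !stopRequested && (count == 0 || created < count);
    }
}
```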

Change-Id: Id313f9094e8ca887993e4e8911d0a86b64db7303
Signed-off-by: Shaleen Saxena <ssaxena@brocade.com>

BUG 2185 : Follower should request forceInstallSnapshot in out-of-sync scenario

When the Follower detects that it has more entries in its log than the Leader,
it might be an indication that the Follower was previously a Leader and therefore
has additional entries in its log which are missing in the Leader. When the
RaftPolicy is set to allow commits before consensus, this could also mean that the
state now has more data than should be present in it. In this scenario the Follower
requests the Leader to InstallSnapshot.

Change-Id: I517af148c3933f798ceb87ff88c77c396590881f
Signed-off-by: Moiz Raja <moraja@cisco.com>

BUG 2185 : Disable all internal switching of behavior

Since we do not depend on Raft for changing behavior when elections
are disabled we need to disable all internal switching of behaviors.

Added specific Leader tests to check the following,
1. Do not switch to Follower when you receive an AppendEntriesReply
from a Follower with a higher term
2. Do not switch to IsolatedLeader even when no Follower is sending
AppendEntriesReply

Change-Id: Ic2b4f76813f35db190e108306a62af5397d31658
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug-4214 - Add support for configurable snapshot chunk size.

Added a new variable in the distributed-datastore-provider.yang. This
will be used to configure the snapshot chunk size. Added various
setters/getters to the DatastoreContext. The support for this variable
was added to JMX as well, so that the value can be seen via JConsole.
Moreover, added tests in DatastoreContextTest.
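To show what a configurable chunk size controls, here is a minimal sketch of splitting serialized snapshot bytes into chunks of at most chunkSize bytes. SnapshotChunker is hypothetical and not the sal-akka-raft code that sends InstallSnapshot chunks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SnapshotChunker {
    /** Splits the snapshot into consecutive chunks; the last one may be shorter. */
    static List<byte[]> chunks(byte[] snapshot, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> result = new ArrayList<>();
        int offset = 0;
        while (offset < snapshot.length) {
            // Compute the end without overflowing int for very large chunk sizes.
            int end = offset + Math.min(chunkSize, snapshot.length - offset);
            result.add(Arrays.copyOfRange(snapshot, offset, end));
            offset = end;
        }
        return result;
    }
}
```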

Also fixed a recurring typo in sal-akka-raft. Snapshot was spelled as
snaphot (missing s in shot).

This code was unit tested with different entries in datastore.cfg. Also
tested the case where no special value was provided in datastore.cfg,
and the default value was shown in jconsole.

Change-Id: Ie754075cc25f9eadf01cc65aee726735144c1794
Signed-off-by: Shaleen Saxena <ssaxena@brocade.com>

BUG 2185 : Add JMX API to change the state of a Shard

Added two APIs to the ShardManager MBeans
- switchAllLocalShardsState
- switchShardState

Change-Id: I896e421f322f487b4f8eb321708e01cc93bbd48f
Signed-off-by: Moiz Raja <moraja@cisco.com>

Enabling Data Change Notifications for all nodes in cluster.

Two new interfaces are introduced, ClusteredDataChangeListener and ClusteredDOMDataChangeListener, and external applications will have to implement one of these interfaces
if they want to listen to remote data change notifications.

The datastore registers listeners that are instances of these interfaces even on followers.

Change-Id: I0e29cdf2a08a2051de5fc8ce73b9ec8ac408e45b
Signed-off-by: Harman Singh <harmasin@cisco.com>
(cherry picked from commit 66a6b6f931af3fcd1ce61263c457304cfbdc2bb5)

BUG 4212 : Follower should not reschedule election timeout in certain cases.

Before:
Follower rescheduled election whenever it received any message

Now:
Follower reschedules the election only if
    - The message received is a RaftRPC message
    - If the RaftRPC message is a RequestVote then only reschedule
      if vote is granted
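The new rule can be expressed as a small decision function. The message classes are hypothetical stand-ins for the Raft messages, and the vote-granted flag is passed in rather than computed.

```java
final class ElectionTimeoutSketch {
    // Hypothetical message types standing in for the Raft RPC messages.
    interface RaftRpc {}
    static final class AppendEntries implements RaftRpc {}
    static final class RequestVote implements RaftRpc {}

    /** Whether the follower should reschedule its election timeout for this message. */
    static boolean shouldRescheduleElection(Object message, boolean voteGranted) {
        if (!(message instanceof RaftRpc)) {
            return false; // non-Raft messages no longer reset the timer
        }
        if (message instanceof RequestVote) {
            return voteGranted; // only when this follower granted the vote
        }
        return true;
    }
}
```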

Change-Id: Ia59c65e4896d72dfc49e86e59b6a9e9331a945ca
Signed-off-by: Moiz Raja <moraja@cisco.com>

BUG 4213 : Candidate should switch to Follower when it receives AppendEntries from new Leader

Multiple peers might become candidates in a single election term. If one peer happens
to become a Leader it will send AppendEntries to all its peers. When a Candidate receives
an AppendEntries and finds its term to be the same as the AppendEntries term then it should
switch to Follower.
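A sketch of the candidate's decision. Stepping down for a higher term is the general Raft rule; the equal-term case is the one this commit adds. The enum and method are hypothetical.

```java
final class CandidateSketch {
    enum Behavior { CANDIDATE, FOLLOWER }

    /** Behavior a candidate adopts after receiving AppendEntries with the given term. */
    static Behavior onAppendEntries(long currentTerm, long appendEntriesTerm) {
        // A leader exists for this term (or a later one), so step down;
        // an AppendEntries from an older term is ignored.
        return appendEntriesTerm >= currentTerm ? Behavior.FOLLOWER : Behavior.CANDIDATE;
    }
}
```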

Change-Id: Ia4ce41d4f3eefed50297b90107ad7429bb950ad8
Signed-off-by: Moiz Raja <moraja@cisco.com>

Bug 3708 - APIdoc explorer URLs contain extra 'node' in REST calls

* fixed mount point path builder that added each list qname twice

Change-Id: Ie54919666909dee3fc297b2155c2afea10a4477f
Signed-off-by: Jan Hajnar <jhajnar@cisco.com>

Re-enable tests and bump aaa version

This patch is part 2 of 2 patches:

* Increments the version of the dependency on AAA from 0.2.1-Lithium-SR1
to 0.2.2-SNAPSHOT.
* Re-enables the feature tests for the restconf and netconf-connector
features.

Change-Id: Ifd57e7d9c2864ac5178946fb7e08d0bc48eb0fae
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Bumping versions by 0.0.1 after the Lithium SR1 release

This patch is part 1 of 2 patches.

* The only version not incremented is aaa.version, which is left at
0.2.1-Lithium-SR1 since they depend on controller and can't update yet.
* To break the cyclic dependency, this patch temporarily stops running
the netconf-connector and restconf feature tests.

A second patch (to be run after AAA increments their versions to
0.2.2-SNAPSHOT) will update aaa.version and re-enable these tests.

Change-Id: I023b7c4242e225fcc31a891ab671af2aa5374ef8
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Applying the Lithium SR1 release patch

Change-Id: I4153146b3e61077efd84cb0e35616233fb5294ac
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Revert "Bug 3708 - APIdoc explorer URLs contain extra 'node' in REST calls"

This reverts commit 523e75af81fa6537117ceae53c7cdb2b1881aa10.

Change-Id: I01ac3d2176f9fcb08151bd35dcd2eed8c961992b
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Bug 3708 - APIdoc explorer URLs contain extra 'node' in REST calls

* fixed mount point path builder that added each list qname twice

Change-Id: I96d541ea8b40ab5003f82a9e5981e11e1f0fd0d2
Signed-off-by: Jan Hajnar <jhajnar@cisco.com>

Distribution-karaf fails with error "factory already defined"

Revert "Fix versions to stable/lithium"

This reverts commit d720d5e4c8c9baa7bee2a6bbca467a901fbc0f7d.

Revert "Backport mvn archetypes to stable/lithium."

This reverts commit 1254e0f95ed295bfa7fb1189ee52749d927d0968.

Bug: 4141
Change-Id: I7b4323a9b2e8e669a2b0dd69a4f2acd77362849c
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Bug 3822: Improve error reporting for restconf PUT

A runtime exception can be emitted by the netconf mount point which
should be reported to the user, otherwise you get a 500 response with
no error info which isn't very helpful.

Also the functionality to output the error-info field was omitted with
the conversion from CompositeNode to NormalizedNode so I re-implemented
it. It was originally omitted with a TODO b/c the
NormalizedNodeStreamWriters validate against the schema and error-info
is defined as an empty container in the restconf yang. So there's no way
to create a ContainerNode to represent the error-info data that conforms
to the schema. To work around this, I created a leaf node and special-cased
error-info in the stream writer to elide schema validation.

I also added a regression unit test for the case where the URL contains
an identityref.

Change-Id: I4bb0d767bb8008023e7ef10a439025e2e591f9cd
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Fix versions to stable/lithium

Fixes version issue caused by:

https://git.opendaylight.org/gerrit/24705/

Change-Id: I973dcded34f0490cbd4210022681ec38414d6016
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

CDS: Include CAN_COMMIT phase in rate limiter time period

I was testing with simulated latency in the followers. With high enough
latency and tx throughput, the pending commit queue in the
ShardCommitCoordinator fell increasingly behind until latencies built up
enough to cause AskTimeoutExceptions on the front-end.

The rate limiter was throttling, but not enough. I realized that the rate
limiter times the commit phase but not the canCommit phase. The latter is
what times out, with pending tx's sitting in the queue waiting for canCommit.
So I changed ThreePhaseCommitCohortProxy to also time the canCommit
phase. This alleviated the timeouts, even with a really high max latency of
500 ms and a client throughput of 100 tx / sec. The rate limiter reduced
the throughput to about 3 / sec.
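
A minimal sketch of the timing change, assuming a hypothetical CommitTimer
helper with an injectable nanosecond clock (this is not the real
ThreePhaseCommitCohortProxy code): the clock starts when canCommit begins,
so time spent queued waiting for canCommit is counted in the latency that
the rate limiter sees.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical helper: measures one tx from the start of canCommit through
// the end of commit, so canCommit queueing delay is included.
final class CommitTimer {
    private final LongSupplier nanoClock;
    private long startNanos = -1;

    CommitTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Called when the canCommit phase begins (previously timing began at commit).
    void onCanCommitStart() {
        startNanos = nanoClock.getAsLong();
    }

    // Called when commit completes; returns elapsed millis across both phases.
    long onCommitComplete() {
        if (startNanos < 0) {
            throw new IllegalStateException("canCommit was never started");
        }
        return TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - startNanos);
    }
}
```

With this placement, a tx that waits 400 ms for canCommit and spends 100 ms
in commit is reported as 500 ms rather than 100 ms.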

Change-Id: I6dc73d1d657519b9410ad034c69d26f19a0cb263
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 2185 : Introduce the SwitchBehavior message

RaftActor processes the SwitchBehavior message to change its behavior.
Switching to the Candidate or IsolatedLeader behavior is not allowed.

Change-Id: Id8d758c6574a5c58927927b83bc5985081b19c50
Signed-off-by: Moiz Raja <moraja@cisco.com>

CDS: Add stress test RPC to the cars model

For stress testing the CDS, I've been using an RPC that continuously
creates cars at a specified per second rate. I thought it might be
useful to submit it.

Change-Id: I33b9c2e304884b9541774a12ee248082de60f72e
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Backport mvn archetypes to stable/lithium.

Based on commits to master from
Ed Warnicke <hagbard@gmail.com>

v2: imported 24728 - whitespace fixes.

Change-Id: I4da6a343fab5d3072e7dec224da01a9a830e214a
Signed-off-by: Jan-Simon Moeller <dl9pf@gmx.de>

Fix akka logging initialization timeout on startup

We've seen CDS intermittently failing to start up b/c akka fails to
initialize its logging system: it times out after 5 sec. It seems that
on startup, threads may be busy enough that akka's message to initialize
the logging actor times out. Interestingly, we're only seeing this on
Red Hat. The timeout is configured via logger-startup-timeout, which
defaults to 5 sec. Increasing it fixes the issue. I made it really high,
5 min, since if it times out we're dead in the water anyway.

Change-Id: Ic95d7298b9320f03f664e9f2b171f980546ca95d
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG 2185 : Make the Custom Raft Policy externally configurable

A class which implements RaftPolicy needs to be set
to customize the behavior. I am hoping that this will be a barrier
to people unintentionally breaking the default raft policy as
they would need to know the internals of clustering in order to
supply a valid class.

I added two configurable raft policy classes for the two known
use cases,

- org.opendaylight.controller.cluster.datastore.policy.TestOnlyRaftPolicy
- org.opendaylight.controller.cluster.datastore.policy.TwoNodeClusterRaftPolicy.
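
A rough sketch of how a configured policy class name might be turned into
an instance, assuming a stand-in RaftPolicy interface with a single
automaticElectionsEnabled() method and a hypothetical RaftPolicyLoader.
Falling back to DefaultRaftPolicy when the class is missing or does not
implement the interface is an assumption of this sketch, not necessarily
what the patch does.

```java
// Minimal stand-ins for the real types, for illustration only.
interface RaftPolicy {
    boolean automaticElectionsEnabled();
}

final class DefaultRaftPolicy implements RaftPolicy {
    @Override
    public boolean automaticElectionsEnabled() {
        return true;
    }
}

final class RaftPolicyLoader {
    // Instantiates the configured class by its fully-qualified name via its
    // no-arg constructor; falls back to the default policy (assumption) when
    // nothing is configured or the class cannot be loaded as a RaftPolicy.
    static RaftPolicy load(String className) {
        if (className == null || className.isEmpty()) {
            return new DefaultRaftPolicy();
        }
        try {
            Class<?> clazz = Class.forName(className);
            return clazz.asSubclass(RaftPolicy.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return new DefaultRaftPolicy();
        }
    }
}
```

Requiring a class name, rather than a simple flag, matches the intent above:
only someone who knows the clustering internals can supply a valid policy.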

Change-Id: Ic3cc2f27754c37e85c3be8a863764fc88ec84399
Signed-off-by: Moiz Raja <moraja@cisco.com>

BUG 2185 : Introduce RaftPolicy & DefaultRaftPolicy

These allow for simple customization of the Raft algorithm. The intention is
to support the 2-Node clustering scenario, in which leaders for a Shard will need
to be specified externally and where weak(er) consistency may be considered acceptable.

Change-Id: I16bc69a67ac3096082324f11e62565a7b9d7cc57
Signed-off-by: Moiz Raja <moraja@cisco.com>

Fix AppendEntry logic when prevLogIndex and prevLogTerm are -1

When an AppendEntry arrives at a Follower with prevLogIndex and
prevLogTerm = -1, the Follower will accept that append entry and add it
to the log. For a newly started Follower this can be problematic
because this will be the first entry in that Follower's log, and so
applying this entry to the Follower's state can end up corrupting the
state or causing failures in committing transactions.

To fix this, we now verify whether the replicatedToAllIndex is present in
the Follower's log. If it is present, the log is considered in sync;
otherwise it is not.
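
The follower-side check can be sketched as below, modelling the log as a
hypothetical index-to-term map (FollowerLog is not a real class). Treating
a replicatedToAllIndex of -1 as "nothing replicated yet, so in sync" is an
assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified follower log: maps each log index to its term.
final class FollowerLog {
    private final Map<Long, Long> termByIndex = new HashMap<>();

    void append(long index, long term) {
        termByIndex.put(index, term);
    }

    boolean isInSync(long prevLogIndex, long prevLogTerm, long replicatedToAllIndex) {
        if (prevLogIndex == -1 && prevLogTerm == -1) {
            // Previously accepted unconditionally. Now the follower is only in
            // sync if it holds the leader's replicatedToAllIndex
            // (assumption: -1 means nothing has been replicated to all yet).
            return replicatedToAllIndex == -1 || termByIndex.containsKey(replicatedToAllIndex);
        }
        // Usual Raft consistency check: entry at prevLogIndex with prevLogTerm.
        Long term = termByIndex.get(prevLogIndex);
        return term != null && term == prevLogTerm;
    }
}
```

A freshly started follower with an empty log thus reports "out of sync"
when the leader has already trimmed past index 5, so it is not handed an
entry that assumes state it never received.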

Change-Id: I09bead430f1a4556182263de54846792668cd27c
Signed-off-by: Moiz Raja <moraja@cisco.com>

CLEANUP : Fix javadoc warnings in sal-akka-raft code

Change-Id: Id26cdcac3c4bc7f998483e4078cd1891d0783e8d
Signed-off-by: Moiz Raja <moraja@cisco.com>

Drop executable bit from odl.java.security

This is a plaintext file; it should not be executable.

Change-Id: I9e98802e3fb136952c91a56fd059c39e0e6d2f67
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit f8f77d5f5089eaaf139f66a3f1f8aa7a6ca07499)

Bug 3887 - Autogenerated API documentation doesn't show application/xml
as an option for RPC operations

* added xml input option for rpcs, put and post methods

Change-Id: I1f73bcb7d1127e4b4324d779aec40907ca627073
Signed-off-by: Jan Hajnar <jhajnar@cisco.com>

BUG-3878 Prevent null pointer in nc for unresolved addresses

An unresolved address should not cause the config pusher to fail. Use the host
string instead of the resolved address's string representation.
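
For reference, standard java.net behavior: for an address created with
InetSocketAddress.createUnresolved(), getAddress() returns null, so
getAddress().getHostAddress() throws a NullPointerException, while
getHostString() returns the host name without a DNS lookup. The
hypothetical AddressFormat helper below shows the safer form.

```java
import java.net.InetSocketAddress;

// Hypothetical helper: formats host:port without requiring the address
// to be resolved.
final class AddressFormat {
    static String describe(InetSocketAddress address) {
        // getHostString() never triggers a reverse lookup and never returns
        // null for a host-based address, unlike getAddress().
        return address.getHostString() + ":" + address.getPort();
    }
}
```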

Change-Id: Ieba345077860cd55325ca49980c3fb9edd66051c
Signed-off-by: Maros Marsalek <mmarsale@cisco.com>

Bug 3999: Create internal service to access restconf

There are use cases for invoking restconf from internal code. However,
issuing an HTTP request is problematic as one would need to know the
credentials and scheme (http or https).

So I added a JSONRestconfService interface and implementation with CRUD
methods that call the JSON readers/writers and RestconfImpl internally.
The implementation is advertised as an OSGi service via the config
system for consumption by clients.

I only added a service for JSON - an XML service could be added as well
later.

Change-Id: I5d1304c568c9be9c204afea68aadc0306bac50b3
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

CDS: Changes to Tx abort in Shard

I noticed when a tx times out on the front-end during CAN_COMMIT, it
tries to abort the tx but it may not get aborted in the Shard and the
front-end gets an AskTimeoutEx on the abort. The reason is that the
Shard only processes the abort request if the tx is the current tx being
committed. If it isn't, the request is ignored and no response is sent,
resulting in the front-end timeout.

I think it makes sense to also process the abort if the tx is sitting in
the queue awaiting CAN_COMMIT. If the front-end says to abort for any
reason, the Shard should honor it. Also, if it isn't aborted, the Shard
may dequeue it sometime later and attempt to commit it which can lead to
unpredictable results if prior commits failed.
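
A simplified model of the new abort handling, assuming a hypothetical
CommitQueue keyed by tx id (the real ShardCommitCoordinator tracks
CohortEntry objects): the abort is honored whether the tx is at the head
of the queue or still waiting for CAN_COMMIT, and the boolean result lets
the caller always send a reply instead of leaving the front-end to time out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model: the head is the tx currently being committed; the
// rest are queued awaiting CAN_COMMIT.
final class CommitQueue {
    private final Deque<String> queue = new ArrayDeque<>();

    void enqueue(String txId) {
        queue.addLast(txId);
    }

    String current() {
        return queue.peekFirst();
    }

    // Removes the tx wherever it sits so it can never be dequeued and
    // committed later; returns false if the tx is unknown.
    boolean abort(String txId) {
        return queue.remove(txId);
    }
}
```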

As per the comments in the Helium patch, I did some re-factoring to make
it a bit cleaner. I moved the abort code from the Shard to the
ShardCommitCoordinator. This makes it consistent with the other tx
phases where the Shard mostly delegates to the ShardCommitCoordinator. I
also removed the getCohort method from CohortEntry and added appropriate
methods so the internal cohort instance isn't exposed.

There's more refactoring/cleanup that can be done re: Futures and also
moving CohortEntry into its own class (it's large enough) but I don't
want to overload this patch.

Change-Id: I73c79a5e4a2b39b7ee4d97a011de2d29b050dbc4
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

CDS: Retry remote front-end transactions on AskTimeoutException

With the front-end PrimaryShardInfo cache, if the cached primary/leader
shard is remote and unavailable, the RemoteTransactionContextSupport
will fail with an AskTimeoutException when it tries to send the
CreateTransaction message. Since it can take at least 1 election timeout
period to re-elect a new leader, I changed RemoteTransactionContextSupport
to also retry on AskTimeoutException (it already retries on
NoShardLeaderException). However instead of re-sending the
CreateTransaction message, as it did before, it now re-sends the
FindPrimary message to get a new primary shard actor.

I also modified how RemoteTransactionContextSupport retries. It will now
retry for a total period of 2 times the shard election timeout which
should be ample time for a re-election to occur. If no leader is found then
the txn will fail.
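
The retry window can be sketched as a deadline loop, assuming a
hypothetical FindPrimaryRetry helper, outcome enum and injectable clock.
Code running in an actor system would schedule each retry with a delay
rather than loop; this sketch leaves that out to show only the 2x election
timeout bound.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class FindPrimaryRetry {
    // Hypothetical outcome of one FindPrimary attempt.
    enum Outcome { FOUND, NO_LEADER, ASK_TIMEOUT }

    // Re-sends FindPrimary (modelled by 'attempt') until a primary is found
    // or 2x the election timeout has elapsed. A final ASK_TIMEOUT would map
    // to ShardLeaderNotRespondingException, NO_LEADER to NoShardLeaderException.
    static Outcome findWithRetry(Supplier<Outcome> attempt, LongSupplier clockMillis,
                                 long electionTimeoutMillis) {
        long deadline = clockMillis.getAsLong() + 2 * electionTimeoutMillis;
        Outcome last = attempt.get();
        while (last != Outcome.FOUND && clockMillis.getAsLong() < deadline) {
            last = attempt.get();
        }
        return last;
    }
}
```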

I also added a ShardLeaderNotRespondingException which the
RemoteTransactionContextSupport will throw if it ends up with an
AskTimeoutException after the tx creation timeout period. This shouldn't
occur normally as, with the retries, it should get a NoShardLeaderException
even if the initial error was AskTimeoutException. But it's possible to
end up with an AskTimeoutException, e.g. if the system is overloaded and
the election timeout is delayed.

During testing, I noticed that if you take down the 2 followers and try
a transaction, it fails with an AskTimeoutEx instead of
NoShardLeaderException as one would expect. This is b/c the leader
changes to an isolated leader. So I changed the ShardManager to return
NoShardLeaderException if the state is IsolatedLeader.

Change-Id: I3efd3f841cf41b7738aedb694fa18b44851b3074
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

CDS: Real snapshot log trimming changes

Bug 2692 changed real snapshotting to trim the in-memory log based on
the current replicatedToAllIndex for the normal case when all followers
are up and "fake" snapshotting is advancing replicatedToAllIndex.

When a follower is down, real snapshots don't trim the log b/c
replicatedToAllIndex isn't advancing, unless the memory threshold is
exceeded. So the in-memory log keeps growing past the snapshot batch
count. This sort of defeats the purpose of snapshotting, i.e. to keep
the journal size in check both on disk and in memory. It's also a bit
dangerous: it can chew up a lot of memory, starve the rest of the system
and cause large stop-the-world GCs once the follower comes back up and
the log is cleared. This can also cause multiple snapshots in the
follower once it comes back and catches up, e.g. if it's behind 60K
entries, it will snapshot after each 20K batch in quick succession.

To alleviate the potential excessive memory growth, in addition to
trimming the log from the captured lastAppliedIndex if the log memory
size threshold is exceeded, I changed the code to do the same if the log
size exceeds the snapshot batch count. So if a follower is down long
enough to exceed the snapshot batch count, the leader will install a
single snapshot to catch up the follower. Otherwise, the follower will
be caught up via AppendEntries.
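
The trimming decision described above, reduced to a hypothetical pure
function; the class and parameter names are illustrative and do not come
from the patch.

```java
// Hypothetical sketch of the leader's log-trimming decision when it
// captures a real snapshot.
final class SnapshotTrim {
    // Returns the index up to which the in-memory log may be trimmed.
    static long trimIndex(long replicatedToAllIndex, long lastAppliedIndex,
                          long logSize, long snapshotBatchCount,
                          long logDataSizeBytes, long memoryThresholdBytes) {
        boolean memoryExceeded = logDataSizeBytes > memoryThresholdBytes;
        boolean batchCountExceeded = logSize > snapshotBatchCount; // new condition
        if (memoryExceeded || batchCountExceeded) {
            // Trim aggressively; a lagging follower is later caught up with
            // a single InstallSnapshot.
            return lastAppliedIndex;
        }
        // Otherwise only trim what every follower already has, so a lagging
        // follower can still be caught up via AppendEntries.
        return replicatedToAllIndex;
    }
}
```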

I also noticed that if snapshot attempts happen in quick succession, a
second attempt may be made prematurely while the previous one is in
the PERSISTING state. This is b/c isCapturing() returns false in the
PERSISTING state. The state machine prevents another capture from
actually initiating b/c only the IDLE state implements capture but I
think to be clean we should disallow it by returning true from
isCapturing() in the PERSISTING state. This avoids state violations
during valid workflows (it's not invalid to attempt a capture while
another is in progress). Therefore, since every state now returns true
for isCapturing() except IDLE, I made true the default in
AbstractSnapshotState so only IDLE overrides it.
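
The isCapturing() default can be sketched as follows. Only the IDLE and
PERSISTING states are modelled; IdleState and PersistingState are
hypothetical names, and the real state classes carry much more behavior.

```java
// Hypothetical reduction of the snapshot state hierarchy.
abstract class AbstractSnapshotState {
    // Every state except IDLE counts as capturing, so a new capture attempt
    // is refused while a previous snapshot is still being persisted.
    boolean isCapturing() {
        return true;
    }
}

final class IdleState extends AbstractSnapshotState {
    @Override
    boolean isCapturing() {
        return false; // the only state in which a new capture may start
    }
}

final class PersistingState extends AbstractSnapshotState {
    // Inherits isCapturing() == true from the base class.
}
```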

I added more end-to-end test cases to
ReplicationAndSnapshotsWithLaggingFollowerIntegrationTest to cover
leader snapshots where the log size exceeds the snapshotBatchCount,
where the memory threshold is exceeded, and where neither of the first
2 conditions is met and the log is effectively not trimmed. The first
2 cases trim the log to last applied and result in a snapshot installed on
the follower once it's resumed. The third case results in the leader
catching up the follower via AppendEntries.

Change-Id: Iaec9ba94232a17d6fa7b192c31c431b328e3d22e
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

CDS: Add pending tx queue size to ShardStats

To aid debugging, I added the pending commit queue size of the
ShardCommitCoordinator to the ShardStats bean.

Change-Id: I2af3493eb5dd54f9f9406b0a005d66be004c12ff
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Bug 3195: Cleanup on error paths and error handling

Modified ShardCommitCoordinator#handleBatchedModifications to remove the
cohortEntry from the cache if the total messages sent doesn't match the
total received.

With recent yangtools changes, write and merge on a DOM tx can throw
unchecked exceptions if the data is invalid. Modified the front-end
local tx path to catch unchecked exceptions in LocalTransactionContext
and propagate to the LocalThreePhaseCommit to immediately fail the
ready.

Similarly modified ShardCommitCoordinator#handleBatchedModifications to
handle unchecked exceptions from applied operations and propagate when
the tx is readied.
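
A minimal sketch of the "record now, fail on ready" pattern, using a
hypothetical ModificationBatch class rather than the actual
ShardCommitCoordinator or LocalTransactionContext types.

```java
import java.util.function.Consumer;

// Hypothetical sketch: remember the first unchecked exception thrown while
// applying modifications and surface it when the tx is readied.
final class ModificationBatch {
    private RuntimeException firstFailure;

    void apply(String modification, Consumer<String> operation) {
        if (firstFailure != null) {
            return; // the tx is already doomed; skip further work
        }
        try {
            operation.accept(modification); // e.g. a write or merge on the DOM tx
        } catch (RuntimeException e) {
            firstFailure = e;
        }
    }

    // Called when the tx is readied: fail immediately if any modification failed.
    void ready() {
        if (firstFailure != null) {
            throw new IllegalStateException("Transaction cannot be readied", firstFailure);
        }
    }
}
```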

Added unit tests to cover these cases.

Also, modified LocalTransactionContext#readyTransaction to handle
unchecked exceptions from the DOM tx ready.

Change-Id: Ib6fe6e04b8626bf996cfabfe74da780f05ce838a
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

CDS: Change operationTimeout units to millis

For upcoming changes, I'll need to be able to set the operationTimeout
in millis for unit tests.

Change-Id: I7463a9b2e20db2b678e23a94cafd768db3ac099e
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

BUG-3861 Detect RPC errors when committing netconf transaction

A transaction from the netconf southbound would return a successful future even
when an rpc-error was returned in response to the commit RPC.

Change-Id: I7ec538f09c5d1b9ac5e18d69ccc6d2c3939d01c3
Signed-off-by: Maros Marsalek <mmarsale@cisco.com>

Make leafrefFromLeafListToLeafTest impervious to list order

In NnToJsonLeafrefType#leafrefFromLeafListToLeafTest, the leaf list is
unordered but the regex expects a certain order based on the way the
list is currently hashed. But this makes it susceptible to failure if
something changes upstream to alter the order. So I changed the regex to
not depend on the order.
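
One way to make such an assertion order-independent, sketched with a
hypothetical UnorderedJsonCheck helper: extract the array values with a
regex and compare them as a set. This illustrates the idea only; it is
not the regex used in the patch.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: returns the values of a flat JSON array as a set,
// so the comparison holds whatever order the values were serialized in.
final class UnorderedJsonCheck {
    static Set<String> arrayValues(String json, String key) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(key) + "\"\\s*:\\s*\\[([^\\]]*)\\]");
        Matcher m = p.matcher(json);
        Set<String> values = new HashSet<>();
        if (m.find()) {
            for (String item : m.group(1).split(",")) {
                String v = item.trim();
                if (!v.isEmpty()) {
                    values.add(v.replaceAll("^\"|\"$", "")); // strip surrounding quotes
                }
            }
        }
        return values;
    }
}
```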

Note: an upcoming yang tools patch will alter the order as a side effect
that affects this test.

Change-Id: Ibf95271b008134c3f75fc99c9ee7ba0d643dfe7a
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>

Use AsyncAppender in pax-logging

Using an async appender increases the throughput of the logger, lowering its impact on
overall system performance. Testing with BGP debugging enabled, the
overall time to complete for 100K routes went down from 137 seconds to
81 seconds.

Change-Id: Ieac46ecbddb701f862f7d58833ebdc94de0fbaf4
Signed-off-by: Robert Varga <rovarga@cisco.com>

Fix features-netconf-connector not being included

mdsal-artifacts should be listing the features it needs, so users can
pick up their versions. This was not true for
features-netconf-connector.

Change-Id: I4d53d8cf3d6ff1bce57e9de223126fea5269419d
Signed-off-by: Robert Varga <rovarga@cisco.com>

Bug 3800 - Fix usage of global SimpleDateFormat

* fix usage of SimpleDateFormat, which is not thread-safe
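
SimpleDateFormat keeps mutable internal state, so a single shared instance
can return corrupted results under concurrent use. A common remedy,
sketched here with a hypothetical EventTimeFormat class and pattern, is one
instance per thread via ThreadLocal; java.time.DateTimeFormatter, which is
immutable, is another option. The patch itself may use a different approach.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical formatter holder: each thread lazily gets its own
// SimpleDateFormat, so no instance is ever shared between threads.
final class EventTimeFormat {
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    });

    static String format(Date date) {
        return FORMAT.get().format(date);
    }
}
```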

Change-Id: I445739c22ecc8da9e5b9c51687fa6077f914de30
Signed-off-by: Vaclav Demcak <vdemcak@cisco.com>

Re-enable tests and bump aaa version

This patch is part 2 of 2 patches:

* Increments the version of the dependency on AAA from 0.1.3-Helium-SR2
  to 0.1.4-SNAPSHOT.
* Re-enables the feature tests for the restconf and netconf-connector
  features.

Change-Id: I3a85ac10637d818126ff2543e3e39b22f36ecfea
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Bumping versions by 0.0.1 for next dev cycle

This patch is part 1 of 2 patches.

* The only version not incremented is aaa.version, which is left at
  0.2.0-Lithium since AAA depends on controller and can't update yet.
  * To break the cyclic dependency, this patch temporarily stops running
    the netconf-connector and restconf feature tests.

    A second patch (to be run after AAA increments their versions to
    0.2.1-SNAPSHOT) will update aaa.version and re-enable these tests.

Change-Id: I7ba0b8c6ced378b7bf6e490884b50ea9e26544b4
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>

Release Lithium

Change-Id: I616543755330301c367cc59005b6d562aa31f953
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>