controller.git
7 years agoBUG-5222: offload XSQLBluePrint creation to first access 92/54192/3 stable/beryllium
Robert Varga [Fri, 13 Jan 2017 12:57:34 +0000 (13:57 +0100)]
BUG-5222: offload XSQLBluePrint creation to first access

Constructing XSQLBluePrint in onGlobalContextUpdated() slows
down startup and is utterly inefficient (like all of XSQL).

As a stop-gap measure move its instantiation to first use,
when it is constructed from saved SchemaContext reference.
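
The deferral can be sketched roughly as below. This is only an
illustration: LazyFromContext and its methods are hypothetical names,
not the XSQL classes, and the factory stands in for the expensive
blueprint construction.

```java
import java.util.function.Function;

/** Hypothetical holder: builds a value from a saved context on first access only. */
final class LazyFromContext<C, T> {
    private final Function<C, T> factory;
    private C savedContext; // latest context seen, kept until first use
    private T value;        // built lazily
    private int builds;     // how many times the factory actually ran

    LazyFromContext(Function<C, T> factory) {
        this.factory = factory;
    }

    /** Cheap: only remembers the context and drops any stale value. */
    synchronized void onContextUpdated(C context) {
        savedContext = context;
        value = null;
    }

    /** The expensive construction happens here, on first access after an update. */
    synchronized T get() {
        if (value == null && savedContext != null) {
            value = factory.apply(savedContext);
            builds++;
        }
        return value;
    }

    synchronized int buildCount() {
        return builds;
    }
}
```

With this shape the context-update callback stays cheap, and the cost
is paid only by callers that actually use the value.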

Also remove unneeded elements field, as it is not used anywhere
and just gets in the way.

Change-Id: I954d2217da6ec8b12d0b980d864cf3d776df78cc
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit d0dc66335889ecec5dbc962a8604c3df96eca758)

7 years agoBUG-5222: remove xsql from archetype 60/54160/3
Robert Varga [Fri, 31 Mar 2017 09:05:27 +0000 (11:05 +0200)]
BUG-5222: remove xsql from archetype

XSQL should not be here, kill it.

Change-Id: I68bafa8961598f3407763661c1c3a294c6209774
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit cfc5170cf5d75c5c89deeecd726cbf7fa36f660e)

7 years agoBug 6341 Remove copy of dependencies from karaf-parent 49/54449/3
Ed Warnicke [Fri, 5 Aug 2016 22:00:46 +0000 (15:00 -0700)]
Bug 6341 Remove copy of dependencies from karaf-parent

It appears that a lot of bloat stems from the manner in which
integration/distribution/distribution-karaf was transitioned to using karaf-parent.

Therein lies a tale.  Way back in Helium days... we used a 'dependency' hack for populating system/.
It basically just copied all the dependencies into system/ from the whole shebang.  This had two issues:

a)  Massive overinclusion, as you don't need most of the maven dependencies.
b)  Maven only ever copies *one* version of a thing via the maven dependency mechanisms... so *any* version skew borks you.

To fix this, in Li timeframe I wrote karaf-plugin.  Karaf-plugin walks the features.xml files
and makes sure that we have the needed bundles so all possible features can run.  This radically
reduces the overinclusion, as you are only pulling in the stuff you actually need.
It also solves the version skew problem... which is important because while we don't *want* version skew... it happens.

Because we were at the end of Li, and I did not wish to possibly break someone's
project local distribution, I *only* deployed karaf-plugin to integration/distribution, *not* to karaf-parent.

Since then, others have done good work on karaf-parent (and migrated it to odlparent).
Many thanks to them.  As part of that work, integration/distribution was migrated to use
karaf-parent (again, a good thing).

*But*... this suddenly caused integration/distribution to start doing massive overinclusion again (not good) :(
This appears to account for around 100mb of our size growth since Be.

The correct solution here IMHO would be:

a) To remove the dependency copying behavior of karaf-parent
b) Have the karaf-parent use karaf-plugin

Note: If you do #a without #b... you will likely break the project local distributions.

Sadly, doing #a and #b puts us back in the position I decided to avoid in Li:
a late-breaking change that might impact the project local distributions.  However,
we now have a couple of releases of experience with the karaf-plugin, and thus the risk is somewhat lower.

This patch does #a and #b.

PatchSet #5: Fix karaf-plugin to also install bundles listed in startup.properties.

Change-Id: Ie4d99e86cf364fd4fc6d7d99622991133fa3e006
Signed-off-by: Ed Warnicke <eaw@cisco.com>
Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
(cherry picked from commit 99f5917863a8bcf66e723fba94dad8e25618a212)

7 years agoBumping versions by 0.0.1 for next dev cycle 60/47260/1
Anil Belur [Fri, 21 Oct 2016 03:53:40 +0000 (13:53 +1000)]
Bumping versions by 0.0.1 for next dev cycle

Change-Id: I8934e3c24bac73d4b2307edb51f5563d2ce3d43d
Signed-off-by: Anil Belur <abelur@linuxfoundation.org>
7 years agoBug 6348 : car:stop-stress-test RPC to return success & failure counters 95/43095/13
Sai MarapaReddy [Tue, 19 Jul 2016 20:49:21 +0000 (13:49 -0700)]
Bug 6348 : car:stop-stress-test RPC to return success & failure counters

The current car:stop-stress-test RPC doesn't return how many
cars were created or failed. Adding success and failure counters
will help the user determine the number of cars created or failed
during the process of creating cars using
car:stress-test. This patch enhances the car:stop-stress-test RPC.

Change-Id: Iff054c8210ce49f06b4fa96ca5a437d9b82deddb
Signed-off-by: Sai MarapaReddy <sai.marapareddy@gmail.com>
Author: Sai MarapaReddy <sai.marapareddy@gmail.com>

7 years agoFix missing LeaderStateChanged event 01/42601/2
Tom Pantelis [Tue, 26 Jul 2016 15:55:09 +0000 (11:55 -0400)]
Fix missing LeaderStateChanged event

In RaftActor, the logic to detect a leader state change compares the last
valid leader Id with the current behavior leader Id. Consider the
following leader Id change sequence:

  "member-1" -> null (goes leaderless)
  null -> "member-1" (member-1 becomes leader again)

The first state change will send a LeaderStateChanged event to the
ShardManager with null leader Id causing the ShardManager to clean its
primary shard info cache. However for the second state change, no
LeaderStateChanged event is sent b/c the new leader Id is the same as
the last valid/non-null leader Id. Therefore transactions fail due to no
shard leader.

I changed it to use the last leader Id (null or not) for the comparison
so every state change is detected.
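
A minimal sketch of the two comparisons, assuming a simplified
tracker (LeaderChangeTracker, oldCheck and newCheck are hypothetical
names, not the RaftActor code):

```java
import java.util.Objects;

/** Hypothetical tracker contrasting the comparisons described above. */
final class LeaderChangeTracker {
    private String lastValidLeaderId; // last non-null leader id
    private String lastLeaderId;      // last leader id, null or not

    /** Pre-fix comparison: misses "null -> same leader again". */
    boolean oldCheck(String currentLeaderId) {
        boolean changed = !Objects.equals(lastValidLeaderId, currentLeaderId);
        if (currentLeaderId != null) {
            lastValidLeaderId = currentLeaderId;
        }
        return changed;
    }

    /** Post-fix comparison: every transition, to or from null, is detected. */
    boolean newCheck(String currentLeaderId) {
        boolean changed = !Objects.equals(lastLeaderId, currentLeaderId);
        lastLeaderId = currentLeaderId;
        return changed;
    }
}
```

Feeding the sequence "member-1", null, "member-1" through oldCheck
reports no change on the third step, while newCheck reports a change
on every step.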

Change-Id: I7cfab7d8e391fb5a82caf95ff9cb6e1a42b216d0
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoBumping versions by 0.0.1 for next dev cycle 03/43203/1
Thanh Ha [Thu, 4 Aug 2016 22:22:13 +0000 (18:22 -0400)]
Bumping versions by 0.0.1 for next dev cycle

Change-Id: Idc1015f7be5f6f5210bd0e955b47339493817d68
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
7 years agoChange default value of parameter "auto-down-unreachable-after" 95/42095/2
Sai MarapaReddy [Tue, 19 Jul 2016 20:49:21 +0000 (13:49 -0700)]
Change default value of parameter "auto-down-unreachable-after"

Akka documentation suggests not using auto-down feature
in production scenario.
Link - http://doc.akka.io/docs/akka/snapshot/java/cluster-usage.html

Change-Id: I24205a34e13c711791186b1e00d5203f623a0478
Signed-off-by: Sai MarapaReddy <sai.marapareddy@gmail.com>
Author: Sai MarapaReddy <sai.marapareddy@gmail.com>

7 years agoReduce ConflictingVersionException log level to debug 46/41846/2
Sai MarapaReddy [Thu, 14 Jul 2016 16:50:11 +0000 (09:50 -0700)]
Reduce ConflictingVersionException log level to debug

In general, when there is a ConflictingVersionException the push is
retried, and if it times out while retrying, the error is logged.

The ConflictingVersionException is similar to the OptimisticLockFailedException
in the data store, i.e. the current config version is incremented and
recorded at the start of a push and if a second config is pushed before
the first completes, the version changes and it detects that and
re-pushes the first config. The CSS pushes one config at a time
but this can happen during dependency resolution if it finds a
dependent module that wasn't created yet or its config changed
and needs to be dynamically recreated. The dependent module is
pushed which results in a conflicting version. This
happens with BGP.

Signed-off-by: Sai MarapaReddy <sai.marapareddy@gmail.com>
Author: Sai MarapaReddy <sai.marapareddy@gmail.com>
Change-Id: Ic1d4639625fa54ccc3d54331a960f421ad6fa1dd

7 years agoChange count type in the cars model 38/40938/2
Ryan Goulding [Fri, 24 Jun 2016 15:57:00 +0000 (11:57 -0400)]
Change count type in the cars model

The count type is changed from uint16 to uint32.  For some performance/stress
tests, it is desirable to issue 1E7 transactions to provide an adequate sample
size.  Prior to this change, it was impossible to issue a million transactions
without either invoking the RPC several times or using count=0 and stopping
based on log messages.  This makes perf testing easier.

Change-Id: Icf125e45bd85e14df6ed5ad91ddad92a8dd2151b
Signed-off-by: Ryan Goulding <ryandgoulding@gmail.com>
(cherry picked from commit f36e0782a5bb5409d8dd95e2d08ffdbd65266663)

7 years agoForce install snapshot when follower log is ahead 23/41323/1
Tom Pantelis [Fri, 1 Jul 2016 04:25:17 +0000 (00:25 -0400)]
Force install snapshot when follower log is ahead

It's possible for a follower's log to actually be ahead of the leader's log.
Normally this doesn't happen in raft as a node cannot become leader if its
log is behind another's. However, the non-voting semantics deviate a bit
from raft. Only voting members participate in elections and can become
leader so it's possible for a non-voting follower to be ahead of the leader.
This can happen if persistence is disabled and all voting members are
restarted. In this case, the voting leader will start out with an empty log
however the non-voting followers still retain the previous data in memory.
On the first AppendEntries, the non-voting follower returns a successful
reply b/c the prevLogIndex sent by the leader is -1 and thus the integrity
checks pass. However the follower's returned lastLogIndex may be higher in
which case we want to reset the follower by installing a snapshot.
Therefore I added a check in AbstractLeader#handleAppendEntriesReply if
the reply lastLogIndex > leader's last index.
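
The index comparison amounts to something like the following sketch
(FollowerAheadCheck is a hypothetical helper, not the AbstractLeader
code, and the real handler does much more than this):

```java
/** Hypothetical helper isolating the "follower is ahead" comparison. */
final class FollowerAheadCheck {
    private FollowerAheadCheck() {
    }

    /**
     * True when the follower reports a last log index beyond anything the
     * leader has, in which case the follower is reset via a snapshot.
     */
    static boolean shouldInstallSnapshot(long replyLastLogIndex, long leaderLastIndex) {
        return replyLastLogIndex > leaderLastIndex;
    }
}
```

For a freshly restarted leader with an empty log (last index -1), any
follower that still holds entries in memory trips the check.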

Since the initial AppendEntries is sent immediately by the leader,
normally the follower will reply and this change works. However if a
follower happens to be disconnected and doesn't reply for some time, the
leader can still progress with new commits. If the leader has enough
commits such that its lastIndex matches or exceeds the lagging
non-voting follower, this check doesn't work. In this case, the
follower's integrity checks will fail since the leader's prevLogTerm
will differ. On reply the leader will start decrementing the follower's
nextIndex in an attempt to find where the logs match. During this
process the leader may trim its log via replicatedToAllIndex in which
case the follower's nextIndex may no longer be in the leader's log and
the leader will install a snapshot.

However if other nodes are down and prevent the log trimming then the
follower's nextIndex may be in the log until it eventually decrements to
0. The follower's integrity checks will pass in this case since the
leader's prevLogIndex will be -1. The follower will then attempt to add
the leader's log entries to its log. It first loops the log entries in
the AppendEntries with the intent of skipping matching entries in its
log (ie index and term the same) and stopping when it finds an entry
that doesn't exist or finds one whose term doesn't match, in which case
it removes the entries beginning at this index. However I found some
issues in this code. First it was calling get on the getReplicatedLog
which doesn't take into account that the index may be part of the prior
snapshot and not actually in the log. I changed this check to
isLogEntryPresent which takes into account the snapshot. Second, if it
hits a conflicting entry it tries to remove it from the log. However,
as before, it may be in the snapshot and not in the log in which case
nothing gets removed. To alleviate this, I modified removeFromAndPersist
to return a boolean - false meaning it didn't find the index. In this
case I changed it to send back a reply to force a snapshot.
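
The distinction between "in the log" and "covered by the snapshot" can
be sketched as below. SketchLog is a hypothetical, heavily simplified
stand-in for the replicated log; only the two method names echo the
ones mentioned above, and the persistence part is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical log whose entries up to snapshotIndex live only in a snapshot. */
final class SketchLog {
    private final long snapshotIndex;                   // last index covered by the snapshot, -1 if none
    private final List<Long> terms = new ArrayList<>(); // terms of the entries after the snapshot

    SketchLog(long snapshotIndex, long... termsAfterSnapshot) {
        this.snapshotIndex = snapshotIndex;
        for (long term : termsAfterSnapshot) {
            terms.add(term);
        }
    }

    long lastIndex() {
        return snapshotIndex + terms.size();
    }

    /** Present means: in the log proper or already covered by the snapshot. */
    boolean isLogEntryPresent(long index) {
        return index >= 0 && index <= lastIndex();
    }

    /** Removes entries from index onward; false means index is not in the log proper. */
    boolean removeFrom(long index) {
        if (index <= snapshotIndex || index > lastIndex()) {
            return false;
        }
        terms.subList((int) (index - snapshotIndex - 1), terms.size()).clear();
        return true;
    }
}
```

A false return from removeFrom is the signal on which, per the
description above, the follower replies so as to force a snapshot.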

I added several tests in a new class NonVotingFollowerIntegrationTest
that runs thru various scenarios to cover the cases described above.

While testing I ran into some orthogonal issues that I also fixed.

- if a leader has only non-voting followers, on replicate, it should
  immediately commit and apply to state as it does when there's no
  followers since it doesn't need consensus from non-voting followers.
  So I added a method anyVotingPeers to RaftActorContext to handle this
  case.

- When calculating the prevLogIndex and prevLogTerm for the
  AppendEntries message, it calls get on the getReplicatedLog
  which doesn't take into account that the index may be the snapshot
  index/term. The follower does this check for prevLogIndex/prevLogTerm so
  the leader should as well.
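
The first item above can be sketched as follows, assuming peers are
modelled as a map from peer id to a voting flag. PeerVoting is a
hypothetical helper; only the name anyVotingPeers comes from the text.

```java
import java.util.Map;

/** Hypothetical helper: peers maps peer id to true if that peer is voting. */
final class PeerVoting {
    private PeerVoting() {
    }

    static boolean anyVotingPeers(Map<String, Boolean> peers) {
        return peers.containsValue(Boolean.TRUE);
    }

    /** With no voting peers there is nobody to get consensus from. */
    static boolean canCommitImmediately(Map<String, Boolean> peers) {
        return !anyVotingPeers(peers);
    }
}
```

An empty peer map and a map with only non-voting peers are treated the
same way, which matches the "as it does when there's no followers"
wording.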

Change-Id: I3f92fc0b92ddc6d02dc6cb0e56b444a7c61035d7
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoClear leaderId when election timeout occurs in non-voting follower 88/41088/10
Sai MarapaReddy [Wed, 29 Jun 2016 23:31:00 +0000 (16:31 -0700)]
Clear leaderId when election timeout occurs in non-voting follower

We need to enable election timeouts on non-voting follower and clear the
leaderId when it occurs to mimic the behavior when it goes to Candidate
on election timeout.

Signed-off-by: Sai MarapaReddy <sai.marapareddy@gmail.com>
Author: Sai MarapaReddy <sai.marapareddy@gmail.com>
Change-Id: I8b3316e14315a47e09b48af2e3ea16a391ec6c5a

7 years agoAdd ServerConfigPayload to InstallSnapshot message 20/41020/4
Tom Pantelis [Wed, 29 Jun 2016 06:09:49 +0000 (02:09 -0400)]
Add ServerConfigPayload to InstallSnapshot message

When the leader installs a snapshot on a follower, it needs to include the
server config info as well. Otherwise if a server config change occurred
while a follower was down, it won't get the updated server config info
and will be out of sync with the rest of the cluster which causes other
issues.

Change-Id: Ic290ed162bf9fdf6b9fe55986ea0c9c9e83b29a9
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoBackport InstallSnapshot message serialization changes 19/41019/3
Tom Pantelis [Wed, 27 Jan 2016 07:33:32 +0000 (02:33 -0500)]
Backport InstallSnapshot message serialization changes

Backported https://git.opendaylight.org/gerrit/#/c/33768/ and
https://git.opendaylight.org/gerrit/#/c/33767/ from master to
eliminate the protobuf serialization to make it easier to change
the serialization. A subsequent patch will add a new field to
InstallSnapshot. Backwards compatibility with versions prior to
Be SR3 is maintained, ie protobuf serialization will still be used.

Change-Id: I465daba0b83e35bfe0e0d5c345a497dd7f9425d4
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoAdd option to enable/disable basic DCL and/or DTCL 67/40867/3
Ryan Goulding [Fri, 24 Jun 2016 15:50:00 +0000 (11:50 -0400)]
Add option to enable/disable basic DCL and/or DTCL

The cars stress test is a very appropriate place to measure the effects
of DCL and DTCL on a very long list.  This change adds a few RPC
implementations in order to do the following:

1) enable DCL
2) disable DCL
3) enable DTCL
4) disable DTCL

This change includes very basic DCL/DTCL implementations, which just log
a message at trace level (off by default, but there to ensure the
onData*Changed(...) method is actually called).

The existing clustering-test-app behavior doesn't change at all;  these
new RPC(s) do not need to be used, and the added Listener implementations
are not registered listeners by default.

Change-Id: I6fcec6cd8c0a082e815561e88b325a55022ad2af
Signed-off-by: Ryan Goulding <ryandgoulding@gmail.com>
7 years agoFix intermittent unit test failures 25/41225/1
Tom Pantelis [Fri, 18 Mar 2016 04:22:27 +0000 (00:22 -0400)]
Fix intermittent unit test failures

Cherry picked from master.

Change-Id: I2ef68b48de8da4cc7d82a91263976295458d011a
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoBug 6106: Prevent flood of quarantine messages 42/40842/3
Tom Pantelis [Sat, 25 Jun 2016 02:04:02 +0000 (22:04 -0400)]
Bug 6106: Prevent flood of quarantine messages

Added a "quarantined" flag to the QuarantinedMonitorActor so it only
prints the warning and attempts to restart the karaf container once
(which is invoked indirectly via the caller's Effect callback).
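
The one-shot behaviour can be sketched like this. QuarantineGuard is a
hypothetical class, not the actor itself; the Runnable stands in for
the caller-supplied callback.

```java
/** Hypothetical one-shot guard mirroring the "quarantined" flag described above. */
final class QuarantineGuard {
    private final Runnable onQuarantined; // e.g. log a warning and trigger the restart
    private boolean quarantined;

    QuarantineGuard(Runnable onQuarantined) {
        this.onQuarantined = onQuarantined;
    }

    /** Runs the callback for the first quarantine event only. */
    void onQuarantineEvent() {
        if (quarantined) {
            return;
        }
        quarantined = true;
        onQuarantined.run();
    }
}
```

An actor processes one message at a time, so a plain boolean field is
enough in that setting; this sketch makes the same assumption and is
not thread safe on its own.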

Change-Id: I0a57af729280abded93d1b1a575df1672e52032e
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoFix intermittent test failures in ClusterAdminRpcServiceTest 27/41027/1
Tom Pantelis [Wed, 29 Jun 2016 08:09:17 +0000 (04:09 -0400)]
Fix intermittent test failures in ClusterAdminRpcServiceTest

Failed tests:
  ClusterAdminRpcServiceTest.testChangeMemberVotingStatesForShard:555->verifySuccessfulRpcResult:296
Rpc failed with error: RpcError [message=Failed to change member voting
states for shard cars: Shard
member-3-shard-cars-config_testChangeMemberVotingStatusForShard
currently has no leader. Try again later., severity=ERROR,
errorType=RPC, tag=operation-failed, applicationTag=null, info=null,
cause=null]

The test needs to ensure node3's datastore shards are ready with leaders.

Change-Id: Iae6179e6f577b98f267c1afd3a901a14eed81e7f
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoFix intermittent test failures in PartitionedLeadersElectionScenarioTest 22/41022/1
Tom Pantelis [Wed, 29 Jun 2016 07:04:47 +0000 (03:04 -0400)]
Fix intermittent test failures in PartitionedLeadersElectionScenarioTest

Seeing intermittent failures on jenkins, eg

Failed tests:
  PartitionedLeadersElectionScenarioTest.runTest1:37->setupInitialMemberBehaviors:313->AbstractLeaderElectionScenarioTest.initializeLeaderBehavior:207
Missing messages of type class
org.opendaylight.controller.cluster.raft.messages.AppendEntriesReply

Sometimes the initial AppendEntries messages go to dead letters,
probably b/c the follower actors haven't been fully created/initialized by akka.
So added retries as a workaround.

Change-Id: I5c838950f8ed2af3d5bc8ee3bd29602d8a8e8a9f
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoAdd voting state to shard mbean FollowerInfo 78/40178/2
Tom Pantelis [Fri, 10 Jun 2016 02:25:05 +0000 (22:25 -0400)]
Add voting state to shard mbean FollowerInfo

The shard mbean displays the peer voting states map but it's also useful
to see the voting state in the leader's FollowerInfo.

Also fixed an NPE when JMX accesses the peerAddresses when a peer's
address is null. We use guava's Joiner.MapJoiner to output the map but it
throws an NPE for a null entry value. I changed RaftActor to put "" in
the map if null.
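
The null-to-empty-string substitution can be sketched as below. To
stay self-contained this uses only the JDK rather than guava's joiner,
and PeerAddressDump is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: replaces null addresses before the map is rendered. */
final class PeerAddressDump {
    private PeerAddressDump() {
    }

    /** Copies the map, storing "" wherever the address is null. */
    static Map<String, String> withoutNulls(Map<String, String> peerAddresses) {
        Map<String, String> safe = new LinkedHashMap<>();
        peerAddresses.forEach((id, address) -> safe.put(id, address == null ? "" : address));
        return safe;
    }

    /** Renders "id=address" pairs; safe for any joiner that rejects null values. */
    static String render(Map<String, String> peerAddresses) {
        return withoutNulls(peerAddresses).entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(", "));
    }
}
```

A peer whose address is not yet known then shows up as an empty value
instead of causing an exception in the JMX getter.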

Change-Id: I1eb963808fd7878dfe1e4935f3ac06a579a3504e
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoImplement cluster admin RPCs to change member voting states 24/39724/4
Tom Pantelis [Wed, 20 Apr 2016 15:41:25 +0000 (11:41 -0400)]
Implement cluster admin RPCs to change member voting states

Backported from master: https://git.opendaylight.org/gerrit/#/c/38086/

Added 3 new RPCs for changing voting states:
  change-member-voting-states-for-shard
  change-member-voting-states-for-all-shards
  flip-member-voting-states-for-all-shards

These replace the original ones added in Be that weren't implemented.
They were added as placeholders based on how it was thought it would
work at that time.

New related ShardManager messages were added that are sent by the
ClusterAdminRpcService.

The flip-member-voting-states-for-all-shards RPC is a shortcut that
obtains the current voting states via the GetOnDemandRaftState message
to the RaftActor and inverts them. New fields were added to the
OnDemandRaftState response to return the voting states.

Modified the ShardStats JMX bean to report the new OnDemandRaftState
fields.

Added a check in RaftActorServerConfigurationSupport to ensure that
there's at least 1 voting member otherwise one can end up with an
unusable shard with no ability to elect a leader.
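
A rough sketch of the flip and of the voting-member check, assuming
voting states are modelled as a map from member name to a boolean.
VotingStates and its methods are hypothetical and not the
RaftActorServerConfigurationSupport code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper over a member-name to is-voting map. */
final class VotingStates {
    private VotingStates() {
    }

    /** Inverts every member's voting state. */
    static Map<String, Boolean> flip(Map<String, Boolean> current) {
        Map<String, Boolean> flipped = new LinkedHashMap<>();
        current.forEach((member, voting) -> flipped.put(member, !voting));
        return flipped;
    }

    /** Rejects a configuration that would leave no member able to vote. */
    static void requireVotingMember(Map<String, Boolean> proposed) {
        if (!proposed.containsValue(Boolean.TRUE)) {
            throw new IllegalArgumentException("at least one voting member is required");
        }
    }
}
```

Flipping a configuration in which every member is voting yields one
with no voters at all, which is exactly the case the check is meant to
reject.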

Fixed a couple bugs in Leader and AbstractLeader that were found during
testing. AbstractLeader needs to take into account the follower's voting
state when determining if the leader is isolated.

Change-Id: I58686e3ce94d58de7cf289e55bb717ba46bc1de5
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoBug 5913: Fix ISE in DefaultShardDataChangeListenerPublisher 24/40324/2
Tom Pantelis [Tue, 14 Jun 2016 14:18:30 +0000 (10:18 -0400)]
Bug 5913: Fix ISE in DefaultShardDataChangeListenerPublisher

The publishChanges method is only called from the
ShardDataTreeNotificationPublisherActor which is single-threaded so
publishChanges can't be called concurrently. However the
DefaultShardDataChangeListenerPublisher instance is passed via
the PublishNotifications message so the Stopwatch isn't thread safe
wrt thread visibility of its internal state. Therefore it's possible
the change in state done on thread 1 isn't immediately visible to
a subsequent thread. To alleviate this, I moved the Stopwatch and the
elapsed time check to the ShardDataTreeNotificationPublisherActor.

Change-Id: I046e7e92aa96eec01d5a355c8431ef797c534ead
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoFix test failures in RaftActorServerConfigurationSupportTest 48/40348/1
Tom Pantelis [Tue, 14 Jun 2016 23:09:18 +0000 (19:09 -0400)]
Fix test failures in RaftActorServerConfigurationSupportTest

Fixed test failures due to order of recent cherry picks that are failing
jenkins builds.

Change-Id: I140d2b9e69c16ef10ccb5e183eb77b0bb56e9ab9
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoImplement change to voting with no leader 60/39660/2
Tom Pantelis [Wed, 13 Jan 2016 21:17:28 +0000 (16:17 -0500)]
Implement change to voting with no leader

Backported from master.

Implemented a special case where on a voting state change from
non-voting to voting, if there's no leader, it will try to elect a
leader in order to apply the change and progress.

This is to handle a use case where one has 2 geographically-separated
3-node clusters, one a primary and the other a backup such that if the
primary cluster is lost, the backup can take over. In this scenario,
there's a logical 6-node cluster where the primary sub-cluster is
configured as voting and the backup sub-cluster as non-voting such
that the primary cluster can make progress without consensus from
the backup cluster while still replicating to the backup. On fail-over
to the backup, a request would be sent to a member of the backup
cluster to flip the voting states, ie make the backup sub-cluster
voting and the lost primary non-voting. However since the primary
majority cluster is lost, there would be no leader to apply, persist and
replicate the server config change.

Therefore, if the server processing the request is currently non-voting
and is to be changed to voting and there is no current leader, it will
try to elect itself the leader by applying the new server config change in
the RaftActorContext and sending an ElectionTimeout. If it's elected
leader, it persists and replicates the new server config. If no leader
change occurs within the election timeout period, it reverts the server
config change and tries to forward the change request to another server
with the same voting state change. In this manner, the intent is to elect
the newly voting server that has the most up to date log.

Change-Id: I67b5b2d3a97745dbe9a8215f9a28f3a840f2a0db
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoImplement ChangeServersVotingStatus message in RaftActor 59/39659/2
Tom Pantelis [Tue, 12 Jan 2016 23:21:15 +0000 (18:21 -0500)]
Implement ChangeServersVotingStatus message in RaftActor

Backported from master.

Added a new ChangeServersVotingStatus message to change servers to/from
voting members. The leader updates its local peer info and persists and
replicates a new ServerConfigurationPayload with the appropriate voting
states. If the leader changes to non-voting it steps down as leader by
initiating a leadership transfer.

Change-Id: If073e4665cb1a270aae6e3dce36a6b3e900d0282
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoAdd a few toString() methods 32/40232/1
Stephen Kitt [Fri, 11 Mar 2016 10:14:44 +0000 (11:14 +0100)]
Add a few toString() methods

These help when converting to DataTree ;-).

Change-Id: I9b0fdb428ebe0265cb4321bd6ee31dedb4811950
Signed-off-by: Stephen Kitt <skitt@redhat.com>
(cherry picked from commit 86bc3639095c1d6cc3c764ba8e8721257b87c5c6)

7 years agoBug 5504: Fix IllegalStateException handling from commit 44/40044/2
Tom Pantelis [Wed, 8 Jun 2016 06:45:20 +0000 (02:45 -0400)]
Bug 5504: Fix IllegalStateException handling from commit

https://git.opendaylight.org/gerrit/#/c/36172 attempted to
handle/workaround IllegalStateException thrown from commit to re-apply
the transaction. However the change wasn't correct - the commit call
actually throws an ExecutionException with the IllegalStateException as
the cause. So we need to catch ExecutionException and check if the cause
is IllegalStateException.
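
The pattern looks roughly like this, using a plain
java.util.concurrent.Future to stand in for the commit future.
CommitCauseCheck is a hypothetical helper, not the code changed here.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical helper: inspects the cause wrapped by ExecutionException. */
final class CommitCauseCheck {
    private CommitCauseCheck() {
    }

    /**
     * Returns true if the commit failed with an IllegalStateException as the
     * cause. Success and other failure causes return false; a real caller
     * would handle those separately.
     */
    static boolean failedWithIllegalState(Future<?> commitFuture) {
        try {
            commitFuture.get();
            return false;
        } catch (ExecutionException e) {
            return e.getCause() instanceof IllegalStateException;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

Catching IllegalStateException directly around get() would never
match, because get() always wraps the failure in an
ExecutionException.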

Change-Id: I65b2d646a60a700d070dea822d20b0e649290643
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoDebug logging in AbstractLeader is too chatty 49/40049/1
Tom Pantelis [Wed, 8 Jun 2016 07:30:33 +0000 (03:30 -0400)]
Debug logging in AbstractLeader is too chatty

The additional debug logging added with
https://git.opendaylight.org/gerrit/#/c/39796/ makes it too chatty with
heartbeats when nothing changed which will roll-over log files much more
quickly. Changed a debug to trace.

Change-Id: I4c204c6d0734d6ac8655380adcc2df09cb2890ae
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoRemove snapshot after startup and fix related bug 53/39853/2
Tom Pantelis [Fri, 3 Jun 2016 20:04:28 +0000 (16:04 -0400)]
Remove snapshot after startup and fix related bug

Fixed an issue in the follower out-of-sync integrity checking where it
needs to take into account that the previous index may be in the
snapshot. A similar issue was seen with other integrity checks.

These issues were indirectly related to the snapshot after startup that
was introduced in Be. I think this snapshot is unsafe b/c the
replicatedToAllIndex hasn't been determined yet which I think may cause
other issues with the trimming after snapshot completion, as the logic
takes replicatedToAllIndex into account. And there may be other lurking
bugs. I think it's safer to let the normal snapshot logic handle it.

The reason for the snapshot after startup was to avoid having to recover
the same journal entries again on restart that were just recovered. However
in reality, in production, servers aren't commonly restarted and
typically go weeks/months in between restarts. By the time of the next
restart there would likely have been another snapshot and an arbitrary amount
of new journal entries to recover so it really doesn't add much value.

Change-Id: Ie14148e5dbde3e93deafc5943278aea8c9bb3e75
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoGuard against duplicate log indexes 50/39850/2
Tom Pantelis [Fri, 3 Jun 2016 15:59:31 +0000 (11:59 -0400)]
Guard against duplicate log indexes

We saw an issue where a duplicate log index was added to the journal.
The duplicates were contiguous. It is unclear at this point how it
happened but we should guard against it so I added a check to ensure the
new index > the last index.
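
The guard is essentially a monotonicity check on append, sketched
below. GuardedJournal is a hypothetical class that tracks indexes only,
and rejecting by returning false is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical journal that only accepts strictly increasing indexes. */
final class GuardedJournal {
    private final List<Long> indexes = new ArrayList<>();

    long lastIndex() {
        return indexes.isEmpty() ? -1L : indexes.get(indexes.size() - 1);
    }

    /** Appends only if index is greater than the last index; otherwise rejects. */
    boolean append(long index) {
        if (index <= lastIndex()) {
            return false; // duplicate or out-of-order index
        }
        indexes.add(index);
        return true;
    }

    int size() {
        return indexes.size();
    }
}
```

Appending 0, 1 and then 1 again leaves two entries in the journal, the
duplicate being refused.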

Change-Id: Iacb7e5c83870eb79550bb4314d7f24c4530fc113
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoAdd more debug output in AbstractLeader and Follower 96/39796/2
Tom Pantelis [Thu, 2 Jun 2016 12:55:55 +0000 (08:55 -0400)]
Add more debug output in AbstractLeader and Follower

Adding more debug to help troubleshoot an issue.

Change-Id: Iff3e78157415de2841bb32f3dd588705d518b015
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
7 years agoUpdate version enforcement to Java 7 64/39064/3
Robert Varga [Wed, 18 May 2016 18:27:36 +0000 (20:27 +0200)]
Update version enforcement to Java 7

OpenDaylight was never able to run with 1.6, hence we should
enforce Java 7 at least.

Change-Id: I3e6a5d21d4af6c3916528178b0465e86190c0dc6
Signed-off-by: Robert Varga <rovarga@cisco.com>
7 years agoBUG-5414 introduce EOS inJeopardy flag 87/38287/3
Robert Varga [Thu, 24 Mar 2016 21:07:49 +0000 (22:07 +0100)]
BUG-5414 introduce EOS inJeopardy flag

The inJeopardy flag is used to indicate that the leader has lost quorum,
e.g. if it cannot reach a majority of followers, or that the follower has lost connection
to the leader (and has initiated new elections).

While EOS is in jeopardy, any reported entity state may not reflect cluster-wide
consensus, but rather represents the latest intended state as seen by this node.

Change-Id: I18df5a11ebbef6607fb0a0754ba0f09bc52f19ba
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit d4fa6758d6b94aad894854c0fe6fcd82e7bbefd6)

7 years agoMake Karaf dump heap on OOM by default 12/36812/3
Vratko Polak [Thu, 7 Apr 2016 14:49:42 +0000 (16:49 +0200)]
Make Karaf dump heap on OOM by default

See mails in this thread:
https://lists.opendaylight.org/pipermail/release/2016-March/006098.html
This changes DEFAULT_JAVA_OPTS,
so if user sets JAVA_OPTS it would override this.

Change-Id: I54fad73c5f50a6bf251bd3b255293ff3ef4ed877
Signed-off-by: Vratko Polak <vrpolak@cisco.com>
7 years agoBumping versions by 0.0.1 for next dev cycle 97/38697/1
Thanh Ha [Thu, 12 May 2016 02:47:05 +0000 (22:47 -0400)]
Bumping versions by 0.0.1 for next dev cycle

Change-Id: I93804f91f274da742ad4276e45737948c3ad576e
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
7 years agoRelease Beryllium-SR2 96/38696/1 release/beryllium-sr2
Thanh Ha [Thu, 12 May 2016 02:47:03 +0000 (22:47 -0400)]
Release Beryllium-SR2

Change-Id: Ia633f2d63c086ca8a2eecbda23931c9806e6e117
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
8 years agoBUG 5690 : No owner present even when entity has a candidate 69/37369/1
Moiz Raja [Fri, 8 Apr 2016 18:09:50 +0000 (11:09 -0700)]
BUG 5690 : No owner present even when entity has a candidate

If a candidate for an entity is removed and another added in quick
succession it can leave the owner of the entity blank. This happens
because the BatchedModifications for candidate removal happen one
after another which results in the commit of those modifications.
The BatchedModification which writes an owner on removal is committed
only after the addition of the new candidate. In this scenario when
the new candidate is added it finds that there is still an owner
for that entity and so it does not assign a new owner for that entity.

To fix this problem in onCandidateAdded we check if the currentOwner
is present in the current candidate list and if it is not then we
choose a new owner.
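
A minimal sketch of that check. OwnerSelection is hypothetical, and
picking the first candidate as the new owner is an assumption made for
the sketch, not the actual selection strategy.

```java
import java.util.List;

/** Hypothetical helper for the owner check made when a candidate is added. */
final class OwnerSelection {
    private OwnerSelection() {
    }

    /**
     * Keeps the current owner only if it is still a candidate; otherwise
     * picks a new one (here simply the first candidate, as an assumption).
     */
    static String ownerAfterCandidateAdded(String currentOwner, List<String> candidates) {
        if (currentOwner != null && candidates.contains(currentOwner)) {
            return currentOwner;
        }
        return candidates.isEmpty() ? null : candidates.get(0);
    }
}
```

In the race described above the stale owner is no longer in the
candidate list, so the newly added candidate is chosen instead of the
entity being left without an owner.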

Change-Id: I47f90314e018e25f2c1dac82342b931c4e2d882d
Signed-off-by: Moiz Raja <moraja@cisco.com>
8 years agoFix ApplyState elapsed time check 18/37218/1
Tom Pantelis [Mon, 4 Apr 2016 05:40:06 +0000 (01:40 -0400)]
Fix ApplyState elapsed time check

On ApplyState, there's a check if the elapsed time exceeds a 50ms
threshold and it logs a warning. However the start time is captured when
the message is created prior to queueing. So if there are many ApplyState
or other messages already queued, the elapsed time also includes the time spent
in the queue, ie as a side effect includes the cumulative processing time
of each prior message in the queue. When a follower starts up, there can
be hundreds to thousands of catchup ApplyState messages and, eventually,
the cumulative processing times can add up to more than 50 ms, in which
case every subsequent ApplyState message trips the threshold with
increasing elapsed times, even though none of them actually took 50 ms
to process. Seeing hundreds to thousands of warnings with misleading
elapsed times looks ominous and leads users to think something is wrong.

Therefore I changed it to capture the start time just prior to calling
applyState so it captures just the processing time for that message. I
also removed the startTime field from ApplyState. This class is
Serializable but it is only ever sent locally to self and is never
serialized so there's no backwards compatibility concerns.
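
The changed measurement can be sketched as follows. ApplyTimer is a
hypothetical helper; the 50 ms threshold is the one mentioned above,
and the Runnable stands in for the applyState call.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: times only the processing of one message. */
final class ApplyTimer {
    static final long THRESHOLD_MS = 50;

    private ApplyTimer() {
    }

    /** Starts the clock just before the handler runs, so queue time is excluded. */
    static long timeProcessingMillis(Runnable applyState) {
        long start = System.nanoTime();
        applyState.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** True when a warning would be logged for this elapsed time. */
    static boolean exceedsThreshold(long elapsedMillis) {
        return elapsedMillis > THRESHOLD_MS;
    }
}
```

Because the start time is no longer carried in the message, a long
backlog of queued messages cannot inflate the reported elapsed time.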

Change-Id: I9493734b5307d6dd5d723e5fe416ba97915dfc63
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 57d7e4788a488d992b9868d44ebc392b06e317c5)

8 years agoBUG 5656 : Entity ownership candidates not removed consistently on leadership change 17/37117/2
Moiz Raja [Tue, 5 Apr 2016 01:27:17 +0000 (18:27 -0700)]
BUG 5656 : Entity ownership candidates not removed consistently on leadership change

This patch removes candidates for all downed members when entity ownership shard
leadership changes. This fixes a corner case where a leader/follower are both killed
simultaneously in a cluster which has more than 3 nodes. In this case the old leader
does not have a chance to remove the killed follower. The new leader does know that
the follower is down so it can remove the candidates for all downed followers on
shard leadership change.

Change-Id: If28f5656e0daee40fb96a937dbd0a868b7d3645a
Signed-off-by: Moiz Raja <moraja@cisco.com>
8 years agoDefault shard-journal-recovery-log-batch-size to 1 58/37158/1
Tom Pantelis [Mon, 4 Apr 2016 14:58:08 +0000 (10:58 -0400)]
Default shard-journal-recovery-log-batch-size to 1

In Helium there was an issue with batching journal log entries in a
single transaction on recovery which could cause validation exceptions
and/or missing data. Setting the batch size to 1 alleviated the issue and
thus it was defaulted to 1.

It was thought this issue wasn't present in Lithium but it is as I have
a Helium journal which exhibits the problem. I have tried this journal
with the current code base and didn't see an issue (it looked like all
data was recovered from what I could tell) but I'm not confident an issue
isn't still lurking with the right combination of modifications across
many journal transactions. It is safest to recover the transactions in the
same manner as they were originally committed, ie one by one.

Therefore I have defaulted the batch size to 1. In my testing, the prior
setting of 1000 doesn't add any value anyway as the recovery time is
virtually the same with batch size 1000 and 1. Setting it to 1
eliminates the potential risk of data loss.

Change-Id: Icd7fd3c60bdd6cf1b677ccae38be810e779d2bd3
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 28313ad901a88b4a5e5e9f54da0368c7171ca080)

8 years agoBug 5613 - unregister candidates on node restart 26/36826/4
Amit Mandke [Tue, 29 Mar 2016 17:25:29 +0000 (10:25 -0700)]
Bug 5613 - unregister candidates on node restart

When EntityOwnershipShard receives CandidateAdded for a local candidate
before any local registration happens, it means the local node must have
restarted and the candidates are not registered yet.

So this change removes the candidate in that case.

The corresponding test reproduces the issue if the change is not applied.

Fixed other test failures.

Change-Id: I0e8e675530c93dca172ca661fa4c5e1250f40150
Signed-off-by: Amit Mandke <ammandke@cisco.com>
8 years agoBug 5625: Fix OutOfMemoryError in YangStoreSnapshot 16/36916/1
Tom Pantelis [Tue, 29 Mar 2016 17:37:43 +0000 (13:37 -0400)]
Bug 5625: Fix OutOfMemoryError in YangStoreSnapshot

Close the InputStream returned via yangTextSchemaSource.openStream().
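
A minimal sketch of the pattern, assuming a hypothetical StreamSource interface
as a stand-in for the schema source (this is not the YangStoreSnapshot code):
try-with-resources closes the stream even if reading throws.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; StreamSource stands in for the real source type.
final class CloseStreamSketch {
    interface StreamSource {
        InputStream openStream() throws IOException;
    }

    // Reads the whole stream as UTF-8 text and always closes it.
    static String readAll(StreamSource source) throws IOException {
        try (InputStream in = source.openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```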

Change-Id: I3ecd2e1a3f52f91203a3a00c2f982b061cc62c42
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 990c36b8a92ffb36b0b386855f6a7ea79e5ea226)

8 years agoAdd **/yang-gen-config/** to checkstyle ignore path 35/36835/2
Thanh Ha [Mon, 14 Mar 2016 21:08:13 +0000 (17:08 -0400)]
Add **/yang-gen-config/** to checkstyle ignore path

Change-Id: I4080cd5a5c6d2ccd9374af9979ff2fca76e607ab
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
(cherry picked from commit 17d19bca1350102fd5ca1a1b7162cc5fc2ac9f79)

8 years agoAdd yang-jmx-generator dependency 34/36834/1
Thanh Ha [Mon, 14 Mar 2016 20:45:35 +0000 (16:45 -0400)]
Add yang-jmx-generator dependency

When building in parallel sal-dom-xsql fails due to
yang-jmx-generator missing. This implies that yang-jmx-generator is
actually a dependency.

Change-Id: I624d4026d8182c12a147830ded0391eca25b0f62
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
(cherry picked from commit 7cda9d7daa76fa84caead38f5979314ff35cf9db)

8 years agoEnlarge critical section to cover processNextTransaction() 79/36779/1
Robert Varga [Thu, 24 Mar 2016 18:59:49 +0000 (19:59 +0100)]
Enlarge critical section to cover processNextTransaction()

As it turns out the critical section is not sufficient to cover the case
when the user thread performs a submit/allocate/submit in the time window
between us releasing the in-flight transaction and taking the lock: we would
have to re-check inflightTx after taking the lock.

Since we are going to take the lock anyway, reverse the order of operations
by making processNextTransaction() synchronized, which means the user
thread will not be able to submit the transaction even when it observes
inflightTx as null outside the lock.

Change-Id: I688ceb5e8aae28f5e582b64e6bbaa64c9699c7f5
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit 30d98c1da2a32f719302668f8deb6ef4f371749c)

8 years agoBug 5485: Improve DataTreeModification pruning on recovery 17/36317/2
Tom Pantelis [Wed, 9 Mar 2016 04:07:02 +0000 (23:07 -0500)]
Bug 5485: Improve DataTreeModification pruning on recovery

Modified the PruningDataTreeModification and NormalizedNodePruner to
validate path and node QNames via the SchemaContext instead of just
namespaces. This allows migration support for any element to be removed
from a yang hierarchy.

Also handled SchemaValidationFailedException on ready which can happen
with writes which don't immediately validate the structure as merge
does. The modification tree is re-applied with pruning.

Change-Id: I986d1116d2e25115f406abc21b1f816525387125
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
8 years agoBug 5460: Fix snapshots on follower 04/36304/2
Tom Pantelis [Tue, 15 Mar 2016 23:51:58 +0000 (19:51 -0400)]
Bug 5460: Fix snapshots on follower

Added a callback to the appendAndPersist call in Follower to call
captureSnapshotIfReady.

Added checks in ReplicationAndSnapshotsIntegrationTest to verify the
followers' snapshots along with the leader's.

Change-Id: Ie71f1b16152541d069f9d005ba669cb1e5771dd1
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
8 years agoStop logging complete data tree on prepare/commit failure 13/36513/2
Moiz Raja [Fri, 18 Mar 2016 19:29:50 +0000 (12:29 -0700)]
Stop logging complete data tree on prepare/commit failure

Sometimes the data tree modification is so large that just trying to create
the buffer to hold the message can make the controller run out of memory. Plus
it's rarely useful to have a log filled with data which obfuscates other
important log messages. This patch still logs the data tree modification at
trace level.

Change-Id: I76bff9f7e836ee5eff347b0b77e2817f441ab953
Signed-off-by: Moiz Raja <moraja@cisco.com>
(cherry picked from commit 2cf157241dc0ce5045c26e2ad07d053a60b37822)

8 years agoBumping versions by 0.0.1 for next dev cycle 96/36596/1
Thanh Ha [Wed, 23 Mar 2016 13:34:10 +0000 (09:34 -0400)]
Bumping versions by 0.0.1 for next dev cycle

Change-Id: I045cfbec3f810bd58885a726ff31612d30dae343
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
8 years agoRelease Beryllium-SR1 95/36595/1 release/beryllium-sr1
Thanh Ha [Wed, 23 Mar 2016 13:34:09 +0000 (09:34 -0400)]
Release Beryllium-SR1

Change-Id: I7acebbdf1c8b0c6172477620c3e468c334768e43
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
8 years agoBug 5504: Handle IllegalStateException from commit 72/36172/1
Tom Pantelis [Sat, 12 Mar 2016 03:36:40 +0000 (22:36 -0500)]
Bug 5504: Handle IllegalStateException from commit

Tries to re-apply the transaction if the "store tree and candidate base
differ" IllegalStateException occurs.

Change-Id: If2ef81d88fbd756edd54842d1afb7cd62043de05
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
8 years agoFix intermittent RaftActorLeadershipTransferCohortTest failure 59/36159/1
Tom Pantelis [Fri, 11 Mar 2016 23:44:55 +0000 (18:44 -0500)]
Fix intermittent RaftActorLeadershipTransferCohortTest failure

Change-Id: I4c58f6545d7ef7667c7fcf42f5dda82345ab1167
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 68eb628b5aca1fc4de4e29bcacf46dcb7b3a19c8)

8 years agoBug 4823: Offload generation of DCNs from Shard 53/36153/2
Tom Pantelis [Thu, 4 Feb 2016 06:39:20 +0000 (01:39 -0500)]
Bug 4823: Offload generation of DCNs from Shard

Generation of data change notifications can be expensive with large
lists which can block the Shard actor for many seconds. This processing
was offloaded to other actors to free up the Shard, one for DCLs and the
other for DTCLs. I separated the 2 types of listeners b/c DCN generation
is much more expensive than DTCs so at least DTCLs aren't held up by
DCLs.

Change-Id: I1bfb5d572c793f8eb703ebf0a7fd9bf628747168
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit a46305fbc6bb7ec6883c21298d356a5e4fbbb015)

8 years agoFix issues with LeastLoadedCandidateSelectionStrategy 54/36154/2
Moiz Raja [Wed, 27 Jan 2016 22:43:39 +0000 (14:43 -0800)]
Fix issues with LeastLoadedCandidateSelectionStrategy

LLCSS degenerates into a round robin owner allocator when
ownership changes. This patch fixes that issue as follows,

- Consider the statistics that are collected using the DTL
  only as initialStatistics which are passed to the Strategy
  when it is created
- When Leadership changes clear all the strategies so that
  they get freshly created with the right initial statistic
- Modify the newOwner method on Strategy to
    - pass the currentOwner for the entity, for the current
      owner we decrease the ownership statistic
    - remove the statistics passed to it as it would no longer
      be required. Due to this removal we also get rid of all
      the crud which we had added to check if the passed in
      stats were actually greater than the local stats which
      anyway did not work.
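
The revised newOwner contract described above can be sketched roughly as
follows. This is an illustration under assumptions, not the actual
LeastLoadedCandidateSelectionStrategy: the class name, the use of String
candidate names, and the tie-breaking rule (first candidate wins) are all
hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual strategy implementation.
final class LeastLoadedSketch {
    private final Map<String, Long> ownedCount = new HashMap<>();

    // The statistics are only a starting point, passed in at creation time.
    LeastLoadedSketch(Map<String, Long> initialStatistics) {
        ownedCount.putAll(initialStatistics);
    }

    // Picks the candidate that currently owns the fewest entities.
    String newOwner(String currentOwner, Collection<String> candidates) {
        if (currentOwner != null) {
            // The current owner is giving up this entity, so decrease its count.
            ownedCount.computeIfPresent(currentOwner, (k, v) -> Math.max(0L, v - 1));
        }
        String best = null;
        long bestCount = Long.MAX_VALUE;
        for (String candidate : candidates) {
            long count = ownedCount.getOrDefault(candidate, 0L);
            if (count < bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        if (best != null) {
            ownedCount.merge(best, 1L, Long::sum);
        }
        return best;
    }

    long ownedBy(String member) {
        return ownedCount.getOrDefault(member, 0L);
    }
}
```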

Change-Id: I754f0459051687a95056857044777ca6eebbcd93
Signed-off-by: Moiz Raja <moraja@cisco.com>
8 years agoFix broken downstream features 46/36146/1
Robert Varga [Wed, 24 Feb 2016 09:18:36 +0000 (10:18 +0100)]
Fix broken downstream features

factoryakkaconf needs to be spelled out in the dependency of
features-mdsal.

Change-Id: I71e7cff1076fc63c08f6debefc72107046f8337f
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit cfdb7ed1fdd440feea75adfe1b0289b76ffc9e50)

8 years agoBug 5329: Add factory akka.conf 44/36144/1
Tom Pantelis [Fri, 12 Feb 2016 14:08:15 +0000 (09:08 -0500)]
Bug 5329: Add factory akka.conf

Added a factory akka.conf file that is shipped to
configuration/factory/akka.conf. This file contains all the necessary
akka settings. Modified the FileAkkaConfigurationReader to load the
existing configuration/initial/akka.conf file with the factory file as
the fallback. In this manner akka will overlay/merge the initial file
with the factory file. I pared down the initial file to only contain the
settings that users would normally set or configure to setup a cluster,
ie hostname, port, seed-nodes, roles.
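
The overlay idea can be sketched with java.util.Properties as a stand-in. This
is an analogy only: the real files are akka configuration files, not
properties files, and the keys used in the usage note are hypothetical. Values
from the initial file win, and anything it does not set falls back to the
factory value.

```java
import java.util.Properties;

// Hypothetical illustration only; uses Properties instead of the akka config format.
final class ConfigFallbackSketch {
    // Returns a view where 'initial' settings override 'factory' settings.
    static Properties overlay(Properties factory, Properties initial) {
        Properties merged = new Properties(factory); // factory values act as defaults
        for (String name : initial.stringPropertyNames()) {
            merged.setProperty(name, initial.getProperty(name));
        }
        return merged;
    }
}
```

For example, if the factory settings define both "port" and "loglevel" and the
initial settings define only "port", the merged view returns the initial
"port" and the factory "loglevel".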

In the features.xml, the factory file is configured to always overwrite
so changes are picked up on upgrade. We still preserve the initial file.

Change-Id: I8e80161e21d0ad0e26f1efa1023c670b3a5ef6bc
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 6e76bb44514ea79524f46ff283ea0d0d4ad8c7f8)

8 years agoChoose owner when all candidate registrations received. 75/36075/1
Moiz Raja [Fri, 15 Jan 2016 20:38:11 +0000 (12:38 -0800)]
Choose owner when all candidate registrations received.

In the Delayed Owner Selection Strategy we should not wait for
the timeout to occur when we have received the candidate
registrations for all the candidates possible in the system.

Change-Id: Ifcd1f376b050baf2e422e00bd4a93a4d9d3d6c45
Signed-off-by: Moiz Raja <moraja@cisco.com>
8 years agoAdd notification-dispatcher configuration for default akka.conf. 71/36071/2
Moiz Raja [Sat, 16 Jan 2016 02:13:56 +0000 (18:13 -0800)]
Add notification-dispatcher configuration for default akka.conf.

Change-Id: I9d4983b9d435f527738a84aa03904f23ec2237c1
Signed-off-by: Moiz Raja <moraja@cisco.com>
(cherry picked from commit 2b7d2365c64087cfce66196bf0bf5857c0a4c315)

8 years agoWhen no candidates are present for an entity do not return EntityOwnershipState 70/36070/2
Moiz Raja [Fri, 8 Jan 2016 04:18:01 +0000 (20:18 -0800)]
When no candidates are present for an entity do not return EntityOwnershipState

Change-Id: I22c0100755a1fca50c638ff4b435e04bdd0f76ff
Signed-off-by: Moiz Raja <moraja@cisco.com>
(cherry picked from commit 4f2123238f32ad97019ad0ce0a7b588ea33397ed)

8 years agoFix reading of EntityOwnerSelectionStrategy 69/36069/1
Moiz Raja [Thu, 7 Jan 2016 22:07:02 +0000 (14:07 -0800)]
Fix reading of EntityOwnerSelectionStrategy

1. The pid used for reading a config admin file should not have hyphens
   so replaced them with dots
2. The config admin returns properties that are not from the file
   so we need a way to ignore them. I specifically look for
   a prefix of "entity.type." and ignore the other properties
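
Both points can be sketched as below. Only the "entity.type." prefix comes from
the message above; the class name, method names, and the choice to strip the
prefix from the returned keys are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual config reader.
final class StrategyConfigSketch {
    static final String PREFIX = "entity.type.";

    // Point 1: a pid must not contain hyphens, so replace them with dots.
    static String toPid(String name) {
        return name.replace('-', '.');
    }

    // Point 2: keep only properties with the expected prefix, ignoring the
    // extra properties that config admin adds on its own.
    static Map<String, String> strategyProperties(Map<String, String> all) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : all.entrySet()) {
            if (entry.getKey().startsWith(PREFIX)) {
                result.put(entry.getKey().substring(PREFIX.length()), entry.getValue());
            }
        }
        return result;
    }
}
```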

Change-Id: I26a66176583ec39cbdb78fec749022429218e005
Signed-off-by: Moiz Raja <moraja@cisco.com>
(cherry picked from commit 7030ae1a3c8fcc19e2b88d874a18faf73496682e)

8 years agoBug 4823: Use tx commit timeout for BatchedModifications 78/34778/3
Tom Pantelis [Mon, 8 Feb 2016 21:50:23 +0000 (16:50 -0500)]
Bug 4823: Use tx commit timeout for BatchedModifications

When sending BatchedModifications messages to the shard we use the
general operation timeout which is 5 sec. We should instead use the
transaction commit timeout to be consistent with the other transaction
messages (ReadyLocalTransaction, CanCommitTransaction etc).

Change-Id: If69704c3e9bde7f2cbed344912166137d43c039b
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
8 years agoFix ConcurrentModificationEx in RpcRegistry.onBucketsUpdated 75/35775/2
Tom Pantelis [Thu, 3 Mar 2016 17:52:47 +0000 (12:52 -0500)]
Fix ConcurrentModificationEx in RpcRegistry.onBucketsUpdated

This was introduced by a recent patch. onBucketsUpdated iterates the
routesUpdateCallbacks however one of the callbacks in receiveGetRouter
removes itself from the list causing the ConcurrentModificationEx.

I changed onBucketsUpdated to first copy the list to an array to prevent
this.
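
A minimal sketch of the fix, using a hypothetical callback registry rather than
the actual RpcRegistry code: iterating a snapshot of the list lets a callback
remove itself without invalidating the iteration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual RpcRegistry code.
final class CallbackListSketch {
    private final List<Runnable> callbacks = new ArrayList<>();

    void add(Runnable callback) {
        callbacks.add(callback);
    }

    void remove(Runnable callback) {
        callbacks.remove(callback);
    }

    int size() {
        return callbacks.size();
    }

    // Copies the list to an array first, so removals during the loop only
    // affect the underlying list and not the snapshot being iterated.
    void fireAll() {
        for (Runnable callback : callbacks.toArray(new Runnable[0])) {
            callback.run();
        }
    }
}
```

Iterating the list directly with a for-each loop would throw
ConcurrentModificationException as soon as a callback removed itself and the
loop tried to advance.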

Change-Id: I44c9a89b4b433f711cf4f90bf28e6955d8784f5f
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit f09d37ec4cce2411eaae11dde18f9ce2d2f14118)

8 years agoFix missing bundle 79/35779/1
Robert Varga [Sun, 21 Feb 2016 15:33:04 +0000 (16:33 +0100)]
Fix missing bundle

Change-Id: I9b8c0ca660e0101a2459f92dd16e36727f8ab9c3
Signed-off-by: Robert Varga <robert.varga@pantheon.sk>
(cherry picked from commit aeb60bc8ab1b62e18dc090f946c2bdf12b3e9a6c)

8 years agoBug 4866: Add wait/retries for routed RPCs 06/35306/1
Tom Pantelis [Fri, 5 Feb 2016 07:42:54 +0000 (02:42 -0500)]
Bug 4866: Add wait/retries for routed RPCs

If a routed RPC is registered on one node it takes a little time for the
route to propagate via gossip to other nodes. If another node tries to
invoke the RPC prior to propagation it fails. To alleviate this timing
issue, I added wait/retries via a timer in the RpcRegistry for the
FindRouters message. As routes are updated via gossip, it retries the
FindRouters request. If the timer triggers, it sends back an empty list.
The timer period is 10 times the gossip tick interval (500ms * 10 = 5s).

Change-Id: Iaafcfb4c93cde44f62f6645c8b8684102ac0d0db
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 92ce52ab3df561a2a07bf56c7115123b0825449e)

8 years agoBUG-2912: Better document DataChangeScope.ONE 72/34772/2
Colin Dixon [Tue, 16 Feb 2016 18:06:50 +0000 (13:06 -0500)]
BUG-2912: Better document DataChangeScope.ONE

Provides information about the way this scope interacts with lists in the
binding independent data tree that might be counterintuitive when compared with
a binding aware view of the same data.

Change-Id: If966331b4daa5a88be61fb2efea65a4b69495b0b
Signed-off-by: Colin Dixon <colin@colindixon.com>
8 years agoFix sporadic ShardManagerTest failures 07/35107/2
Tom Pantelis [Wed, 17 Feb 2016 18:55:20 +0000 (13:55 -0500)]
Fix sporadic ShardManagerTest failures

Some of the tests fail sporadically. Most were alleviated by:

  - using tell on an actor rather than calling receiveCommand directly
  - using the normal fork/join dispatcher for creating TestActors instead
    of the default CallingThread dispatcher.

After the changes the tests ran over 200 times successfully.

Change-Id: Ib2c7c3b6dace9e89dff54eccc58a2b8aabad75de
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 2f6d96da89035d5aec78c90bce5065d2f202a515)

8 years agoBug 4627: Fix premature RO tx cleanup 63/34763/3
Tom Pantelis [Tue, 16 Feb 2016 04:45:36 +0000 (23:45 -0500)]
Bug 4627: Fix premature RO tx cleanup

For the RO tx PhantomReference cleanup mechanism, modified
RemoteTransactionContextSupport to pass the front-end client
TransactionProxy instance as the referent to the
FinalizablePhantomReference. Previously we were passing the
RemoteTransactionContextSupport instance which is only reachable via
a hard reference until the primary shard actor is obtained and thus may
be eligible for GC while the TransactionProxy is still in use.

Change-Id: Ib2808b4ba8113a5722f9ee422434a89adaf775fe
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 89274c9c31212a1d0f13aeb90384442e72221029)

8 years agoReduce output from DeadlockMonitor 18/34718/3
Tom Pantelis [Fri, 12 Feb 2016 14:56:46 +0000 (09:56 -0500)]
Reduce output from DeadlockMonitor

If a module doesn't finish after 5 sec, the DeadlockMonitor starts
logging warning messages. However it does this every second. CDS will
wait up to 90 sec for all shards to elect a leader so the
DeadlockMonitor produces a lot of output during this period. To reduce
the noise I changed the sleep to use WARN_AFTER_MILLIS so the message is
logged every 5 sec.

Change-Id: I63842075dee1fc6a4fc4e4200cc089e33a110e78
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 384479f0181763f3202f2f7ad681e90182bcc820)

8 years agoBinding Codecs support of APPEARED,DISAPPEARED. 59/35059/2
Tony Tkacik [Fri, 19 Feb 2016 10:55:13 +0000 (11:55 +0100)]
Binding Codecs support of APPEARED,DISAPPEARED.

In BE two new modification types were introduced for
structural containers, but binding codecs were not
updated accordingly.

Frontend mapping is simple:
  APPEARED -> SUBTREE_MODIFIED
  DISAPPEARED -> DELETE
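
The mapping can be sketched as a simple switch. The two enums below are
simplified stand-ins, not the real backend and binding types, and the pass-through
cases for the other values are an assumption.

```java
// Hypothetical illustration only; the enums are simplified stand-ins.
final class ModificationMappingSketch {
    enum BackendType { WRITE, DELETE, SUBTREE_MODIFIED, APPEARED, DISAPPEARED }

    enum FrontendType { WRITE, DELETE, SUBTREE_MODIFIED }

    static FrontendType toFrontend(BackendType type) {
        switch (type) {
            case APPEARED:
                return FrontendType.SUBTREE_MODIFIED; // new mapping
            case DISAPPEARED:
                return FrontendType.DELETE;           // new mapping
            case WRITE:
                return FrontendType.WRITE;
            case DELETE:
                return FrontendType.DELETE;
            case SUBTREE_MODIFIED:
                return FrontendType.SUBTREE_MODIFIED;
            default:
                throw new IllegalArgumentException("Unsupported type " + type);
        }
    }
}
```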

Change-Id: I62810c501234a62343150c328c6f2802402669c5
Signed-off-by: Tony Tkacik <ttkacik@cisco.com>
8 years agoBUG-5247: notify listeners for entities which are not owned 74/34674/3
Robert Varga [Fri, 5 Feb 2016 17:45:40 +0000 (18:45 +0100)]
BUG-5247: notify listeners for entities which are not owned

Rather than broadcasting just the 'up' state, notify listeners about all
state we know of.

Change-Id: Iaae6db925a321aad420fa0ee8bdf8b56b5d2a29e
Signed-off-by: Robert Varga <rovarga@cisco.com>
(cherry picked from commit e86a9107fc3ae4451b5a7eb54a03f9ad6776fe72)

8 years agoFix intermittent ShardTest failures 54/34654/3
Tom Pantelis [Sun, 14 Feb 2016 12:17:36 +0000 (07:17 -0500)]
Fix intermittent ShardTest failures

Some tests fail intermittently due to modifying Shard state directly
instead of thru messages.

Change-Id: I704d6d23c1b2a47e78b3d8823a3136e921e9113b
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 892a6ca966046fd790bdf8a64dccb456a3ece8b4)

8 years agoBumping versions by 0.0.1 for next dev cycle 41/34941/1
Thanh Ha [Thu, 18 Feb 2016 21:32:02 +0000 (16:32 -0500)]
Bumping versions by 0.0.1 for next dev cycle

Change-Id: Ib8013410eca860b8cbd3cdd246c4506610b53a6b
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
8 years agoRelease Beryllium 40/34940/1 release/beryllium
Thanh Ha [Thu, 18 Feb 2016 21:31:59 +0000 (16:31 -0500)]
Release Beryllium

Change-Id: I676190af22ffe729663af4023b95548b9fd1feac
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
8 years agoFix intermittent LeaderTest/CandidateTest failures 65/34565/1
Tom Pantelis [Thu, 11 Feb 2016 08:24:53 +0000 (03:24 -0500)]
Fix intermittent LeaderTest/CandidateTest failures

The test cases in LeaderTest and CandidateTest have been failing
intermittently. A particular test in CandidateTest has recently started
failing fairly regularly on jenkins for some reason.

The common denominator is that an initial message to an actor isn't
received and goes to dead letters instead, even though the actor was
just created. This seems related to the use of ActorSelection in the raft
behavior classes, I suspect a timing issue where the underlying actor
isn't actually created/available yet via actorSelection. I had seen this
in the past and attempted to alleviate it by adding a verifyActorReady to
TestActorFactory to verify with retries that the actor can be obtained via
actorSelection.resolveOne. However it doesn't appear resolveOne works as
advertised or maybe a successful call doesn't mean a message will
succeed.

I changed verifyActorReady to send an Identify message to the
actorSelection and verify successful response. On my system LeaderTest
would usually fail within 30 test runs. After the change it ran
successfully 400 times.

Change-Id: I2da7d4a4d14c68810e87fc64b711b5c80608f5d7
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 5e8721fd675825ec5c9f826aed61c97e22188960)

8 years agoBug 5153: Add timestamp to TransactionIdentifier 14/34514/1
Tom Pantelis [Wed, 3 Feb 2016 06:43:38 +0000 (01:43 -0500)]
Bug 5153: Add timestamp to TransactionIdentifier

TransactionIdentifiers are created locally but sent to the remote leader
so it's possible, after a restart, for the remote leader to see the same id
for 2 different txns since the local counter starts at 1. To alleviate
this I added a timestamp to TransactionIdentifier. I could've just used
a UUID but the counter is useful for debugging and a full UUID would
make the string version pretty long for logging. I think an additional
millisec timestamp is sufficient.
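
A minimal sketch of the idea, assuming a hypothetical identifier class and
string format (not the actual TransactionIdentifier): two ids with the same
member name and counter stay distinct as long as they were created at
different timestamps, for instance before and after a restart.

```java
import java.util.Objects;

// Hypothetical illustration only; field names and string format are assumptions.
final class TxIdSketch {
    private final String memberName;
    private final long counter;
    private final long timestamp; // milliseconds, captured by the creating node

    TxIdSketch(String memberName, long counter, long timestamp) {
        this.memberName = memberName;
        this.counter = counter;
        this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof TxIdSketch)) {
            return false;
        }
        TxIdSketch other = (TxIdSketch) obj;
        return counter == other.counter && timestamp == other.timestamp
                && memberName.equals(other.memberName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(memberName, counter, timestamp);
    }

    // Keeps the counter visible for debugging while staying shorter than a UUID.
    @Override
    public String toString() {
        return memberName + "-txn-" + counter + "-" + timestamp;
    }
}
```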

Change-Id: Iaabd3d25eb64dd14053f96336c48de90d4364678
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 74fc38503a3565bed6218f65ab8f4425c61460a3)

8 years agoPublic constants need to be final 65/34165/1
Robert Varga [Fri, 5 Feb 2016 14:45:29 +0000 (15:45 +0100)]
Public constants need to be final

Make constants really constant.

Change-Id: Iacca77288d0b53f578da367fd490ff69b4484a2a
Signed-off-by: Robert Varga <rovarga@cisco.com>
8 years agoBUG 5115: Fix missing artifact exception in log 16/33716/2
oshvartz [Wed, 27 Jan 2016 10:53:00 +0000 (12:53 +0200)]
BUG 5115: Fix missing artifact exception in log

Add new dependency "org.apache.karaf.region.persist"
in karaf-parent pom to fix missing artifact exception
for region feature.

Change-Id: I1d08b69e4afee4e4911d9fc5be9cfd5250868b3f
Signed-off-by: oshvartz <oshvartz@redhat.com>
(cherry picked from commit 2d0262de6e6371cd2d4875c598cd30fe891a76dc)

8 years agoBUG-4869: use odl-lmax feature 30/33830/2
Robert Varga [Sun, 31 Jan 2016 00:43:58 +0000 (01:43 +0100)]
BUG-4869: use odl-lmax feature

Removes direct declaration of lmax version, pulling in odl-feature from
odlparent instead.

Change-Id: I52ca9433e25efc42159ee8929837f1b0d6f7292b
Signed-off-by: Robert Varga <rovarga@cisco.com>
8 years agoBug 5109: Handle stand alone leaf nodes in CDS streaming 44/33644/1
Tom Pantelis [Thu, 21 Jan 2016 15:59:50 +0000 (10:59 -0500)]
Bug 5109: Handle stand alone leaf nodes in CDS streaming

Modified AbstractNormalizedNodeDataOutput to output the leaf set QName
that is now passed to leafSetEntryNode if no parent LeafSetNode QName is
present. Modified NormalizedNodeInputStreamReader accordingly.

I also found that OrderedLeafSetNode was not handled correctly.
AbstractNormalizedNodeDataOutput#startOrderedLeafSet needs to set
lastLeafSetQName.

The NormalizedNodePruner assumed a leaf set entry node must have a
parent and threw an exception if not, similarly with leaf node and anyXML
node. But all 3 can be standalone so I modified NormalizedNodePruner to
handle it.

Change-Id: I02a71d9280dac0eb466ff401699a40d3d8826220
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 7a0fb19fe86fbf7c7bd78f7e522884b6e477b067)

8 years agoBug 4992: Removed old leader's candidates on leader change 03/33603/1
Tom Pantelis [Wed, 27 Jan 2016 00:33:04 +0000 (19:33 -0500)]
Bug 4992: Removed old leader's candidates on leader change

Modified onLeaderChanged to call removeCandidateFromEntities same as
onPeerDown.

Change-Id: I9b56e64254485fa0de4fdc1b7f4f6ddf100338af
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 207129172cb981630f955170cb67efceba02df85)

8 years agoReduce logging in QuarantinedMonitorActor 93/33193/2
Tom Pantelis [Wed, 20 Jan 2016 19:01:41 +0000 (14:01 -0500)]
Reduce logging in QuarantinedMonitorActor

The QuarantinedMonitorActor logs every AssociationErrorEvent as warn
which causes a lot of output when a peer node is down as akka raises a
connection-refused event every 5 sec until it re-connects. Since we're
only interested in the specific quarantined event, which is logged at
warn, other events should log to debug to avoid the noise.

Change-Id: I26ab7db9a71d137ae3227409d6dcbf39675c6ec9
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
8 years agoClear out router event on completion 60/32960/2
Robert Varga [Sat, 16 Jan 2016 14:59:41 +0000 (15:59 +0100)]
Clear out router event on completion

Rather than keeping references on heap, clear references once the future
has been notified. Also some logging to enable debugging.

Change-Id: I2ab352db51134b30fb352a4adabc07eda0945841
Signed-off-by: Robert Varga <robert.varga@pantheon.sk>
8 years agoRemove the leader's FollowerLogInformation on RemoveServer 27/33027/2
Tom Pantelis [Mon, 18 Jan 2016 09:03:58 +0000 (04:03 -0500)]
Remove the leader's FollowerLogInformation on RemoveServer

On RemoveServer, if removing follower, we need to also remove the
FollowerLogInformation entry from the followerToLog map in
AbstractLeader. Also, if a snapshot was being installed, we should
cleanup the mapFollowerToSnapshot.

Change-Id: I37df57a82a1c79ce375e48127bafd661a2dfe2c6
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
8 years agoBug in AbstractLeader replication consensus 99/33099/1
Gary Wu [Fri, 15 Jan 2016 19:31:48 +0000 (11:31 -0800)]
Bug in AbstractLeader replication consensus

In determining whether to advance the commit index, only the voting
members should be counted in the replicatedCount.  There was a logic
error that instead caused it to be incorrectly based on whether the
AppendEntriesReply message was sent by a voting member.

This patch fixes the issue.
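
The intended counting rule can be sketched as follows. The Follower class and
method names are hypothetical, and the sketch assumes the leader itself is a
voting member; it is not the AbstractLeader code.

```java
import java.util.List;

// Hypothetical illustration only; not the actual AbstractLeader code.
final class VotingCountSketch {
    static final class Follower {
        final boolean voting;
        final long matchIndex;

        Follower(boolean voting, long matchIndex) {
            this.voting = voting;
            this.matchIndex = matchIndex;
        }
    }

    // Counts the leader plus every voting follower that has replicated 'index'.
    // Non-voting followers are skipped no matter who sent the latest reply.
    static int replicatedCount(List<Follower> followers, long index) {
        int count = 1; // the leader (assumed to be a voting member)
        for (Follower follower : followers) {
            if (follower.voting && follower.matchIndex >= index) {
                count++;
            }
        }
        return count;
    }
}
```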

Change-Id: I6efb9574c39db608351297fc2552689d1ff77979
Signed-off-by: Gary Wu <gary.wu1@huawei.com>
8 years agoBug in AbstractLeader replication consensus 58/32858/2
Tom Pantelis [Wed, 13 Jan 2016 21:14:27 +0000 (16:14 -0500)]
Bug in AbstractLeader replication consensus

I ran into an issue where the leader's commit index wasn't advancing
for new log entries even though consensus was reached. This scenario can
occur if the leader previously didn't get consensus and thus didn't commit
and apply a log entry and later regains leadership with a higher term.

The code in handleAppendEntriesReply doesn't update the commit index
if an entry's term doesn't match the current term. This behavior is correct
as per the raft paper - §5.4.1: "Raft never commits log entries from
previous terms by counting replicas". However the code also breaks out
of the loop and thus can never make progress on new entries in the current
term that reach consensus. This part is incorrect - as per raft "once an
entry from the current term is committed by counting replicas, then all
prior entries are committed indirectly". Therefore we need to continue
processing subsequent log entries in order to eventually make progress.
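
The corrected loop can be sketched as below. This is a simplified model under
assumptions, not the handleAppendEntriesReply code: the log is represented by
two parallel arrays (term and voting replica count per index), and the
parameter names are hypothetical.

```java
// Hypothetical illustration only; a simplified model of the commit index scan.
final class CommitIndexSketch {
    // entryTerms[i] is the term of the log entry at index i; replicatedCount[i]
    // is how many voting members (leader included) hold that entry.
    static int advanceCommitIndex(int commitIndex, int[] entryTerms, int[] replicatedCount,
            int currentTerm, int votingMembers) {
        int majority = votingMembers / 2 + 1;
        int newCommitIndex = commitIndex;
        for (int i = commitIndex + 1; i < entryTerms.length; i++) {
            if (replicatedCount[i] < majority) {
                break; // no consensus here, so none for later entries either
            }
            if (entryTerms[i] != currentTerm) {
                // Never commit a prior-term entry by counting replicas, but keep
                // scanning instead of breaking out of the loop.
                continue;
            }
            // Committing a current-term entry indirectly commits all prior entries.
            newCommitIndex = i;
        }
        return newCommitIndex;
    }
}
```

With the old behavior (break instead of continue), a prior-term entry at the
head of the uncommitted range would block the commit index indefinitely.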

Change-Id: I2d093848c3a846e1f6420ac695b4ff652a65bf6b
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
8 years agoBUG-4963: Bump scala to 2.11.7 57/32757/3
Tomas Cere [Tue, 24 Nov 2015 08:25:25 +0000 (09:25 +0100)]
BUG-4963: Bump scala to 2.11.7

With 2.10 I was experiencing random freezes when installing features
that used it in karaf.

Bumping to 2.11 doesn't seem to break anything nor has any downsides.

We have scala.micro.version defined so use it instead of the
hardcoded micro version

Change-Id: I2a445790980d0da3152db3664294fd789f8272c7
Signed-off-by: Tomas Cere <tcere@cisco.com>
Signed-off-by: Robert Varga <rovarga@cisco.com>
8 years agoBug 4455 - Inconsistent COMMIT operation handling when no transactions are present 40/32940/2
Jakub Morvay [Thu, 14 Jan 2016 16:02:42 +0000 (17:02 +0100)]
Bug 4455 - Inconsistent COMMIT operation handling when no transactions are present

Return positive response for commit operation in config subsystem
netconf northbound even if no candidate transaction is open for session.

Needs to be merged after https://git.opendaylight.org/gerrit/#/c/32598/

Change-Id: Ia6ce2aa6ffdfafc47f69ae7315669f64b653c514
Signed-off-by: Jakub Morvay <jmorvay@cisco.com>
8 years agoBUG 4017: Notification publish service is not available from provider context 60/32760/2
Tomas Cere [Mon, 11 Jan 2016 16:08:11 +0000 (17:08 +0100)]
BUG 4017: Notification publish service is not available from provider context

Change-Id: I2cb2dd4e6e3c22b8db1d368bde2c914d53100661
Signed-off-by: Tomas Cere <tcere@cisco.com>
8 years agoDisallow remove leader in single node 23/32823/2
Tom Pantelis [Tue, 12 Jan 2016 06:30:29 +0000 (01:30 -0500)]
Disallow remove leader in single node

We don't want to allow removal of the leader in a single node cluster,
ie when there are no followers.

Change-Id: I3bedd1727736c7dfec55ba696f5ef1197a68c89d
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 421b5a27bd36cdaa04159d5f7ceb9f8e3affb2fa)

8 years agoSet to non-voting if not in server configuration 22/32822/1
Tom Pantelis [Tue, 12 Jan 2016 06:07:35 +0000 (01:07 -0500)]
Set to non-voting if not in server configuration

On recovery, if a RaftActor is not in its own recovered
ServerConfigurationPayload list, then set itself to a non-voting member
so it stays at Follower and doesn't try to start an election.

This scenario is an edge case for Shards as, normally, when a server is
removed, it self-destructs and is removed from the ShardManager. However
there is a small window where disconnect or shutdown could prevent
ShardManager removal from occurring. This patch protects against a server
restart causing disruption after removal.

Change-Id: I64ecd89cddec7a4e1711e0d8d17c7ea6b36e29a0
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
(cherry picked from commit 8dabbaa07e7034a2f385f9b553eaf2dbde91525b)

8 years agoRevert "Add mockito-configuration to tests" 19/32819/1
Robert Varga [Fri, 15 Jan 2016 12:20:32 +0000 (12:20 +0000)]
Revert "Add mockito-configuration to tests"

This reverts commit dcc92fc8fdf056d5ada94931f2d24523070fd9a7.

Change-Id: Ia89b88f9b933d31d369e5ad75ebf8c762c9dfde0
Signed-off-by: Robert Varga <robert.varga@pantheon.sk>
8 years agoUpdate .gitreview for stable/beryllium 28/32628/1
Thanh Ha [Thu, 14 Jan 2016 21:24:32 +0000 (16:24 -0500)]
Update .gitreview for stable/beryllium

Change-Id: Ie99a1d430deaba902a182cb986d721ee5ec0e557
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>
8 years agoRemove ModificationPayload class 01/32401/5
Tom Pantelis [Tue, 12 Jan 2016 08:31:03 +0000 (03:31 -0500)]
Remove ModificationPayload class

The ModificationPayload class was introduced early in Lithium but was
replaced later in Lithium by DataTreeCandidatePayload. Since ModificationPayload
was never contained in a release it can be removed.

Change-Id: Ia4da96695fb9c0356d16f048451b4dab7e0bcf70
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
8 years agoRemoved unused actorPath from ShardManager. 08/31408/5
Tony Tkacik [Wed, 16 Dec 2015 09:25:40 +0000 (10:25 +0100)]
Removed unused actorPath from ShardManager.

Change-Id: I31f52e59ff59d5acc86feed32118a392cf1132bc
Signed-off-by: Tony Tkacik <ttkacik@cisco.com>
8 years agoBUG 4930 & BUG 4017: Allow multiple refine statements in MXBean generation 30/32430/3
Tomas Cere [Tue, 12 Jan 2016 14:30:46 +0000 (15:30 +0100)]
BUG 4930 & BUG 4017: Allow multiple refine statements in MXBean generation

Stops enforcing a single refine statement when generating MXBeans.

Change-Id: I2f07fc23b355b1871170a00baf52db34f5e6eb66
Signed-off-by: Tomas Cere <tcere@cisco.com>
8 years agoRemove deprecated getDataStoreType methods 96/32396/3
Tom Pantelis [Tue, 12 Jan 2016 07:27:11 +0000 (02:27 -0500)]
Remove deprecated getDataStoreType methods

getDataStoreName methods were recently added to DatastoreContext and ActorContext
to replace the getDataStoreType methods. The latter were marked as
deprecated but we can remove them since they aren't public APIs outside
of the context of sal-distributed-datastore. The remaining callers were
migrated to the getDataStoreName methods.

Change-Id: I7dab731d96b3b8c249a59824de4d78ea72500e05
Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
8 years agoBug 3871: Deprecate opendaylight-inventory model. 77/25477/9
Tony Tkacik [Wed, 19 Aug 2015 12:53:16 +0000 (14:53 +0200)]
Bug 3871: Deprecate opendaylight-inventory model.

Change-Id: I526496120b79158df41aec315d67303e4604074b
Signed-off-by: Tony Tkacik <ttkacik@cisco.com>
Signed-off-by: Robert Varga <rovarga@cisco.com>
8 years agoUpdate features archetypes 14/31814/3
Stephen Kitt [Wed, 23 Dec 2015 11:01:00 +0000 (12:01 +0100)]
Update features archetypes

* Use {{VERSION}} in the documentation comments.
* Use {{VERSION}} in the generated features.xml.
* Remove an invalid space in the schema locations.
* Hook up opendaylight-karaf-features to features-parent.
* Import yangtools-artifacts in the generated features pom.xml.
* Remove redundant versions in the generated features pom.xml.

Change-Id: I60e7d49d0d29a1d9040501e7a8fa0a61ef6fc1bc
Signed-off-by: Stephen Kitt <skitt@redhat.com>
8 years agoAdd mockito-configuration to tests 45/32045/11
Robert Varga [Sat, 2 Jan 2016 23:38:29 +0000 (00:38 +0100)]
Add mockito-configuration to tests

Yangtools' mockito-configuration ensures that all methods touched in
mocked objects have to be mocked, preventing failures which are hard to
track down.

The reason for this is that by default unmocked methods do nothing and
return null -- injecting nulls into contexts which do not expect them.
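
The difference between the two behaviours can be sketched without Mockito,
using a plain java.lang.reflect.Proxy. This is an illustration only, not how
mockito-configuration is implemented; StrictMockSketch and NameSource are
hypothetical names invented for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public final class StrictMockSketch {
    /** Hypothetical collaborator interface, used only for this illustration. */
    public interface NameSource {
        String name();
    }

    /** Lenient stand-in: every unstubbed call silently returns null. */
    public static NameSource lenient() {
        return proxyOf((p, method, args) -> null);
    }

    /** Strict stand-in: every unstubbed call fails at once and names the method. */
    public static NameSource strict() {
        return proxyOf((p, method, args) -> {
            throw new IllegalStateException("Unstubbed call: " + method.getName());
        });
    }

    private static NameSource proxyOf(InvocationHandler handler) {
        return (NameSource) Proxy.newProxyInstance(
            NameSource.class.getClassLoader(),
            new Class<?>[] { NameSource.class },
            handler);
    }
}
```

With the lenient stand-in, the null only surfaces later, wherever the caller
dereferences it. With the strict one, the failure points at the unstubbed
method itself.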

Change-Id: If7b9afac01128be6f1b2a90b1e8c068cb4a39b65
Signed-off-by: Robert Varga <robert.varga@pantheon.sk>
8 years agoInternalJMXRegistration should be an ObjectRegistration 58/31858/9
Robert Varga [Wed, 23 Dec 2015 23:55:54 +0000 (00:55 +0100)]
InternalJMXRegistration should be an ObjectRegistration

This way it follows the AutoCloseable#close() contract, e.g. it allows
multiple invocations.
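
A minimal sketch of a close() that tolerates repeated invocations, assuming
an atomic flag guards the cleanup. IdempotentRegistration is a hypothetical
class written for this illustration; it is not the ObjectRegistration or
InternalJMXRegistration code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public final class IdempotentRegistration implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean();
    private final AtomicInteger removals = new AtomicInteger();

    @Override
    public void close() {
        // Only the first caller flips the flag and performs the cleanup;
        // later calls see closed == true and return without side effects.
        if (closed.compareAndSet(false, true)) {
            removals.incrementAndGet(); // stands in for the real unregistration
        }
    }

    /** Number of times the cleanup actually ran. */
    public int removals() {
        return removals.get();
    }
}
```

Calling close() twice on the same instance leaves removals() at 1, so the
cleanup runs once regardless of how many callers close the registration.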

Change-Id: Ied93bbdd388189a928cf06cbbc913fe124a284dd
Signed-off-by: Robert Varga <rovarga@cisco.com>