Bug 2038: Ensure only one concurrent 3-phase commit in Shard

Added a ShardCommitCoordinator class that ensures there's only one
concurrent 3-phase commit. The following outlines the new commit
workflow:

- On ready, the ShardTransaction creates the dom store cohort and
  forwards a new ForwardedReadyTransaction message to the shard.

- The shard calls its ShardCommitCoordinator to add the cohort and
  modification to a cache keyed by transaction ID.

- On CanCommitTransaction message, the ShardCommitCoordinator looks up
  and removes the cohort entry from the cache corresponding to the
  transaction ID passed via the CanCommit message. The
  ShardCommitCoordinator also caches the cohort entry for the current
  transaction in progress. If there's no transaction in progress, the
  committing transaction becomes the current transaction and canCommit
  is called on the cohort. Otherwise, the cohort entry is queued to be
  processed after the current transaction completes.

- On CommitTransaction message, if the transaction ID passed via the
  Commit message matches the currently cached cohort entry, the
  preCommit and commit phases are performed. When complete, the
  ShardCommitCoordinator dequeues the next waiting transaction cohort
  entry, if any, and processes it.

If a Tx is aborted and it is the current transaction, the
ShardCommitCoordinator handles it as a completed Tx.

Implemented a timeout mechanism using the akka scheduler such that if
the commit message isn't received within a period of time (default 30 s)
after the canCommit message, the transaction is aborted so that the next
transaction can proceed. This is to handle remote node or network
failures during a 3-phase commit.

The ThreePhaseCommitCohort actor was removed along with the
ForwardedCommitTransaction.

Change-Id: Iaa5692ca45cd7635d1a06a609f4bf98bec50df14
Signed-off-by: tpantelis <tpanteli@brocade.com>

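The commit workflow described above can be sketched in plain Java. This is a simplified illustration, not the actual ShardCommitCoordinator: the class name CommitCoordinatorSketch, the CohortEntry type and all method names are hypothetical, the cohort calls are reduced to a flag, and the akka-scheduled timeout is left out (it would simply call currentTransactionComplete). The sketch assumes single-threaded access, as an actor processes one message at a time.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified model of the "one current commit, others queued" rule.
class CommitCoordinatorSketch {
    static final class CohortEntry {
        final String txId;
        boolean canCommitCalled; // stands in for calling canCommit on a real cohort
        CohortEntry(String txId) { this.txId = txId; }
    }

    private final Map<String, CohortEntry> readyCohorts = new HashMap<>();
    private final Queue<CohortEntry> queued = new ArrayDeque<>();
    private CohortEntry current;

    // ForwardedReadyTransaction: cache the cohort entry by transaction ID.
    void transactionReady(String txId) {
        readyCohorts.put(txId, new CohortEntry(txId));
    }

    // CanCommitTransaction: become the current transaction if none is in
    // progress, otherwise wait in the queue. Returns true if canCommit ran now.
    boolean handleCanCommit(String txId) {
        CohortEntry entry = readyCohorts.remove(txId);
        if (entry == null) {
            throw new IllegalStateException("No cohort entry for " + txId);
        }
        if (current == null) {
            current = entry;
            entry.canCommitCalled = true;
            return true;
        }
        queued.add(entry);
        return false;
    }

    // CommitTransaction: only the current transaction may preCommit and commit.
    boolean handleCommit(String txId) {
        if (current == null || !current.txId.equals(txId)) {
            return false;
        }
        // preCommit and commit on the cohort would run here.
        currentTransactionComplete(txId);
        return true;
    }

    // Called on commit completion; an abort or a timeout of the current
    // transaction would be handled the same way.
    void currentTransactionComplete(String txId) {
        if (current != null && current.txId.equals(txId)) {
            current = queued.poll(); // null when nothing is waiting
            if (current != null) {
                current.canCommitCalled = true;
            }
        }
    }

    String currentTransactionId() {
        return current == null ? null : current.txId;
    }
}
```

With two ready transactions, the second handleCanCommit call returns false and that entry only becomes current once the first one commits or is aborted.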
Bug-1607: Clustering : Remove actorFor (deprecated) call from TransactionProxy.java

1. Changes to use ActorSelection instead of ActorPath
2. Removed actorFor usage
3. Removed resolvePath calls
4. Changes in test classes.

Rebased.

Change-Id: I8fdc18a1ac18d75d1b6d5c8cd0da986e19d08280
Signed-off-by: Kamal Rameshan <kramesha@cisco.com>

BUG 1712 - Distributed DataStore does not work properly with Transaction Chains

The fix is as follows,

1. When creating a transaction chain, create a unique identifier for
   the transaction chain using the member name and the current
   timestamp.

2. When a transaction is created using the transaction chain, pass the
   transaction chain id along to the remote shard.

3. If the remote shard receives a transaction with a valid transaction
   chain id (one which is not empty) then it creates a new transaction
   chain if one does not exist. If one does exist then the Shard just
   creates a new transaction on the existing transaction chain. This
   way, if a single transaction chain was used to create transactions
   on multiple different shards, then a transaction chain would be
   created on each one of those shards.

4. When a transaction chain is closed, a Close Transaction Chain
   message is broadcast to all the Shards in the system. If those
   shards had a transaction chain with the specified id then the
   transaction chain would be closed. The sender does not care about
   receiving a response.

5. When a state change occurs on a Shard we check if the Shard is not a
   leader. If that is the case we automatically close all existing
   transaction chains on that shard and clear the map which tracks the
   transaction chains for that shard.

Change-Id: I6bcfb9de3d0ec666e4152afb69c702dda4f38171
Signed-off-by: Moiz Raja <moraja@cisco.com>

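The five steps above can be sketched as follows. This is a minimal illustration under stated assumptions: TransactionChainSketch and its methods are hypothetical names, the identifier format is invented (the commit only says it combines member name and timestamp), and a chain is reduced to a counter of the transactions created on it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shard-side bookkeeping for transaction chains.
class TransactionChainSketch {
    // Step 1: build a chain ID from the member name and a timestamp.
    // The exact format here is an assumption made for illustration.
    static String newChainId(String memberName, long timestampMillis) {
        return memberName + "-txn-chain-" + timestampMillis;
    }

    // Chain ID -> number of transactions created on that chain.
    private final Map<String, Integer> chains = new HashMap<>();

    // Step 3: an empty chain ID means the transaction is not chained;
    // otherwise create the chain on first use and reuse it afterwards.
    void createTransaction(String chainId) {
        if (chainId == null || chainId.isEmpty()) {
            return;
        }
        chains.merge(chainId, 1, Integer::sum);
    }

    // Step 4: closing is fire-and-forget; unknown IDs are ignored.
    void closeChain(String chainId) {
        chains.remove(chainId);
    }

    // Step 5: a shard that is not the leader drops all of its chains.
    void onStateChange(boolean isLeader) {
        if (!isLeader) {
            chains.clear();
        }
    }

    int openChainCount() {
        return chains.size();
    }

    int transactionCount(String chainId) {
        return chains.getOrDefault(chainId, 0);
    }
}
```

Note that a purely timestamp-based ID can collide if one member creates two chains within the same clock tick; the sketch does not guard against that.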
Implement creating and applying of snapshot for a shard

This commit implements creating and applying of snapshot as per the
RaftActor contract.

A recovery issue occurred because the Shard was created without a
schemaContext, so this commit also ensures that the Shard is not
created before a schemaContext is received.

Change-Id: I45fd64885f09fac57f1f5ff235144064b94ab129
Signed-off-by: Moiz Raja <moraja@cisco.com>

Include JMX Counters and resetTransactionCounters

added ReadFailedTransactionsCount
added AbortTransactionsCount and resetTransactionCounters

Note: Could not find where the InMemoryDatastore counters are; those
counters need to be reset as part of a separate commit.

included copyright headers in files that didn't have them

patch 4: Review comments closed
patch 3: removed a comment that is not needed.

Change-Id: I3b7b81bd6c28d2fb766947df8607e8c824445bb0
Signed-off-by: Basheeruddin Ahmed <syedbahm@cisco.com>

Bug 1598: Cleanup stale ShardReadTransactions

For read-only Tx, a FinalizablePhantomReference is created with the
TransactionProxy instance as the referent and is added to a static
FinalizableReferenceQueue. The FinalizablePhantomReference is subclassed
to hold the TransactionProxy's remoteTransactionPaths map. When the
TransactionProxy instance is GC'ed, the FinalizablePhantomReference is
notified and calls closeTransaction on each TransactionContext to clean
it up.

Also to handle potentially stale Tx's due to disconnects, nodes
prematurely shutting down etc, I set an idle timeout (default 10 min)
on ShardTransaction actors via setReceiveTimeout. If the actor is idle
with no messages after the timeout, the actor self-destructs.

I made the idle timeout configurable via the config yang file. This
setting needs to be passed down from the module, thru the ShardManager
to the Shards to the ShardTransactions. I created a ShardContext class
to hold the idle timeout and other data that's passed down. This will
also make it easier in the future if additional config data needs to be
passed down.

In the config yang file, I created a data-store-properties grouping to
avoid having to duplicate the config properties for the operational and
config data stores.

Many unit tests fail when running from Eclipse due to creating anonymous
inner Creator class instances in the static props methods. The akka code
doesn't like this - if the Creator instance class is enclosed in another
class it expects the class to be static. However this runs fine when
running unit tests from mvn and when running the production controller.
I suspect the difference is b/c the JDK compiler generates the anonymous
class as static since they're enclosed in static methods but the Eclipse
compiler doesn't. Anyway, to avoid this I refactored all the anonymous
inner Creator classes to private static classes. I think this is safer
and will avoid potential future issues with different JDK compilers or
JDK upgrades.

Change-Id: Ie644612cb34e7219dc089b8add6d397a11bffdda
Signed-off-by: tpantelis <tpanteli@brocade.com>

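The Creator refactoring described above can be illustrated without akka. This is a minimal sketch, assuming the check amounts to "a class enclosed in another class must be static", as the message describes. CreatorStyleSketch, NameCreator and isAcceptableCreator are hypothetical names, and java.util.function.Supplier stands in for akka's Creator interface.

```java
import java.lang.reflect.Modifier;
import java.util.function.Supplier;

// Hypothetical illustration of replacing an anonymous inner Creator
// with a private static nested class.
class CreatorStyleSketch {
    // A private static nested class is static regardless of which
    // compiler produced it, unlike an anonymous class declared inside
    // a static method.
    private static final class NameCreator implements Supplier<String> {
        private final String name;

        NameCreator(String name) {
            this.name = name;
        }

        @Override
        public String get() {
            return "actor-" + name;
        }
    }

    // Counterpart of a static props(...) factory method.
    static Supplier<String> props(String name) {
        return new NameCreator(name);
    }

    // The kind of validation the message attributes to akka (assumed,
    // not copied from akka): top-level classes pass, enclosed classes
    // must carry the static modifier.
    static boolean isAcceptableCreator(Class<?> clazz) {
        return clazz.getEnclosingClass() == null
                || Modifier.isStatic(clazz.getModifiers());
    }
}
```

Because the nested class is declared static explicitly, the result of the check no longer depends on how a particular compiler marks anonymous classes.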
Bug 1430: Obtain config params from config system

This is a follow-up patch to obtain the various data store executor
config params from the config system instead of system properties.

Change-Id: Ib7fa03f053d6165fdcb52300be9add8ebe80b2c2
Signed-off-by: tpantelis <tpanteli@brocade.com>

Optimizations, Monitoring and Logging

- Made identifiers type-safe
- Changed logging so that most of it is done at debug level instead of
  info
- Make cluster logging configurable via logback.xml
- Add Bean for looking at the local shards on shard manager
- Add Preconditions in places where it matters

Change-Id: Ie2b17e89bd88edde1366d0a4d23abf9fb97e6e55
Signed-off-by: Moiz Raja <moraja@cisco.com>

Tune replication and stabilize tests

Made following changes for replication
- Increased Heartbeat timeout to 500 milliseconds.
- Send only one entry from the replicated log to the follower in append
  entries

Both of these tweaks have been made to prevent election timeouts and
frequent switching of leaders

Changes to tests
- Added a duration when constructing an ExpectMsg. This prevents
  ExpectMsg from waiting forever when an expected event does not occur
- Removed all Thread.sleep from the tests and replaced them with waiting
  for a specific LogEvent, which is more deterministic.

Change-Id: Ie9ce0c9c73bf1b170a78879b1e2dab76f1de64df
Signed-off-by: Moiz Raja <moraja@cisco.com>

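The two test changes above (a bounded wait, and waiting for a specific event instead of sleeping) can be sketched with the JDK alone. This is not the akka TestKit API used by the actual tests; EventAwaitSketch and its methods are hypothetical, and the log text in the usage note is invented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: block until a matching event arrives or a timeout expires.
class EventAwaitSketch {
    private final CountDownLatch seen = new CountDownLatch(1);
    private final String expectedText;

    EventAwaitSketch(String expectedText) {
        this.expectedText = expectedText;
    }

    // Called by a log listener (or the code under test) for every event.
    void onEvent(String message) {
        if (message.contains(expectedText)) {
            seen.countDown();
        }
    }

    // Returns true as soon as a matching event has been seen, or false
    // once the timeout elapses. It never waits forever.
    boolean await(long timeout, TimeUnit unit) {
        try {
            return seen.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

A test would register the helper as a listener, trigger the behaviour, and then assert on await(5, TimeUnit.SECONDS) for a text such as "Switching to Leader". Unlike a fixed Thread.sleep, the wait ends as soon as the event occurs and fails cleanly when it never does.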
Utilize transaction type to create read-only or write-only or read-write transaction instead of always creating read-write transaction

patch 2: updated based on review comments

Change-Id: I36390ab348c5774cf4bf180ea1608fe04f75d073
Signed-off-by: Basheeruddin Ahmed <syedbahm@cisco.com>

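The idea of dispatching on a transaction type can be sketched as below. All names (TransactionTypeSketch, TransactionType, Tx, create) are hypothetical and the transaction is reduced to two capability flags; the real change creates the corresponding DOM store transaction objects.

```java
// Hypothetical sketch: choose the kind of transaction from a requested type
// instead of always creating a read-write one.
class TransactionTypeSketch {
    enum TransactionType { READ_ONLY, WRITE_ONLY, READ_WRITE }

    // Minimal stand-in for a transaction: records which operations it allows.
    static final class Tx {
        final boolean canRead;
        final boolean canWrite;

        Tx(boolean canRead, boolean canWrite) {
            this.canRead = canRead;
            this.canWrite = canWrite;
        }
    }

    static Tx create(TransactionType type) {
        switch (type) {
            case READ_ONLY:
                return new Tx(true, false);
            case WRITE_ONLY:
                return new Tx(false, true);
            case READ_WRITE:
                return new Tx(true, true);
            default:
                throw new IllegalArgumentException("Unknown type: " + type);
        }
    }
}
```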
Add replication capability to Shard

This commit integrates the distributed data store with our Raft
implementation. Shard now extends RaftActor which provides it the
replication capabilities required.

Other notable changes are,

- The FindPrimary algorithm has been changed to find the first replica
  for a shard. The shard then forwards requests to create a transaction
  or transaction chain to the leader
- Changed the package name for Raft internal messages from "internal" to
  "base" to be more BND tool friendly
- Fix some issues with Serialization of Raft messages
- Create a NoOpTransaction when no Primary can be found. The commit for
  this transaction will always fail. The NoOpTransaction returns absent
  for reads in all cases.
- Add PeerAddressResolution capability to Raft. Given a static
  configuration where a shard has 'n' peers, you can pass the names of
  those peers to the shard and resolve their addresses at a later time.
  This allows the Shard to ensure consensus even in a situation where it
  is the first one to come up but its peers are still not running

Change-Id: I3087deb5eb4418cd629a707ba14f43858db1f463
Signed-off-by: Moiz Raja <moraja@cisco.com>

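The PeerAddressResolution idea above (know the peer names up front, resolve addresses later) can be sketched as follows. PeerAddressSketch, PeerAddressResolver and the method names are hypothetical, and addresses are plain strings rather than actor paths.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of lazy peer address resolution.
class PeerAddressSketch {
    interface PeerAddressResolver {
        Optional<String> resolve(String peerId);
    }

    // Peer ID -> address; the value stays null until the peer is resolvable.
    private final Map<String, String> peerAddresses = new HashMap<>();
    private final PeerAddressResolver resolver;

    PeerAddressSketch(Iterable<String> peerIds, PeerAddressResolver resolver) {
        for (String peerId : peerIds) {
            peerAddresses.put(peerId, null);
        }
        this.resolver = resolver;
    }

    // The peer count is known from the static configuration, so a majority
    // can be computed before any peer address is known.
    int peerCount() {
        return peerAddresses.size();
    }

    // Retry resolution on each call until an address becomes available.
    Optional<String> addressOf(String peerId) {
        if (!peerAddresses.containsKey(peerId)) {
            return Optional.empty(); // not a configured peer
        }
        String address = peerAddresses.get(peerId);
        if (address == null) {
            address = resolver.resolve(peerId).orElse(null);
            peerAddresses.put(peerId, address);
        }
        return Optional.ofNullable(address);
    }
}
```

A shard that starts first sees peerCount() equal to its configured number of peers while every addressOf call is still empty, so it cannot mistake itself for a single-node cluster.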
Serialization/Deserialization and a host of other fixes

- Handle Cluster MemberUp and MemberRemoved events in ShardManager
- Cohort messages and close listener messages switched to use protobuf
- Distributed datastore messages CreateTransaction,
  CreateTransactionReply, CreateTransactionChain and
  CreateTransactionChainReply switched to protobuf
- ShardManager messages switch to protobuf
- DataChanged and other messages switch to protobuf in distributed
  datastore
- Fixed few things found during testing
  1. ShardStrategy - setting of configuration
  2. NodeToNormalizedNodeBuilder - leaf node/leafsetentry node checks
  3. DataChanged event - passing of scope instanceidentifier used
     during deserialization
- Introducing JMX MBeans for distributed datastore
- Fixed issues which were preventing remote Shards from talking to each
  other
- Fixed a number of issues related to deserialization
- Add distributed datastore to the build
- Switch from InstanceIdentifier to YangInstanceIdentifier

Change-Id: I0d15dc482cb2b0fb2170b1344bad9fa3b421e8e0
Signed-off-by: Moiz Raja <moraja@cisco.com>

Make CompositeModification serializable using protocol buffers

Change-Id: I3e91452b0244c6adec84c000e83d7f993b2a59b7
Signed-off-by: Moiz Raja <moraja@cisco.com>

Switch to using protocol buffer serialization for the WriteData message

- This commit also fixes an issue with the NodeToNormalizedNodeBuilder
  where an empty container node was decoded with a leaf node within
- MergeData and WriteData are passed in their serialized forms from the
  TransactionProxy as well

Change-Id: I22eab6059becd427a9f0fae1a9273c8c4e293ee5
Signed-off-by: Moiz Raja <moraja@cisco.com>

Store schemaContext in ActorContext so that all proxy objects can have access to it

Use it for MergeData and WriteData message construction

Change-Id: I20df92fc77c41016df7cc6f737226368b25b5a0f
Signed-off-by: Moiz Raja <moraja@cisco.com>

Pattern for switching from POJO messages to Protocol Buffer messages

Change-Id: I053581bc66cdd2627132af7366e01d8276a7c27e
Signed-off-by: Moiz Raja <moraja@cisco.com>

NormalizedNode serialization using protocol buffer

Utilization of CreateTransactionReply protocol buffer message in
Distributed Datastore

Subsequent commits will utilize other protocol buffer messages in
Distributed Datastore

Change-Id: I64c08d1998bab29a92351fc1fd5897d0faaf4081
Signed-off-by: Basheeruddin Ahmed <syedbahm@cisco.com>
Signed-off-by: Moiz Raja <moraja@cisco.com>

Enhancements to actor naming, logging and monitoring

- Actor names have now been changed to be more meaningful. This will be
  helpful when trying to follow the logging.
- Added logging for when the actor is created and when it is terminated

Change-Id: I825270779ce19c319807c5a3c56d4885f8cc0996
Signed-off-by: Moiz Raja <moraja@cisco.com>

Kill Dynamic Actors when we're done with them

Kill ShardTransaction on close and on ThreePhaseCommitCohort#commit
Kill ThreePhaseCommitCohort on commit

Change-Id: Ie86b66cf3841baa514d82509fbc5b817eb7c6740
Signed-off-by: Moiz Raja <moraja@cisco.com>

Implement committing of data

- Implement ThreePhaseCommitCohort Actor
- Implement a BasicIntegrationTest to test out using a Shard up to
  committing
- Make modifications in Shard, ShardTransaction, ShardTransactionChain
  to make the flow work

Change-Id: I4eff32833c09d89f81753db29ea38ac26b9dfbf6
Signed-off-by: Moiz Raja <moraja@cisco.com>