Make FollowerLogInformation a class This really is a mutable state holder; there is no good reason why it should be split into an interface and an implementation. Change-Id: I63a080a4f04bfa4baf1eb291a06140605a50762c Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Fix checkstyle reported by odlparent-3.0.0 Change-Id: I08c548fbbbef8527ad7b037b0def33d3c1c09bf6 Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Bug 2890: Chunk AppendEntries when single payload size exceeds threshold Utilizes the MessageSlicer in AbstractLeader to slice/chunk AppendEntries messages whose single log entry payload exceeds the max size threshold. The MessageAssembler is used in the Follower to re-assemble. For efficiency, with multiple followers, the AbstractLeader reuses the FileBackedOutputStream containing the serialized AppendEntries data. However, since the MessageSlicer takes ownership of the FileBackedOutputStream and cleans it up when slicing is complete, I added a SharedFileBackedOutputStream class that maintains a usage count and performs cleanup when the usage count reaches 0. The AbstractLeader maintains a Map of SharedFileBackedOutputStream instances keyed by log index. The FollowerLogInformation keeps track of whether or not slicing is in progress for the follower. As with install snapshot, we only want to send empty AppendEntries as heartbeats. Change-Id: Id163944b9989f6cb39a6aaaa98d1f3c4b0026bbe Signed-off-by: Tom Pantelis <tompantelis@gmail.com>
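The usage-count idea behind SharedFileBackedOutputStream can be sketched as follows. This is a simplified illustration only: the real class wraps a Guava FileBackedOutputStream, and the method names here are assumptions, not the actual API.

```java
import java.io.ByteArrayOutputStream;

// Minimal sketch of a shared stream with a usage count: each consumer
// (one per follower being sliced to) holds a reference, and the
// underlying stream is cleaned up only when the last consumer is done.
public class SharedStreamSketch {
    private final ByteArrayOutputStream delegate = new ByteArrayOutputStream();
    private int usageCount = 1;
    private boolean cleaned;

    public void incrementUsageCount() {
        usageCount++;
    }

    // Called when one consumer's slicing completes.
    public void completeUsage() {
        if (--usageCount == 0) {
            delegate.reset(); // stand-in for FileBackedOutputStream cleanup
            cleaned = true;
        }
    }

    public boolean isCleaned() {
        return cleaned;
    }
}
```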
Bug 8206: Prevent decr follower next index beyond -1 If a follower's next index is already -1, we shouldn't decrement it further, i.e. -1 is the lowest allowed value. Otherwise AbstractLeader can continuously decrement and log an info message while in the process of sending an install snapshot: member-3-shard-default-config (Leader): follower member-1-shard-default-config last log term 2 conflicts with the leader's 3 - dec next index to -2 Modified decrNextIndex to return a boolean indicating whether the next index was decremented, which is checked by AbstractLeader. Change-Id: I29454d4e71a7f9128b3b47f6a4e3403615c2c8d2 Signed-off-by: Tom Pantelis <tompantelis@gmail.com> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
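The floor check described above can be sketched like this, assuming a simplified holder class (the real logic lives in FollowerLogInformation):

```java
// Sketch: next index never drops below -1, and the caller learns
// whether a decrement actually happened.
public class NextIndexSketch {
    private long nextIndex;

    public NextIndexSketch(long initial) {
        this.nextIndex = initial;
    }

    public boolean decrNextIndex() {
        if (nextIndex < 0) {
            return false; // already at the -1 floor
        }
        nextIndex--;
        return true;
    }

    public long getNextIndex() {
        return nextIndex;
    }
}
```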
Bug 7521: Convert install snapshot chunking to use streams On the leader side, converted LeaderInstallSnapshotState to use the InputStream from the ByteSource instead of a ByteString to chunk the data. On the follower side, converted the SnapshotTracker, which is used to reassemble the install snapshot chunks, to write the chunks to an OutputStream instead of a ByteString. Currently a ByteArrayOutputStream is used but will be changed to a FileBackedOutputStream in a subsequent patch. Change-Id: I7a16ad5d44a530e260aa332d91145fbc3fb95f5f Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
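Stream-based chunking can be illustrated with a minimal sketch: read fixed-size chunks from an InputStream rather than slicing an in-memory ByteString. This is a simplified stand-in, not the LeaderInstallSnapshotState code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkSketch {
    // Read fixed-size chunks from the stream; the last chunk may be short.
    public static List<byte[]> chunk(InputStream in, int chunkSize) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buf = new byte[chunkSize];
        int read;
        while ((read = in.read(buf)) != -1) {
            chunks.add(Arrays.copyOf(buf, read));
        }
        return chunks;
    }

    // Convenience overload for in-memory data.
    public static List<byte[]> chunk(byte[] data, int chunkSize) {
        try {
            return chunk(new ByteArrayInputStream(data), chunkSize);
        } catch (IOException e) {
            throw new IllegalStateException(e); // cannot happen in-memory
        }
    }
}
```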
Fix remaining CS errors in sal-akka-raft and enable enforcement Some checkstyle violations were missed in previous patches which enabling enforcement revealed. Change-Id: I3a31b24aea69adfe8d50071fdce27fbd69c04b58 Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
Fix warnings and javadocs in sal-akka-raft Fixed a lot of checkstyle warnings and cleaned up javadocs for classes in the org.opendaylight.controller.cluster.raft package. Change-Id: I67dd997701fe6eaf6c87e77954a4c1d4aa5fda69 Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
Bug 6540: Fix journal issues on leader changes Fixed a couple of issues with journal syncing on leader changes and isolation. Consider the scenario where a leader is isolated, the majority partition elects a new leader, and both sides of the partition attempt to commit entries independently. Say the term was 1 and the last journal index was 2 prior to isolation, replicated to all followers and applied to state. After isolation, the isolated leader appends a new entry with index 3 and attempts to replicate but fails to reach consensus. Meanwhile, the new leader appends its own new entry with index 3, which is successfully replicated to the remaining follower and applied to state. The commitIndex in the majority partition is now 3. The new leader attempts to send AppendEntries to the isolated leader but doesn't get any replies, so it marks the follower as inactive. When the partition is healed, the isolated leader converts to follower when it hears from the new leader with the higher term. Since the new leader has marked the isolated leader as inactive, the initial AppendEntries that the previous leader sees will have no entries and the leaderCommitIndex will be 3. This is greater than the current commitIndex 2, so the previous leader will update its commitIndex to 3 and apply its entry with index 3 to the state. However this entry was from the previous term 1, was not replicated to a majority of the nodes, and conflicts with the new leader's entry with index 3 and term 2. This is a violation of Raft. The violation occurs because the new leader does not send any entries until it knows the follower is active. This is done for efficiency, to avoid continuously trying to send entries when a follower is down, and is fine; however, the leader should not send its current commit index either, since it doesn't know the state of the follower.
The intention of the empty AppendEntries in this case is to re-establish connectivity with the follower, and thus it should not cause any state change in the follower. Therefore I changed the code to send leaderCommitIndex as -1 if the follower is inactive. The other case where the leader purposely sends an empty AppendEntries is when the leader is in the process of installing a snapshot on a follower, as indicated by the presence of a LeaderInstallSnapshotState instance in the FollowerLogInformation. The empty AppendEntries is still sent at the heartbeat interval to prevent an election timeout in case the snapshot capture/transfer is delayed. Again, the AppendEntries should not cause any state change in the follower, so I also changed the leader to send -1 for the leaderCommitIndex. As a result, I also changed it so that the leader immediately records a LeaderInstallSnapshotState instance in the FollowerLogInformation when it initiates the async snapshot capture. Previously this was done when the capture completed and the RaftActor sent the SendInstallSnapshot message to the leader behavior; however it may take some time to capture the snapshot, and intervening AppendEntries heartbeats may be sent to the follower. The other issue in the above scenario is that the conflict with entry 3 is not immediately detected. On the first AppendEntries, the previous leader reports back a successful reply with lastLogIndex 3 and lastLogTerm 1 because the previous index (2) and term (1) didn't conflict. The new leader sets the previous leader's match index to 3, thinks index 3 has been replicated to all the followers, and trims its in-memory log at index 2. Eventually, when the next entry with index 4 is replicated, the previous leader will detect the conflict, as the leader's previous log index 3 and term 2 will be sent in the next AppendEntries. The new leader will backtrack and eventually install a snapshot to sync the previous leader; however this is inefficient and should be unnecessary.
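The commit-index guard described above can be sketched as a single decision function. This is illustrative only; the names and shape are assumptions, and the real check is inline in AbstractLeader's AppendEntries construction.

```java
// Sketch: when the leader does not yet know the follower's state
// (follower inactive, or a snapshot install is in flight), send -1 as
// the leader commit index so the empty AppendEntries cannot advance
// the follower's applied state.
public class CommitIndexGuardSketch {
    public static long leaderCommitToSend(boolean followerActive,
            boolean installSnapshotInProgress, long leaderCommitIndex) {
        if (!followerActive || installSnapshotInProgress) {
            return -1;
        }
        return leaderCommitIndex;
    }
}
```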
The leader should detect the conflict immediately on the first AppendEntries reply. So I changed handleAppendEntriesReply to check that the follower's lastLogTerm matches the leader's term for that index. If not, the leader sets the follower's next index to lastLogIndex - 1. This prevents the leader from trimming its log, and the next AppendEntries will include the conflicting entry, which the follower will remove/replace. Change-Id: I7a0282cc4078f33ffd049e4a0eb4feff6230510d Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
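The immediate-conflict check can be sketched in isolation. This is a toy stand-in for the handleAppendEntriesReply logic; the parameter names and the sentinel return values are illustrative assumptions.

```java
// Sketch: compare the term the leader has logged at the follower's
// reported lastLogIndex against the follower's reported lastLogTerm.
public class ConflictCheckSketch {
    // leaderTermAtIndex: the leader's term at followerLastLogIndex,
    // or -1 if the leader has no entry there.
    // Returns the corrected next index for the follower, or -2 meaning
    // "no conflict, proceed normally".
    public static long checkForConflict(long leaderTermAtIndex,
            long followerLastLogIndex, long followerLastLogTerm) {
        if (leaderTermAtIndex != -1 && leaderTermAtIndex != followerLastLogTerm) {
            // Terms differ at the same index: back up so the next
            // AppendEntries carries the conflicting entry.
            return followerLastLogIndex - 1;
        }
        return -2;
    }
}
```

In the scenario above, the follower reports lastLogIndex 3 with term 1 while the leader's entry at index 3 has term 2, so the follower's next index is set back to 2.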
Bug 6540: Move LeaderInstallSnapshotState to FollowerLogInformation AbstractLeader maintains a Map of followerId -> LeaderInstallSnapshotState in parallel to the Map of followerId -> FollowerLogInformation. It makes sense to move the LeaderInstallSnapshotState into the FollowerLogInformation instead of maintaining 2 Maps. Change-Id: Ia0b58fad9bb2fde42d8c1ba4b0f7aae4eb11abb5 Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
Add raftVersion field to AppendEntriesReply Added a raftVersion field to AppendEntriesReply and modified AbstractLeader to store the raftVersion in the FollowerLogInformation. This will enable sending the appropriate serialized version for subsequent messages sent to the follower. The raftVersion in the FollowerLogInformation is initialized to 0 (HELIUM_VERSION) so we assume the oldest version for backwards compatibility until we get the first AppendEntriesReply with the follower's actual version. Since we're adding a new field to AppendEntriesReply this won't break backwards compatibility as Java serialization handles that. The raftVersion will be set to 0 if sent from a pre-boron version. Change-Id: I4519c4f314674840f2578848b2888c3e8467dd21 Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
Introduce PeerInfo and VotingState We need to store the voting state for each peer, so I created a PeerInfo class to include id, address and voting state (represented by a VotingState enum). The RaftActorContext now stores PeerInfo instances in its peer map, and methods were added to access PeerInfo. As a consequence, RaftActorContext#getPeerAddresses was no longer needed and was removed. AbstractLeader and Candidate were modified to utilize the PeerInfo to calculate the majority vote/min replication count, i.e. ignore non-voting peers. Previously we had added a FollowerState enum and stored it in the FollowerLogInformation. Since voting state is now stored in the RaftActorContext peer info, I removed the FollowerState from FollowerLogInformation to avoid redundancy and having to keep both up to date. Change-Id: I1394511a8db7f0b9df3ed7879c77c1f44f3b143d Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
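A minimal sketch of a replication count that ignores non-voting peers, under the assumption that the leader itself always votes (the real computation lives in AbstractLeader; the record and method here are simplified stand-ins):

```java
// Sketch: minimum follower acks needed for consensus is a majority of
// (voting peers + self), minus the leader's own implicit ack.
public class MajoritySketch {
    public enum VotingState { VOTING, NON_VOTING }

    public record PeerInfo(String id, VotingState votingState) {}

    public static int minReplicationCount(PeerInfo... peers) {
        int votingPeers = 0;
        for (PeerInfo p : peers) {
            if (p.votingState() == VotingState.VOTING) {
                votingPeers++;
            }
        }
        // e.g. 2 voting peers + self = 3 members, majority 2, so 1 ack.
        return votingPeers == 0 ? 0 : (votingPeers + 1) / 2;
    }
}
```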
Initial code for RaftActorServerConfigurationSupport Added a RaftActorServerConfigurationSupport and unit test class with mostly initial skeleton code. In RaftActorServerConfigurationSupport, I implemented the basic checks for leader availability with corresponding unit tests. If this node is not the leader and there is a leader, it forwards to the remote leader. If there is no leader, it returns a NO_LEADER failure. Also in RaftActorServerConfigurationSupport, I added code for the first steps: add the serverId/address into the RaftActorContext peer map and add a FollowerLogInformation entry in the AbstractLeader. I added an initialized field with getters/setters to FollowerLogInformation. The entry is added with initialized set to false. I also changed the followerToLogMap in AbstractLeader to be mutable. I also modified FollowerLogInformationImpl so it returns false for isFollowerActive and isOkToReplicate if initialized is false. The idea is to prevent the leader from sending log entries or a snapshot via the heartbeat or replication. The leader will send an empty AppendEntries heartbeat, which should be fine. The RaftActorServerConfigurationSupport will initiate the install snapshot directly. I added TODO comments in RaftActorServerConfigurationSupport and the unit test class which outline the remaining work. I also added the ServerConfigurationPayload class to be used for the log entries. Change-Id: Ic11ddc99a57edb7ef70c2d4f5fa7906d6a95b35e Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
Bug 3020: Add version to AppendEntries and AppendEntriesReply To handle backwards compatibility, a leader needs to know the version of its followers with respect to the derived RaftActor's payload data. This will enable the derived RaftActor to translate its payload data to an older version. So I added a version field to AppendEntriesReply which the leader stores in the FollowerLogInformation. In addition, a follower needs to know the version of its leader so we can handle backwards compatibility with respect to transactions, since we no longer send the CreateTransaction message to the leader (currently only for write-only). This patch adds a version field to AppendEntries - a subsequent patch will utilize it. Change-Id: I41632024a270206153e7c5d363ee1c79800e4200 Signed-off-by: Tom Pantelis <tpanteli@brocade.com> (cherry picked from commit 655216a6c75aa29d31c4c56c56a5000db56ba233)
Make FollowerLogInformationImpl fields non-volatile The FollowerLogInformationImpl fields nextIndex and matchIndex don't need to be volatile as they're only accessed via the RaftActor and akka's happens-before rule guarantees actor state visibility when a subsequent message is processed. Change-Id: Ibee0cc8d04c0d65b2f512e44398474439363a00e Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
BUG 2849 : Reduce sending of duplicate replication messages With Raft, not all append entries are expected to reach the followers, so it is a requirement that when acknowledgement of an append entries message is not received we send a duplicate. In replication scenarios where a lot of rapid replication messages are being generated, the lack of equally rapid acknowledgement results in a lot of duplicate messages being generated. This later results in a cycle when the replies for those duplicate replication messages start arriving and cause even more duplicate replication messages to be sent. This patch addresses the problem by disallowing a replication message for a follower unless the nextIndex for that follower has changed or the time since the last replication message exceeds the heartbeat interval. This reduces duplicates but still allows duplicates to be generated for legitimate reasons. Change-Id: I0918672468f4fc1da935c45c7b46c750cde619f4 Signed-off-by: Moiz Raja <moraja@cisco.com>
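The throttle condition can be sketched as follows, assuming a simplified per-follower state holder (names are illustrative; the real check is part of AbstractLeader/FollowerLogInformation):

```java
// Sketch: allow a new replication message only when the follower's
// nextIndex has changed or the heartbeat interval has elapsed since
// the last send.
public class ReplicationThrottleSketch {
    private long lastSentNextIndex = -1;
    private long lastSentNanos;
    private final long heartbeatNanos;

    public ReplicationThrottleSketch(long heartbeatNanos) {
        this.heartbeatNanos = heartbeatNanos;
    }

    public boolean okToReplicate(long nextIndex, long nowNanos) {
        if (nextIndex != lastSentNextIndex
                || nowNanos - lastSentNanos >= heartbeatNanos) {
            lastSentNextIndex = nextIndex;
            lastSentNanos = nowNanos;
            return true;
        }
        return false;
    }
}
```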
Reduce/enhance logging in AbstractLeader If you turn on debug logging for Shard, the "Checking sendAppendEntries for follower..." message is logged by sendUpdatesToFollower on every AppendEntriesReply heartbeat, even when no data was sent to the follower. This bombards the karaf.log. To avoid this, the message should only be logged if something changed for the follower. So I changed the FollowerLogInformation setMatchIndex and setNextIndex methods to return a boolean indicating whether the value changed, which handleAppendEntriesReply checks to see if either was updated. If neither changed, it passes true to sendUpdatesToFollower for isHeartbeat so the message isn't logged. I also made a few other minor logging changes. Change-Id: Ib1f2109092b3656c3cc3dfdd2cd7b4641a3f890b Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
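The changed-flag setters can be sketched as a minimal holder (simplified from what the commit describes for FollowerLogInformation):

```java
// Sketch: each setter returns whether the value actually changed, so
// the caller can skip logging no-op heartbeat updates.
public class IndexUpdateSketch {
    private long matchIndex = -1;
    private long nextIndex = -1;

    public boolean setMatchIndex(long newValue) {
        if (matchIndex == newValue) {
            return false;
        }
        matchIndex = newValue;
        return true;
    }

    public boolean setNextIndex(long newValue) {
        if (nextIndex == newValue) {
            return false;
        }
        nextIndex = newValue;
        return true;
    }
}
```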
Dynamically update DatastoreContext properties Added a ConfigurationListener to the DatastoreContextConfigAdminOverlay that is registered as an OSGi service to receive ConfigurationEvents from the Config Admin when the org.opendaylight.controller.cluster.datastore.cfg file is modified. The DistributedDataStore registers as a listener to the DatastoreContextConfigAdminOverlay to get updated when the DatastoreContext instance changes due to a ConfigurationEvent. The DistributedDataStore sets the new DatastoreContext instance in the ActorContext, which updates its cached instance and properties and sends the new DatastoreContext as a message to the ShardManager. The ShardManager simply sends it out to all the Shards, which update their cached DatastoreContext state. The Shard also updates the new ConfigParams instance in the base RaftActor, which sets it in its RaftActorContext. There was one place in FollowerLogInformationImpl where it had a local cache of a property obtained from the ConfigParams, so I changed the ctor to pass the RaftActorContext instead and obtain the property each time from the ConfigParams. A slight downside to this is that we'll have to be cognizant of not caching DatastoreContext/ConfigParams properties and assuming they're static. Or, if we do cache, we need to handle updates. I toyed around with trying to restart the DistributedDataStore with a new DatastoreContext and ActorContext, killing the previous ShardManager and Shard actors and creating new ones. However, recreating the Shards without being disruptive to clients is tricky and risky, e.g. handling in-flight transactions, and I didn't see a clean way to do it without possibly causing in-flight transactions to fail. So I went with the simpler approach of just pushing an updated DatastoreContext to the actors. Change-Id: Ie608f61da36ac58a806208925a3c4277968c2f5b Signed-off-by: Tom Pantelis <tpanteli@brocade.com>
Refactor LeaderTest Removed the remaining Within blocks in tests. Changed each test to use the leaderActor and followerActor class members for consistency. The actors are killed on tearDown. The creation of a JavaTestKit block is no longer necessary and was removed from each test. Added a shortcut method createActorContextWithFollower that creates the leader context and adds a peer follower. This eliminates some duplicate code in the tests. Added an expectFirstMatching method to MessageCollector that waits for the message, in contrast to getFirstMatching. Added some more assertions in some tests. Change-Id: Id4ee0291352999b31e40abd9895e3d0237acf432 Signed-off-by: tpantelis <tpanteli@brocade.com>
BUG-1173: ensure compatibility with Guava 17+ This migrates the sole user to the Guava-15+ API, as the API is removed in version 17. Change-Id: I3bf52fb89116024e7305a142e6b1f6fec9985488 Signed-off-by: Robert Varga <rovarga@cisco.com>
Optimized AppendEntries sending to a follower by adding an AppendEntries send at the end of reply handling, if needed. This helps reduce follower lag, since the leader does not have to wait for the heartbeat interval to send the next message to the follower. This way, we can tune the heartbeat interval more freely. Change-Id: I0259ea3691a3ac1245e36afe15b06ed44f377466 Signed-off-by: Harman Singh <harmasin@cisco.com>