Defer up to maxEntrySize flushes by default Batching flushes has performance benefits in terms of throughput. Keeping the flush size at maxEntrySize makes the flush times more consistent, as the journal's syncs will be amortized to fit writing maxEntrySize'd entries. JIRA: CONTROLLER-2108 Change-Id: Ie385738f65d9503fdeeed6a9e0b5ced37fde7fd3 Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Allow segmented journal to flush periodically Flushes to disk end up dominating our use of disk resources, as we issue a flush after each write. This is not entirely efficient, as we may have multiple outstanding writes in the actor queue -- and we ignore the batching opportunity. This patch makes it possible to configure an upper bound on the number of outstanding bytes written which can remain unflushed. We flush whenever we reach this watermark or when we have processed all messages that had been submitted at the time the flush batch was started. JIRA: CONTROLLER-2108 Change-Id: I6f18de7871c89b5feffecc71580e1f440024f2a3 Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
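The flush policy described in the commit above can be illustrated with a toy model. This is a minimal sketch, not the journal's actual implementation: the class `BatchingJournal`, its `max_unflushed_bytes` parameter and the `flush_count` counter are hypothetical names chosen for illustration. It assumes a batch is simply the list of entries queued when processing starts.

```python
class BatchingJournal:
    """Toy journal that defers flushes until a byte watermark is reached.

    All names here are hypothetical; this only models the policy.
    """

    def __init__(self, max_unflushed_bytes):
        self.max_unflushed_bytes = max_unflushed_bytes
        self.unflushed_bytes = 0
        self.flush_count = 0

    def write_batch(self, entries):
        """Write every entry queued when the batch started."""
        for entry in entries:
            self.unflushed_bytes += len(entry)
            # Watermark reached: flush now rather than after every write.
            if self.unflushed_bytes >= self.max_unflushed_bytes:
                self._flush()
        # End of batch: nothing submitted so far may remain unflushed.
        if self.unflushed_bytes > 0:
            self._flush()

    def _flush(self):
        # A real journal would sync to disk here; the sketch only counts.
        self.flush_count += 1
        self.unflushed_bytes = 0
```

In this model a watermark of 0 reproduces the old flush-per-write behaviour, while a larger watermark lets several queued writes share a single flush.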
Remove ask-based protocol client Remove the use-tell-based-protocol knob, effectively making it always true. This means DistributedDataStore cannot be instantiated, which leads to its removal. That in turn makes all of DatastoreContext, OperationLimiter and similar classes superfluous, so we remove those as well. A few classes are used to drive the shard backend in integration tests, and hence those are moved to test sources. JIRA: CONTROLLER-2054 Change-Id: Ie20b1c898576d3c89b70b34121310e58faddbf8e Signed-off-by: bentom-binoy <bentom.binoy@infosys.com> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Obsolete ask-based protocol Mark the switch to use ask-based protocol obsolete and deprecate all classes implementing it. JIRA: CONTROLLER-2053 Change-Id: Ib0f5d6a946090addde255423d51746a52e785b2a Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Enable Split-Brain Resolver In order to make EOS service work, we need to have a provider handling dead nodes. Enable SBR to fill that role, so that we properly recover from nodes going away. JIRA: CONTROLLER-2025 Change-Id: Idf817455bfe2a90d6e02011eee4ed407e1254fd2 Signed-off-by: Tomas Cere <tomas.cere@pantheon.tech> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Switch to using tell-based protocol Using Artery with ask-based protocol in our current settings leads to easily running out of native memory. Switch to tell-based protocol, which can slice its messages to fit in smaller buffers. JIRA: CONTROLLER-1983 Change-Id: I0a296dbb3ba6e4e659c94761d78cfb985633061c Signed-off-by: Ivan Hrasko <ivan.hrasko@pantheon.tech> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Tune eos gossip/notification intervals Looks like EOS gossips take much too long, so let's lower these so we have faster response times all around. JIRA: CONTROLLER-2004 Change-Id: I3daf8d207a6b51b16e6b8cb3f7dcefd55e6626cf Signed-off-by: Tomas Cere <tomas.cere@pantheon.tech> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Remove obsolete datastore.cfg properties We have a number of leftovers here, which have been no-ops. Remove them to reduce code clutter. JIRA: CONTROLLER-1984 Change-Id: I490188fb7ebc83c344997861d637852f40fce7a6 Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Snapshot and journal export on recovery Added ability to export snapshot and journal content into json file during recovery. JIRA: CONTROLLER-1955 Change-Id: Ic2d6181ab56d7b413f06ed91cf5f9d37e3aa2029 Signed-off-by: tadei.bilan <tadei.bilan@pantheon.tech> Signed-off-by: Oleksii Mozghovyi <oleksii.mozghovyi@pantheon.tech> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Use correct failure detector for akka clustering Since we are not using classic remoting, we need to use proper tuning for clustering's failure-detector. Correct the template to reflect that. Change-Id: I4bff994b786237778df5bdfb83df858b00b549ed Signed-off-by: Oleksii Mozghovyi <oleksii.mozghovyi@pantheon.tech> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Use memory-mapped segmented journal by default Segmented file implementation requires a buffer double the size of the file's maximum size. This ends up being allocated on-heap, which is wasteful from an efficiency perspective. Switch default configuration so that it uses memory-mapped files instead. JIRA: CONTROLLER-1954 Change-Id: Icad9ef74c50467323e31567828a949c0cae52a9e Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Switch to Akka Artery Migrate away from legacy Akka remoting to Artery TCP. JIRA: CONTROLLER-1968 Change-Id: Iac1a0186292eb5a303cf075e540f3f6c8c09a932 Signed-off-by: Kostiantyn Nosach <kostiantyn.nosach@pantheon.tech> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Bump akka to 2.6.12 Release notes: https://akka.io/blog/news/2019/11/06/akka-2.6.0-released https://akka.io/blog/news/2019/12/06/akka-2.6.1-released https://akka.io/blog/news/2020/01/27/akka-2.6.2-released https://akka.io/blog/news/2020/01/28/akka-2.6.3-released https://akka.io/blog/news/2020/03/13/akka-2.6.4-released https://akka.io/blog/news/2020/04/30/akka-2.6.5-released https://akka.io/blog/news/2020/06/08/akka-2.6.6-released-split-brain-resolver https://akka.io/blog/news/2020/07/10/akka-2.6.7-released https://akka.io/blog/news/2020/07/16/akka-2.6.8-released https://akka.io/blog/news/2020/09/09/akka-2.6.9-released https://akka.io/blog/news/2020/10/09/akka-2.6.10-released https://akka.io/blog/news/2021/01/15/akka-2.6.11-released https://akka.io/blog/news/2021/01/28/akka-2.6.12-released JIRA: CONTROLLER-1962 Change-Id: Ibbfc11a8ca27a8c09337bf49de910c38a9239886 Signed-off-by: tadei.bilan <tadei.bilan@pantheon.tech> Signed-off-by: Robert Varga <robert.varga@pantheon.tech> Signed-off-by: Oleksii Mozghovyi <oleksii.mozghovyi@pantheon.tech>
Add multi journal configuration for segmented journal We don't need to have large segments for operational shards. Add a multi-journal configuration that is used when a shard has persistence turned off. JIRA: CONTROLLER-1938 Change-Id: I39349503079ef03177c8b9b52909078c5f35d6ba Signed-off-by: Tomas Cere <tomas.cere@pantheon.tech> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Add direct in-memory journal threshold Some deployments benefit from placing an absolute numeric limit on the retained memory. Introduce a new tunable, which overrides the usual percentage limit. JIRA: CONTROLLER-1956 Change-Id: I688e226b173386765bea74931b6aaf617bda30a8 Signed-off-by: tadei.bilan <tadei.bilan@pantheon.tech> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
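How an absolute threshold might override a percentage-based one can be sketched as follows. This is an assumption-laden illustration only: the function `journal_memory_limit`, its parameter names, and the convention that `0` means "absolute limit not set" are hypothetical and are not taken from the commit.

```python
def journal_memory_limit(max_heap_bytes, percentage, absolute_bytes=0):
    """Return the in-memory journal limit in bytes.

    Assumption: absolute_bytes == 0 means the absolute tunable is unset,
    in which case the percentage of the maximum heap applies.
    """
    if absolute_bytes > 0:
        # The absolute tunable, when set, overrides the percentage.
        return absolute_bytes
    return max_heap_bytes * percentage // 100
```

Under these assumptions a deployment with a large heap can cap the retained journal at a fixed size without having to recompute a percentage whenever the heap size changes.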
Add optional lz4 compression for snapshots Added ability to use lz4 compression both for snapshots sent to followers and snapshots in storage. JIRA: CONTROLLER-1936 Change-Id: I073120efddde869b10999450057b91e75f0ffe07 Signed-off-by: tadei.bilan <tadei.bilan@pantheon.tech> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Add an option to trigger snapshot creation on root overwrites In some cases (such as DAEXIM import), it does not necessarily make sense to retain previous data in the journal, as all of it has been superseded. JIRA: CONTROLLER-1913 Change-Id: I5d634faac06e6764a417c23e88c728373b900924 Signed-off-by: Tibor Král <tibor.kral@pantheon.tech> Signed-off-by: Tomas Cere <tomas.cere@pantheon.tech>
Add option to disable default ActorSystemQuarantinedEvent handling The default reaction to ThisActorSystemQuarantinedEvent is to restart the entire Karaf container. However some users may want to process the event differently. JIRA: CONTROLLER-1949 Change-Id: Id65d31749dd97cb067611f7cfe4df76a6fe12204 Signed-off-by: Tibor Král <tibor.kral@pantheon.tech> Signed-off-by: Robert Varga <robert.varga@pantheon.tech>
Allow incremental recovery Expose configuration knob in DatastoreContext to specify the number of recovered journal entries after which a Snapshot should be taken and the journal purged. JIRA: CONTROLLER-1915 Change-Id: I4b20a0abe0329965ca5ac1ab5df7d9ca8480cfb2 Signed-off-by: Tibor Král <tibor.kral@pantheon.tech>
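The idea of snapshotting during recovery can be sketched with a small model. This is a hypothetical illustration, not the datastore's recovery code: the function `recover`, the `snapshot_interval` parameter, and the representation of the journal as a list of key/value pairs are all assumptions made for the example.

```python
def recover(journal, snapshot_interval):
    """Replay journal entries, snapshotting every snapshot_interval entries.

    journal is a list of (key, value) pairs and is purged in place once
    the replayed entries are covered by a snapshot.
    """
    state = {}
    snapshots = []
    replayed = 0
    # Iterate over a copy so the journal can be purged while replaying.
    for key, value in list(journal):
        state[key] = value
        replayed += 1
        if replayed == snapshot_interval:
            snapshots.append(dict(state))  # capture the recovered state
            del journal[:replayed]         # entries are now redundant
            replayed = 0
    return state, snapshots
```

In this model a long journal never has to be retained in full during recovery: after each snapshot only the entries replayed since that snapshot remain.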
Remove use of InMemoryDOMDataStoreConfigProperties CDS does not use original IMDS tuning, as we have not used a captive IMDS instance for almost five years now. Remove references to InMemoryDOMDataStoreConfigProperties and turn the knobs to no-ops. JIRA: CONTROLLER-1940 Change-Id: I527b8ba407f7de4ecadb82f7c24d6a782722f683 Signed-off-by: Robert Varga <robert.varga@pantheon.tech>