that it can rejoin the cluster. Once a restarted node joins a cluster, it
will synchronize with the lead node automatically.
+.. _getting-started-clustering-scripts:
+
+Clustering Scripts
+------------------
+
+OpenDaylight includes some scripts to help with the clustering configuration.
+
+.. note::
+
+ Scripts are stored in the OpenDaylight ``distribution/bin`` folder, and
+ maintained in the distribution project
+ `repository <https://git.opendaylight.org/gerrit/p/integration/distribution>`_
+ in the folder ``distribution-karaf/src/main/assembly/bin/``.
+
+Configure Cluster Script
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+This script is used to configure the cluster parameters (e.g. ``akka.conf``,
+``module-shards.conf``) on a member of the controller cluster. The user should
+restart the node to apply the changes.
+
+.. note::
+
+ The script can be used at any time, even before the controller is started
+ for the first time.
+
+Usage::
+
+ bin/configure_cluster.sh <index> <seed_nodes_list>
+
+* index: Integer within 1..N, where N is the number of seed nodes. This indicates
+ which controller node (1..N) is configured by the script.
+* seed_nodes_list: List of seed nodes (IP addresses), separated by comma or space.
+
+The IP address at the provided index should belong to the member executing
+the script. When running this script on multiple seed nodes, keep the
+seed_nodes_list the same, and vary the index from 1 through N.
+
+Optionally, shards can be configured in a more granular way by modifying the
+file "custom_shard_configs.txt" in the same folder as this tool. Please see
+that file for more details.
+
+Example::
+
+ bin/configure_cluster.sh 2 192.168.0.1 192.168.0.2 192.168.0.3
+
+The above command will configure member 2 (IP address 192.168.0.2) of a
+cluster made of 192.168.0.1, 192.168.0.2 and 192.168.0.3.
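+
+When configuring a three node cluster this way, the same command line is run
+on each member, varying only the index (a sketch reusing the addresses from
+the example above)::
+
+    # on 192.168.0.1
+    bin/configure_cluster.sh 1 192.168.0.1 192.168.0.2 192.168.0.3
+    # on 192.168.0.2
+    bin/configure_cluster.sh 2 192.168.0.1 192.168.0.2 192.168.0.3
+    # on 192.168.0.3
+    bin/configure_cluster.sh 3 192.168.0.1 192.168.0.2 192.168.0.3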
+
Setting Up a Multiple Node Cluster
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
address of any of the machines that will be part of the cluster::
cluster {
- seed-nodes = ["akka.tcp://opendaylight-cluster-data@127.0.0.1:2550"]
+ seed-nodes = ["akka.tcp://opendaylight-cluster-data@${IP_OF_MEMBER1}:2550",
+ <url-to-cluster-member-2>,
+ <url-to-cluster-member-3>]
#. Find the following section and specify the role for each member node. Here
we assign the first node with the *member-1* role, the second node with the
}
cluster {
- seed-nodes = ["akka.tcp://opendaylight-cluster-data@10.194.189.96:2550"]
+ seed-nodes = ["akka.tcp://opendaylight-cluster-data@10.194.189.96:2550",
+ "akka.tcp://opendaylight-cluster-data@10.194.189.98:2550",
+ "akka.tcp://opendaylight-cluster-data@10.194.189.101:2550"]
auto-down-unreachable-after = 10s
roles = [
- "member-1"
+ "member-2"
]
}
}
]
-Clustering Scripts
-------------------
-
-OpenDaylight includes some scripts to help with the clustering configuration.
-
-.. note::
-
- Scripts are stored in the OpenDaylight distribution/bin folder, and
- maintained in the distribution project
- `repository <https://git.opendaylight.org/gerrit/p/integration/distribution>`_
- in the folder distribution-karaf/src/main/assembly/bin/.
-
-Configure Cluster Script
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-This script is used to configure the cluster parameters (e.g. akka.conf,
-module-shards.conf) on a member of the controller cluster. The user should
-restart the node to apply the changes.
-
-.. note::
-
- The script can be used at any time, even before the controller is started
- for the first time.
-
-Usage::
-
- bin/configure_cluster.sh <index> <seed_nodes_list>
-
-* index: Integer within 1..N, where N is the number of seed nodes. This indicates
- which controller node (1..N) is configured by the script.
-* seed_nodes_list: List of seed nodes (IP address), separated by comma or space.
-
-The IP address at the provided index should belong to the member executing
-the script. When running this script on multiple seed nodes, keep the
-seed_node_list the same, and vary the index from 1 through N.
-
-Optionally, shards can be configured in a more granular way by modifying the
-file "custom_shard_configs.txt" in the same folder as this tool. Please see
-that file for more details.
-
-Example::
-
- bin/configure_cluster.sh 2 192.168.0.1 192.168.0.2 192.168.0.3
-
-The above command will configure the member 2 (IP address 192.168.0.2) of a
-cluster made of 192.168.0.1 192.168.0.2 192.168.0.3.
-
-Set Persistence Script
-^^^^^^^^^^^^^^^^^^^^^^
-
-This script is used to enable or disable the config datastore persistence. The
-default state is enabled but there are cases where persistence may not be
-required or even desired. The user should restart the node to apply the changes.
-
-.. note::
-
- The script can be used at any time, even before the controller is started
- for the first time.
-
-Usage::
-
- bin/set_persistence.sh <on/off>
-
-Example::
-
- bin/set_persistence.sh off
-
-The above command will disable the config datastore persistence.
-
Cluster Monitoring
------------------
JConsole, VisualVM, or other JMX clients, or exposed via a REST API using
`Jolokia <https://jolokia.org/features-nb.html>`_, provided by the
``odl-jolokia`` Karaf feature. This is convenient, due to a significant focus
-on REST in OpenDaylight. In case the feature is not available, loading Jolokia
-can be achieved by installing the bundle directly::
-
- bundle:install mvn:org.jolokia/jolokia-osgi/1.3.5
+on REST in OpenDaylight.
The basic URI, which lists a schema of all available MBeans but not their
content, is::
*systemmetrics* project offers a DLUX based UI to display the same
information.
+.. _cluster_admin_api:
+
Geo-distributed Active/Backup Setup
-----------------------------------
it is recommended to either clean the databases (i.e., ``journal`` and
``snapshots`` directory) on the primary nodes before bringing them back up or
restore them from a recent backup of the secondary site (see section
-:ref:`cluster_backup_restore` below).
+:ref:`cluster_backup_restore`).
It is also possible to gracefully remove a node from a cluster, with the
following RPC::
No input required, but this RPC needs to be sent to the new node, to instruct
it to replicate all shards from the cluster.
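+
+Assuming default RESTCONF settings (port 8181 and ``admin``/``admin``
+credentials, both of which are deployment specific), and assuming the RPC in
+question is ``cluster-admin:add-replicas-for-all-shards``, the call can be
+sketched with curl as::
+
+    curl -u admin:admin -X POST \
+        http://<ip-of-new-node>:8181/restconf/operations/cluster-admin:add-replicas-for-all-shards
+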
-.. _cluster_backup_restore:
-
-Backing Up and Restoring the Datastore
---------------------------------------
-
-The same cluster-admin API that is used above for managing shard voting states
-has an RPC allowing backup of the datastore in a single node, taking only the
-file name as a parameter::
-
- POST /restconf/operations/cluster-admin:backup-datastore
-
-RPC input JSON::
-
- {
- "input": {
- "file-path": "/tmp/datastore_backup"
- }
- }
-
.. note::
- This backup can only be restored if the YANG models of the backed-up data
- are identical in the backup OpenDaylight instance and restore target
- instance.
-
-To restore the backup on the target node the file needs to be placed into the
-``$KARAF_HOME/clustered-datastore-restore`` directory, and then the node
-restarted. If the directory does not exist (which is quite likely if this is a
-first-time restore) it needs to be created. On startup, ODL checks if the
-``journal`` and ``snapshots`` directories in ``$KARAF_HOME`` are empty, and
-only then tries to read the contents of the ``clustered-datastore-restore``
-directory, if it exists. So for a successful restore, those two directories
-should be empty. The backup file name itself does not matter, and the startup
-process will delete it after a successful restore.
-
-The backup is node independent, so when restoring a 3 node cluster, it is best
-to restore it on each node for consistency. For example, if restoring on one
-node only, it can happen that the other two empty nodes form a majority and
-the cluster comes up with no data.
+ While the cluster admin API allows adding and removing shards dynamically,
+ the ``module-shards.conf`` and ``modules.conf`` files are still used on
+ startup to define the initial configuration of shards. Modifications made
+ through the API are not stored in those static files, but in the journal.
+
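+The initial shard layout those files define can be as simple as a single
+replicated ``default`` shard; a minimal ``module-shards.conf`` sketch for a
+three node cluster (the member names are illustrative and must match the
+roles configured in ``akka.conf``)::
+
+    module-shards = [
+        {
+            name = "default"
+            shards = [
+                {
+                    name = "default"
+                    replicas = ["member-1", "member-2", "member-3"]
+                }
+            ]
+        }
+    ]
+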
+Extra Configuration Options
+---------------------------
+
+============================================== ================= ======= ==============================================================================================================================================================================
+Name                                           Type              Default Description
+============================================== ================= ======= ==============================================================================================================================================================================
+max-shard-data-change-executor-queue-size uint32 (1..max) 1000 The maximum queue size for each shard's data store data change notification executor.
+max-shard-data-change-executor-pool-size uint32 (1..max) 20 The maximum thread pool size for each shard's data store data change notification executor.
+max-shard-data-change-listener-queue-size uint32 (1..max) 1000 The maximum queue size for each shard's data store data change listener.
+max-shard-data-store-executor-queue-size uint32 (1..max) 5000 The maximum queue size for each shard's data store executor.
+shard-transaction-idle-timeout-in-minutes uint32 (1..max) 10 The maximum amount of time a shard transaction can be idle without receiving any messages before it self-destructs.
+shard-snapshot-batch-count uint32 (1..max) 20000 The minimum number of entries to be present in the in-memory journal log before a snapshot is to be taken.
+shard-snapshot-data-threshold-percentage       uint8 (1..100)    12      The percentage of Runtime.totalMemory() used by the in-memory journal log before a snapshot is to be taken.
+shard-heartbeat-interval-in-millis             uint16 (100..max) 500     The interval at which a shard will send a heartbeat message to its remote shard.
+operation-timeout-in-seconds uint16 (5..max) 5 The maximum amount of time for akka operations (remote or local) to complete before failing.
+shard-journal-recovery-log-batch-size uint32 (1..max) 5000 The maximum number of journal log entries to batch on recovery for a shard before committing to the data store.
+shard-transaction-commit-timeout-in-seconds    uint32 (1..max)   30      The maximum amount of time a shard transaction three-phase commit can be idle without receiving the next message before it aborts the transaction.
+shard-transaction-commit-queue-capacity uint32 (1..max) 20000 The maximum allowed capacity for each shard's transaction commit queue.
+shard-initialization-timeout-in-seconds        uint32 (1..max)   300     The maximum amount of time to wait for a shard to initialize from persistence on startup before failing an operation (e.g. transaction create and change listener registration).
+shard-leader-election-timeout-in-seconds       uint32 (1..max)   30      The maximum amount of time to wait for a shard to elect a leader before failing an operation (e.g. transaction create).
+enable-metric-capture boolean false Enable or disable metric capture.
+bounded-mailbox-capacity                       uint32 (1..max)   1000    The maximum queue size that an actor's mailbox can reach.
+persistent                                     boolean           true    Enable or disable data persistence.
+shard-isolated-leader-check-interval-in-millis uint32 (1..max)   5000    The interval at which the shard leader will check whether a majority of its followers are active and, if not, term itself isolated.
+============================================== ================= ======= ==============================================================================================================================================================================
+
+These configuration options are set in the
+``etc/org.opendaylight.controller.cluster.datastore.cfg`` configuration file.
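+
+For example, to disable persistence and enable metric capture, the file could
+contain the following (a sketch; any option not listed keeps its default
+value from the table above)::
+
+    # etc/org.opendaylight.controller.cluster.datastore.cfg
+    persistent=false
+    enable-metric-capture=true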