* If you have a three node cluster and would like to be able to tolerate any
single node crashing, a replica of every defined data shard must be running
- on all three cluster nodes.
+ on all three cluster nodes.
.. note:: This is because OpenDaylight's clustering implementation requires a
majority of the defined shard replicas to be running in order to
Find every instance of the following lines and replace _127.0.0.1_ with the
hostname or IP address of the machine on which this file resides and
OpenDaylight will run::
-
+
netty.tcp {
hostname = "127.0.0.1"
-
+
.. note:: The value you need to specify will be different for each node in the
cluster.
mailbox-type = "org.opendaylight.controller.cluster.common.actor.MeteredBoundedMailbox"
mailbox-capacity = 1000
mailbox-push-timeout-time = 100ms
- }
-
+ }
+
metric-capture-enabled = true
-
+
akka {
loglevel = "DEBUG"
loggers = ["akka.event.slf4j.Slf4jLogger"]
-
+
actor {
-
+
provider = "akka.cluster.ClusterActorRefProvider"
serializers {
java = "akka.serialization.JavaSerializer"
proto = "akka.remote.serialization.ProtobufSerializer"
}
-
+
serialization-bindings {
"com.google.protobuf.Message" = proto
-
+
}
}
remote {
receive-buffer-size = 52428800
}
}
-
+
cluster {
seed-nodes = ["akka.tcp://opendaylight-cluster-data@10.194.189.96:2550"]
-
+
auto-down-unreachable-after = 10s
-
+
roles = [
"member-1"
]
-
+
}
}
}
-
+
odl-cluster-rpc {
bounded-mailbox {
mailbox-type = "org.opendaylight.controller.cluster.common.actor.MeteredBoundedMailbox"
mailbox-capacity = 1000
mailbox-push-timeout-time = 100ms
}
-
+
metric-capture-enabled = true
-
+
akka {
loglevel = "INFO"
loggers = ["akka.event.slf4j.Slf4jLogger"]
-
+
actor {
provider = "akka.cluster.ClusterActorRefProvider"
-
+
}
remote {
log-remote-lifecycle-events = off
port = 2551
}
}
-
+
cluster {
seed-nodes = ["akka.tcp://opendaylight-cluster-rpc@10.194.189.96:2551"]
-
+
auto-down-unreachable-after = 10s
}
}
]
}
]
+
+Clustering Scripts
+------------------
+
+OpenDaylight includes some scripts to help with the clustering configuration.
+
+.. note::
+
+ Scripts are stored in the OpenDaylight distribution/bin folder, and
+ maintained in the distribution project
+ `repository <https://git.opendaylight.org/gerrit/p/integration/distribution>`_
+ in the folder distribution-karaf/src/main/assembly/bin/.
+
+Configure Cluster Script
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+This script is used to configure the cluster parameters (e.g. akka.conf,
+module-shards.conf) on a member of the controller cluster. The user should
+restart the node to apply the changes.
+
+.. note::
+
+ The script can be used at any time, even before the controller is started
+ for the first time.
+
+Usage::
+
+ bin/configure_cluster.sh <index> <seed_nodes_list>
+
+* index: Integer within 1..N, where N is the number of seed nodes. This indicates
+ which controller node (1..N) is configured by the script.
+* seed_nodes_list: List of seed nodes (IP address), separated by comma or space.
+
+The IP address at the provided index should belong to the member executing
+the script. When running this script on multiple seed nodes, keep the
+seed_node_list the same, and vary the index from 1 through N.
+
+Optionally, shards can be configured in a more granular way by modifying the
+file "custom_shard_configs.txt" in the same folder as this tool. Please see
+that file for more details.
+
+Example::
+
+ bin/configure_cluster.sh 2 192.168.0.1 192.168.0.2 192.168.0.3
+
+The above command will configure the member 2 (IP address 192.168.0.2) of a
+cluster made of 192.168.0.1 192.168.0.2 192.168.0.3.
+
+Set Persistence Script
+^^^^^^^^^^^^^^^^^^^^^^
+
+This script is used to enable or disable the config datastore persistence. The
+default state is enabled but there are cases where persistence may not be
+required or even desired. The user should restart the node to apply the changes.
+
+.. note::
+
+ The script can be used at any time, even before the controller is started
+ for the first time.
+
+Usage::
+
+ bin/set_persistence.sh <on/off>
+
+Example::
+
+ bin/set_persistence.sh off
+
+The above command will disable the config datastore persistence.
+
+Cluster Monitoring
+------------------
+
+OpenDaylight exposes shard information via MBeans, which can be explored with
+JConsole, VisualVM, or other JMX clients, or exposed via a REST API using
+`Jolokia <https://jolokia.org/features-nb.html>`_, provided by the
+``odl-jolokia`` Karaf feature. This is convenient, due to a significant focus
+on REST in OpenDaylight.
+
+The basic URI that lists a schema of all available MBeans, but not their
+content itself is::
+
+ GET /jolokia/list
+
+To read the information about the shards local to the queried OpenDaylight
+instance use the following REST calls. For the config datastore::
+
+ GET /jolokia/read/org.opendaylight.controller:type=DistributedConfigDatastore,Category=ShardManager,name=shard-manager-config
+
+For the operational datastore::
+
+ GET /jolokia/read/org.opendaylight.controller:type=DistributedOperationalDatastore,Category=ShardManager,name=shard-manager-operational
+
+The output contains information on shards present on the node::
+
+ {
+ "request": {
+ "mbean": "org.opendaylight.controller:Category=ShardManager,name=shard-manager-operational,type=DistributedOperationalDatastore",
+ "type": "read"
+ },
+ "value": {
+ "LocalShards": [
+ "member-1-shard-default-operational",
+ "member-1-shard-entity-ownership-operational",
+ "member-1-shard-topology-operational",
+ "member-1-shard-inventory-operational",
+ "member-1-shard-toaster-operational"
+ ],
+ "SyncStatus": true,
+ "MemberName": "member-1"
+ },
+ "timestamp": 1483738005,
+ "status": 200
+ }
+
+The exact names from the "LocalShards" lists are needed for further
+exploration, as they will be used as part of the URI to look up detailed info
+on a particular shard. An example output for the
+``member-1-shard-default-operational`` looks like this::
+
+ {
+ "request": {
+ "mbean": "org.opendaylight.controller:Category=Shards,name=member-1-shard-default-operational,type=DistributedOperationalDatastore",
+ "type": "read"
+ },
+ "value": {
+ "ReadWriteTransactionCount": 0,
+ "SnapshotIndex": 4,
+ "InMemoryJournalLogSize": 1,
+ "ReplicatedToAllIndex": 4,
+ "Leader": "member-1-shard-default-operational",
+ "LastIndex": 5,
+ "RaftState": "Leader",
+ "LastCommittedTransactionTime": "2017-01-06 13:19:00.135",
+ "LastApplied": 5,
+ "LastLeadershipChangeTime": "2017-01-06 13:18:37.605",
+ "LastLogIndex": 5,
+ "PeerAddresses": "member-3-shard-default-operational: akka.tcp://opendaylight-cluster-data@192.168.16.3:2550/user/shardmanager-operational/member-3-shard-default-operational, member-2-shard-default-operational: akka.tcp://opendaylight-cluster-data@192.168.16.2:2550/user/shardmanager-operational/member-2-shard-default-operational",
+ "WriteOnlyTransactionCount": 0,
+ "FollowerInitialSyncStatus": false,
+ "FollowerInfo": [
+ {
+ "timeSinceLastActivity": "00:00:00.320",
+ "active": true,
+ "matchIndex": 5,
+ "voting": true,
+ "id": "member-3-shard-default-operational",
+ "nextIndex": 6
+ },
+ {
+ "timeSinceLastActivity": "00:00:00.320",
+ "active": true,
+ "matchIndex": 5,
+ "voting": true,
+ "id": "member-2-shard-default-operational",
+ "nextIndex": 6
+ }
+ ],
+ "FailedReadTransactionsCount": 0,
+ "StatRetrievalTime": "810.5 μs",
+ "Voting": true,
+ "CurrentTerm": 1,
+ "LastTerm": 1,
+ "FailedTransactionsCount": 0,
+ "PendingTxCommitQueueSize": 0,
+ "VotedFor": "member-1-shard-default-operational",
+ "SnapshotCaptureInitiated": false,
+ "CommittedTransactionsCount": 6,
+ "TxCohortCacheSize": 0,
+ "PeerVotingStates": "member-3-shard-default-operational: true, member-2-shard-default-operational: true",
+ "LastLogTerm": 1,
+ "StatRetrievalError": null,
+ "CommitIndex": 5,
+ "SnapshotTerm": 1,
+ "AbortTransactionsCount": 0,
+ "ReadOnlyTransactionCount": 0,
+ "ShardName": "member-1-shard-default-operational",
+ "LeadershipChangeCount": 1,
+ "InMemoryJournalDataSize": 450
+ },
+ "timestamp": 1483740350,
+ "status": 200
+ }
+
+The output helps identifying shard state (leader/follower, voting/non-voting),
+peers, follower details if the shard is a leader, and other
+statistics/counters.
+
+The Integration team is maintaining a Python based `tool
+<https://github.com/opendaylight/integration-test/tree/master/tools/clustering/cluster-monitor>`_,
+that takes advantage of the above MBeans exposed via Jolokia, and the
+*systemmetrics* project offers a DLUX based UI to display the same
+information.
+
+Geo-distributed Active/Backup Setup
+-----------------------------------
+
+An OpenDaylight cluster works best when the latency between the nodes is very
+small, which practically means they should be in the same datacenter. It is
+however desirable to have the possibility to fail over to a different
+datacenter, in case all nodes become unreachable. To achieve that, the cluster
+can be expanded with nodes in a different datacenter, but in a way that
+doesn't affect latency of the primary nodes. To do that, shards in the backup
+nodes must be in "non-voting" state.
+
+The API to manipulate voting states on shards is defined as RPCs in the
+`cluster-admin.yang <https://git.opendaylight.org/gerrit/gitweb?p=controller.git;a=blob;f=opendaylight/md-sal/sal-cluster-admin-api/src/main/yang/cluster-admin.yang>`_
+file in the *controller* project, which is well documented. A summary is
+provided below.
+
+.. note::
+
+ Unless otherwise indicated, the below POST requests are to be sent to any
+ single cluster node.
+
+To create an active/backup setup with a 6 node cluster (3 active and 3 backup
+nodes in two locations) there is an RPC to set voting states of all shards on
+a list of nodes to a given state::
+
+ POST /restconf/operations/cluster-admin:change-member-voting-states-for-all-shards
+
+This RPC needs the list of nodes and the desired voting state as input. For
+creating the backup nodes, this example input can be used::
+
+ {
+ "input": {
+ "member-voting-state": [
+ {
+ "member-name": "member-4",
+ "voting": false
+ },
+ {
+ "member-name": "member-5",
+ "voting": false
+ },
+ {
+ "member-name": "member-6",
+ "voting": false
+ }
+ ]
+ }
+ }
+
+When an active/backup deployment already exists, with shards on the backup
+nodes in non-voting state, all that is needed for a fail-over from the active
+"sub-cluster" to backup "sub-cluster" is to flip the voting state of each
+shard (on each node, active AND backup). That can be easily achieved with the
+following RPC call (no parameters needed)::
+
+ POST /restconf/operations/cluster-admin:flip-member-voting-states-for-all-shards
+
+If it's an unplanned outage where the primary voting nodes are down, the
+"flip" RPC must be sent to a backup non-voting node. In this case there are no
+shard leaders to carry out the voting changes. However there is a special case
+whereby if the node that receives the RPC is non-voting and is to be changed
+to voting and there's no leader, it will apply the voting changes locally and
+attempt to become the leader. If successful, it persists the voting changes
+and replicates them to the remaining nodes.
+
+When the primary site is fixed and you want to fail back to it, care must be
+taken when bringing the site back up. Because it was down when the voting
+states were flipped on the secondary, its persisted database won't contain
+those changes. If brought back up in that state, the nodes will think they're
+still voting. If the nodes have connectivity to the secondary site, they
+should follow the leader in the secondary site and sync with it. However if
+this does not happen then the primary site may elect its own leader thereby
+partitioning the 2 clusters, which can lead to undesirable results. Therefore
+it is recommended to either clean the databases (i.e., ``journal`` and
+``snapshots`` directory) on the primary nodes before bringing them back up or
+restore them from a recent backup of the secondary site (see section
+:ref:`cluster_backup_restore` below).
+
+If is also possible to gracefully remove a node from a cluster, with the
+following RPC::
+
+ POST /restconf/operations/cluster-admin:remove-all-shard-replicas
+
+and example input::
+
+ {
+ "input": {
+ "member-name": "member-1"
+ }
+ }
+
+or just one particular shard::
+
+ POST /restconf/operations/cluster-admin:remove-shard-replica
+
+with example input::
+
+ {
+ "input": {
+ "shard-name": "default",
+ "member-name": "member-2",
+ "data-store-type": "config"
+ }
+ }
+
+Now that a (potentially dead/unrecoverable) node was removed, another one can
+be added at runtime, without changing the configuration files of the healthy
+nodes (requiring reboot)::
+
+ POST /restconf/operations/cluster-admin:add-replicas-for-all-shards
+
+No input required, but this RPC needs to be sent to the new node, to instruct
+it to replicate all shards from the cluster.
+
+.. note::
+
+ While the cluster admin API allows adding and removing shards dynamically,
+ the ``module-shard.conf`` and ``modules.conf`` files are still used on
+ startup to define the initial configuration of shards. Modifications from
+ the use of the API are not stored to those static files, but to the journal.
+
+.. _cluster_backup_restore:
+
+Backing Up and Restoring the Datastore
+--------------------------------------
+
+The same cluster-admin API that is used above for managing shard voting states
+has an RPC allowing backup of the datastore in a single node, taking only the
+file name as a parameter::
+
+ POST /restconf/operations/cluster-admin:backup-datastore
+
+RPC input JSON::
+
+ {
+ "input": {
+ "file-path": "/tmp/datastore_backup"
+ }
+ }
+
+.. note::
+
+ This backup can only be restored if the YANG models of the backed-up data
+ are identical in the backup OpenDaylight instance and restore target
+ instance.
+
+To restore the backup on the target node the file needs to be placed into the
+``$KARAF_HOME/clustered-datastore-restore`` directory, and then the node
+restarted. If the directory does not exist (which is quite likely if this is a
+first-time restore) it needs to be created. On startup, ODL checks if the
+``journal`` and ``snapshots`` directories in ``$KARAF_HOME`` are empty, and
+only then tries to read the contents of the ``clustered-datastore-restore``
+directory, if it exists. So for a successful restore, those two directories
+should be empty. The backup file name itself does not matter, and the startup
+process will delete it after a successful restore.
+
+The backup is node independent, so when restoring a 3 node cluster, it is best
+to restore it on each node for consistency. For example, if restoring on one
+node only, it can happen that the other two empty nodes form a majority and
+the cluster comes up with no data.