Failure handling
================

Overview
--------

A fundamental problem in distributed systems is that network partitions
(split brain scenarios) and machine crashes are indistinguishable to an
observer: a node can observe that there is a problem with another node, but it
cannot tell whether that node has crashed and will never be available again,
whether there is a network issue that may or may not heal after a while, or
whether the process is unresponsive because of overload, CPU starvation, or
long garbage collection pauses.

When a node has crashed, we would like to remove it from the cluster
membership immediately. When there is a network partition or an unresponsive
process, we would like to wait for a while in the hope that it is a transient
problem that will heal, but at some point we must give up, continue with the
nodes on one side of the partition, and shut down the nodes on the other side.
Also, certain features are not fully available during partitions, so it might
not matter whether the partition is transient if it simply lasts too long.
These two goals conflict: there is a trade-off between how quickly we can
remove a crashed node and the risk of acting prematurely on a transient
network partition.

Split Brain Resolver
--------------------

You need to enable the Split Brain Resolver by configuring it as the downing
provider in the configuration::

    akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"

You should also consider the different downing strategies, described below.

.. note:: If no downing provider is specified, the ``NoDowning`` provider is
   used, which performs no automatic downing.

All strategies are inactive until the cluster membership and the information
about unreachable nodes have been stable for a certain time period.
Continuously adding more nodes while there is a network partition does not
influence this timeout, since the status of those nodes will not be changed to
Up while there are unreachable nodes. Joining nodes are not counted in the
logic of the strategies.

Setting ``akka.cluster.split-brain-resolver.stable-after`` to a shorter
duration gives quicker removal of crashed nodes, at the price of risking
premature action on transient network partitions that would otherwise have
healed. Do not set it to a shorter duration than the membership dissemination
time in the cluster, which depends on the cluster size. Recommended minimum
durations for different cluster sizes:

============ ============
Cluster size stable-after
============ ============
5            7 s
10           10 s
20           13 s
50           17 s
100          20 s
1000         30 s
============ ============

.. note:: It is important that you use the same configuration on all nodes.

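As an illustration of how the table can be applied, the following sketch (not
part of Akka; the helper name and the round-up rule are assumptions for this
example) picks a conservative ``stable-after`` value by rounding the cluster
size up to the nearest tabulated size:

```python
# Illustrative helper, not part of Akka: maps a cluster size to the
# recommended minimum stable-after value from the table above.
RECOMMENDED_STABLE_AFTER = [(5, 7), (10, 10), (20, 13), (50, 17), (100, 20), (1000, 30)]

def min_stable_after_seconds(cluster_size):
    # Round the cluster size up to the nearest tabulated size.
    for size, seconds in RECOMMENDED_STABLE_AFTER:
        if cluster_size <= size:
            return seconds
    # Beyond the table, fall back to the largest recommendation.
    return RECOMMENDED_STABLE_AFTER[-1][1]

print(min_stable_after_seconds(30))  # 17: rounded up to the 50-node row
```
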
When reachability observations by the failure detector are changed, the SBR
decisions are deferred until there are no changes within the ``stable-after``
duration. If this continues for too long it might be an indication of an
unstable system or network, and it could result in delayed or conflicting
decisions on separate sides of a network partition.

As a precaution for that scenario, all nodes are downed if no decision is made
within ``stable-after + down-all-when-unstable`` from the first unreachability
event. The measurement is reset if all unreachable nodes have been healed,
downed or removed, or if there are no changes within ``stable-after * 2``.

Configuration::

    akka.cluster.split-brain-resolver {
      # The decision is taken by the strategy when there have been no membership or
      # reachability changes for this duration, i.e. the cluster state is stable.
      stable-after = 20s

      # When reachability observations by the failure detector are changed the SBR decisions
      # are deferred until there are no changes within the 'stable-after' duration.
      # If this continues for too long it might be an indication of an unstable system/network
      # and it could result in delayed or conflicting decisions on separate sides of a network
      # partition.
      # As a precaution for that scenario all nodes are downed if no decision is made within
      # `stable-after + down-all-when-unstable` from the first unreachability event.
      # The measurement is reset if all unreachable have been healed, downed or removed, or
      # if there are no changes within `stable-after * 2`.
      # The value can be on, off, or a duration.
      # By default it is 'on' and then it is derived to be 3/4 of stable-after, but not less
      # than 4 seconds.
      down-all-when-unstable = on
    }

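The derivation of the default ``down-all-when-unstable`` duration described in
the comments above can be sketched as follows (illustrative only; not Akka's
actual code):

```python
def derived_down_all_when_unstable(stable_after_seconds):
    # 'on' derives the duration as 3/4 of stable-after, but not less than 4 seconds.
    return max(0.75 * stable_after_seconds, 4.0)

print(derived_down_all_when_unstable(20))  # 15.0 with the default stable-after = 20s
print(derived_down_all_when_unstable(4))   # 4.0: the 4-second floor applies
```
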
Keep majority
^^^^^^^^^^^^^

This strategy is used by default because it works well for most systems. It
downs the unreachable nodes if the current node is in the majority part, based
on the last known membership information. Otherwise it downs the reachable
nodes, i.e. its own part. If the parts are of equal size, the part containing
the node with the lowest address is kept.

This strategy is a good choice when the number of nodes in the cluster changes
dynamically and you therefore cannot use ``static-quorum``.

* If there are membership changes at the same time as the network partition
  occurs, for example the status of two members is changed to Up on one side
  but that information is not disseminated to the other side before the
  connection is broken, it will down all nodes on the side that could be in
  minority if the joining nodes were changed to Up on the other side.
  Note that if the joining nodes were not changed to Up and did not form a
  majority on the other side, each part will shut itself down, terminating the
  whole cluster.

* If there are more than two partitions and none is in majority, each part
  will shut itself down, terminating the whole cluster.

* If more than half of the nodes crash at the same time, the remaining running
  nodes will down themselves because they think that they are not in majority,
  and thereby the whole cluster is terminated.

The decision can be based on nodes with a configured role instead of all nodes
in the cluster. This can be useful when some types of nodes are more valuable
than others.

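The decision rule can be summarized with a small sketch (illustrative only,
not Akka's implementation; nodes are represented here by their addresses as
plain strings):

```python
# Illustrative sketch of the keep-majority decision, not Akka's implementation.
# 'my_side' holds the addresses of the reachable nodes (including this node),
# 'other_side' the addresses of the unreachable nodes.
def keep_majority_decision(my_side, other_side):
    if len(my_side) > len(other_side):
        return "down unreachable"  # this node is in the majority part
    if len(my_side) < len(other_side):
        return "down own side"     # the other part is the majority
    # Equal size: keep the part containing the node with the lowest address.
    return "down unreachable" if min(my_side) < min(other_side) else "down own side"

print(keep_majority_decision(["a", "b", "c"], ["d", "e"]))  # down unreachable
print(keep_majority_decision(["b"], ["a"]))                 # down own side
```

Note that both sides of a real partition run this rule with the roles of
``my_side`` and ``other_side`` swapped, which is why exactly one side survives.
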
Configuration::

    akka.cluster.split-brain-resolver.active-strategy = keep-majority

::

    akka.cluster.split-brain-resolver.keep-majority {
      # if the 'role' is defined the decision is based only on members with that 'role'
      role = ""
    }

Static quorum
^^^^^^^^^^^^^

The strategy named ``static-quorum`` downs the unreachable nodes if the number
of remaining nodes is greater than or equal to a configured ``quorum-size``.
Otherwise it downs the reachable nodes, i.e. it shuts down that side of the
partition.

This strategy is a good choice when you have a fixed number of nodes in the
cluster, or when you can define a fixed number of nodes with a certain role.

* If there are unreachable nodes when starting up the cluster, before reaching
  this limit, the cluster may shut itself down immediately. This is not an
  issue if you start all nodes at approximately the same time, or if you use
  ``akka.cluster.min-nr-of-members`` to define the required number of members
  before the leader changes the status of Joining members to Up. You can tune
  the timeout after which downing decisions are made with the ``stable-after``
  setting.

* You should not add more members to the cluster than ``quorum-size * 2 - 1``.
  For example, with ``quorum-size = 3`` the cluster should have at most 5
  nodes, since with 6 nodes a 3/3 split would leave a quorum on both sides.
  If the cluster size still exceeds this limit when an SBR decision is needed,
  it will down all nodes, because otherwise there is a risk that both sides
  down each other and thereby form two separate clusters.

* If the cluster is split into 3 (or more) parts, each part that is smaller
  than the configured ``quorum-size`` will down itself, possibly shutting down
  the whole cluster.

* If more nodes than the configured ``quorum-size`` crash at the same time,
  the remaining running nodes will down themselves because they no longer form
  a quorum, and thereby the whole cluster is terminated.

The decision can be based on nodes with a configured role instead of all nodes
in the cluster. This can be useful when some types of nodes are more valuable
than others.

By defining a role for a few stable nodes in the cluster and using that role
in the configuration of ``static-quorum``, you can dynamically add and remove
other nodes without this role and still make good decisions about which nodes
to keep running and which to shut down in the case of network partitions. The
advantage of this approach compared to ``keep-majority`` is that you do not
risk splitting the cluster into two separate clusters, i.e. a split brain.

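The core rule, together with the sizing constraint discussed above, can be
sketched as follows (illustrative only, not Akka's implementation):

```python
# Illustrative sketch of the static-quorum decision, not Akka's implementation.
def static_quorum_decision(reachable_count, quorum_size):
    # Keep the side that still holds at least quorum-size reachable nodes.
    return "down unreachable" if reachable_count >= quorum_size else "down own side"

def max_safe_cluster_size(quorum_size):
    # With more than quorum-size * 2 - 1 nodes, both sides of an even split
    # could hold a quorum and down each other, forming two separate clusters.
    return quorum_size * 2 - 1

print(static_quorum_decision(3, 3))  # down unreachable
print(static_quorum_decision(2, 3))  # down own side
print(max_safe_cluster_size(3))      # 5
```
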
Configuration::

    akka.cluster.split-brain-resolver.active-strategy = static-quorum

::

    akka.cluster.split-brain-resolver.static-quorum {
      # minimum number of nodes that the cluster must have
      quorum-size = undefined

      # if the 'role' is defined the decision is based only on members with that 'role'
      role = ""
    }

Keep oldest
^^^^^^^^^^^

The strategy named ``keep-oldest`` downs the part that does not contain the
oldest member. The oldest member is interesting because the active Cluster
Singleton instance runs on the oldest member.

This strategy is a good choice if you use Cluster Singleton and do not want to
shut down the node where the singleton instance runs. If the oldest node
crashes, a new singleton instance is started on the next oldest node.

* If ``down-if-alone`` is configured to ``on`` and the oldest node has been
  partitioned from all other nodes, the oldest will down itself and keep all
  other nodes running. The strategy will not down the single oldest node when
  it is the only remaining node in the cluster.

* If there are membership changes at the same time as the network partition
  occurs, for example the status of the oldest member is changed to Exiting on
  one side but that information is not disseminated to the other side before
  the connection is broken, it will detect this situation and make the safe
  decision to down all nodes on the side that sees the oldest as Leaving.
  Note that this has the drawback that if the oldest was Leaving and not
  changed to Exiting, each part will shut itself down, terminating the whole
  cluster.

The decision can be based on nodes with a configured role instead of all nodes
in the cluster.

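The basic decision, including the ``down-if-alone`` special case, can be
sketched as follows (illustrative only, not Akka's implementation; members are
modeled here as ``(address, up_order)`` pairs where the lowest ``up_order``
marks the oldest member):

```python
# Illustrative sketch of the keep-oldest decision, not Akka's implementation.
def keep_oldest_decision(my_side, other_side, down_if_alone=True):
    # The member with the lowest up-order is the oldest in the cluster.
    oldest = min(my_side + other_side, key=lambda member: member[1])
    oldest_on_my_side = oldest in my_side
    if down_if_alone and oldest_on_my_side and len(my_side) == 1 and other_side:
        # The oldest is partitioned from everyone else: down it, keep the rest.
        return "down own side"
    return "down unreachable" if oldest_on_my_side else "down own side"

print(keep_oldest_decision([("a", 1), ("b", 2)], [("c", 3)]))  # down unreachable
print(keep_oldest_decision([("a", 1)], [("b", 2), ("c", 3)]))  # down own side
```
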
Configuration::

    akka.cluster.split-brain-resolver.active-strategy = keep-oldest

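The strategy-specific settings follow the same pattern as for the other
strategies; the default shown for ``down-if-alone`` is an assumption based on
the description above::

    akka.cluster.split-brain-resolver.keep-oldest {
      # if 'on', the oldest node downs itself when partitioned from all other nodes
      down-if-alone = on

      # if the 'role' is defined the decision is based only on members with that 'role'
      role = ""
    }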