Adding compute node scale-in scale-out spec

author manojna v <manojna.vijayakrishna@ericsson.com>

Thu, 20 Jun 2019 10:21:49 +0000 (15:51 +0530)

committer manojna v <manojna.vijayakrishna@ericsson.com>

Tue, 9 Jul 2019 09:59:04 +0000 (15:29 +0530)
diff --git a/docs/specs/compute-node-scalein-and-scaleout.rst b/docs/specs/compute-node-scalein-and-scaleout.rst

new file mode 100644 (file)

index 0000000..8919693
--- /dev/null
+++ b/docs/specs/compute-node-scalein-and-scaleout.rst
@@ -0,0 +1,251 @@
+.. contents:: Table of Contents
+   :depth: 3
+
+===============================================
+Support for compute node scale in and scale out
+===============================================
+
+https://git.opendaylight.org/gerrit/#/q/topic:compute-scalein-scaleout
+
+Add support for adding a new compute node to the existing topology
+and for removing or decommissioning an existing compute node from the topology.
+
+Problem description
+===================
+Support for adding a new compute node is already available.
+But when we scale in a compute node, we have to clean up its relevant flows
+from the OpenFlow tables and clean up its VXLAN tunnel endpoints on the other compute nodes.
+Also, if the scaled-in compute node is the designated compute for a particular service,
+such as NAT or subnetroute, then that service has to choose a new compute node.
+
+Use Cases
+---------
+* Scale out of compute nodes.
+* Scale in of a single compute node.
+* Scale in of multiple compute nodes at once.
+
+
+Proposed change
+===============
+
+The following are the steps taken by the administrator to achieve compute node scale-in.
+
+* The Nova Compute(s) shall be set into maintenance mode (nova service-disable <hostname> nova-compute).
+
+  This prevents VMs from being scheduled on these Compute Hosts.
+
+* Call a new rpc scalein-computes-start <list of scaledin compute node ids> to mark them as tombstoned.
+
+* VMs still residing on the Compute Host(s) shall be migrated off the Compute Host(s).
+
+* Disconnect the compute node from the OpenDaylight controller node.
+
+* Call a new rpc scalein-computes-tep-delete <list of scaledin compute node ids> to delete their TEPs from the controller.
+
+* Call a new rpc scalein-computes-end <list of scaledin compute node ids> repeatedly until its output is DONE.
+
+  This signals the end of the scale-in process.
+
+In case VM migration or deletion fails on some of these compute nodes,
+the following recovery rpc will be invoked:
+
+scalein-computes-recover <list of compute node names that were passed to scalein-computes-start but not scaled in>
+
+Following is the typical sequence of operations::
+
+    scalein-computes-start A,B,C
+    delete/migrate vms of A (success)
+    delete/migrate vms of B (fail)
+    delete/migrate vms of C (won't be triggered)
+    scalein-computes-tep-delete A
+    scalein-computes-end A
+    scalein-computes-recover B,C
+
+Typically, when a single compute node is scaled in and gets disconnected from the controller,
+all the services that had chosen it as their designated compute re-elect another
+compute node.
+
+But when multiple compute nodes are being scaled in, none of these computes
+should be elected as the designated compute during that window.
+
+To achieve that, these scaled-in computes are marked as tombstoned, and they are avoided when
+doing designated switch election or programming new services.
+
+After calling the scalein-computes-start rpc and migrating the VMs, the orchestrator calls the
+scalein-computes-tep-delete rpc to delete the TEP IPs of the computes. Once this is done, the
+orchestrator should call the scalein-computes-end rpc repeatedly until its output changes
+from INPROGRESS to DONE. This indicates that the TEPs have been deleted successfully.
+
+When we receive the scalein-computes-end rpc call, the corresponding computes' config inventory and topology
+database entries can also be deleted.
+
+When we receive the scalein-computes-recover rpc call, the corresponding computes' tombstoned flag is set to false.
+If any services have no designated compute node, they should start an election
+and may choose from these recovered computes.
+
+
+Pipeline changes
+----------------
+
+None.
+
+Yang changes
+------------
+
+The following rpcs will be added.
+
+.. code-block:: none
+   :caption: scalein-api.yang
+
+        rpc scalein-computes-start {
+            description "To trigger start of scale in the given dpns";
+            input {
+                leaf-list scalein-compute-names {
+                    type string;
+                }
+            }
+        }
+
+        rpc scalein-computes-end {
+            description "To end the scale in of the given dpns output DONE/INPROGRESS";
+            input {
+                leaf-list scalein-compute-names {
+                    type string;
+                }
+            }
+            output {
+                leaf status {
+                    type string;
+                }
+            }
+        }
+
+        rpc scalein-computes-recover {
+            description "To recover the dpns which are marked for scale in";
+            input {
+                leaf-list recover-compute-names {
+                    type string;
+                }
+            }
+        }
+
+        rpc scalein-computes-tep-delete {
+            description "To delete the tep endpoints of the scaled in dpns";
+            input {
+                leaf-list scalein-compute-names {
+                    type string;
+                }
+            }
+        }
+
+
+The topology node bridge-external-ids will be updated with an additional key called "tombstoned".
+
+
+Configuration impact
+---------------------
+None.
+
+Clustering considerations
+-------------------------
+None.
+
+Other Infra considerations
+--------------------------
+None.
+
+Security considerations
+-----------------------
+None.
+
+Scale and Performance Impact
+----------------------------
+None
+
+Targeted Release
+-----------------
+Oxygen
+
+Alternatives
+------------
+None.
+
+Usage
+=====
+N/A.
+
+Features to Install
+-------------------
+odl-netvirt-openstack
+
+REST API
+--------
+N/A.
+
+CLI
+---
+N/A.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+Primary assignee:
+
+* suneelu varma (k.v.suneelu.verma@ericsson.com)
+
+Other contributors:
+
+* Hanmanth (hanamantagoud.v.kandagal@ericsson.com)
+* Chetan (chetan.arakere@altencalsoftlabs.com)
+
+Work Items
+----------
+TODO
+
+Dependencies
+============
+No new dependencies.
+
+Testing
+=======
+* Verify that VMs on a scaled-out compute can communicate with VMs on the same compute and on other computes.
+* Verify that the flows of a scaled-in compute are removed and existing services continue to work.
+* Verify that the config inventory and topology datastores of scaled-in compute nodes are cleaned.
+* Identify a compute node which is designated for NAT/subnetroute functionality, scale in that compute,
+  and verify that NAT/subnetroute functionality continues to work. Verify that its relevant flows are reprogrammed.
+* While the scale-in workflow is in progress for a few computes, create a new NAT/subnetroute resource
+  and make sure that none of these compute nodes is chosen.
+* Verify the recovery procedure of the scale-in workflow; make sure that the recovered compute gets
+  its relevant flows.
+* Scale in a compute which is designated when no other compute has presence of that service (vpn)
+  to be designated; make sure that all its flows and datastores are deleted.
+* Start scale-in for a compute which is designated when no other compute has presence of that service (vpn)
+  to be designated; recover the compute and make sure that all its flows and datastores are recovered.
+
+Unit Tests
+----------
+N/A.
+
+Integration Tests
+-----------------
+N/A.
+
+CSIT
+----
+* Verify that VMs on a scaled-out compute can communicate with VMs on the same compute and on other computes.
+* Verify that the flows of a scaled-in compute are removed and existing services continue to work.
+* Identify a compute node which is designated for NAT/subnetroute functionality, scale in that compute,
+  and verify that NAT/subnetroute functionality continues to work. Verify that its relevant flows are reprogrammed.
+* Verify the recovery procedure of the scale-in workflow; make sure that the recovered compute gets
+  its relevant flows.
+
+Documentation Impact
+====================
+N/A
+
+References
+==========
+N/A
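The scale-in sequence from the "Proposed change" section of the spec above can be sketched from the orchestrator's side as follows. This is only an illustration and not part of the commit: `FakeController`, its method names, `scale_in`, `migrate_vms` and `max_polls` are all hypothetical, and the fake controller records calls in memory instead of issuing the real RPCs. The step that disconnects the compute from the controller, and any delay between polls, are left out.

```python
from typing import Callable, List, Tuple


class FakeController:
    """Hypothetical in-memory stand-in for the controller; records calls only."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self.calls: List[Tuple[str, Tuple[str, ...]]] = []
        self._polls_left = polls_until_done

    def start(self, computes: List[str]) -> None:
        # Stands in for scalein-computes-start (marks computes as tombstoned).
        self.calls.append(("start", tuple(computes)))

    def tep_delete(self, computes: List[str]) -> None:
        # Stands in for scalein-computes-tep-delete.
        self.calls.append(("tep_delete", tuple(computes)))

    def end(self, computes: List[str]) -> str:
        # Stands in for scalein-computes-end; reports INPROGRESS, then DONE.
        self.calls.append(("end", tuple(computes)))
        self._polls_left -= 1
        return "DONE" if self._polls_left <= 0 else "INPROGRESS"

    def recover(self, computes: List[str]) -> None:
        # Stands in for scalein-computes-recover (clears the tombstoned flag).
        self.calls.append(("recover", tuple(computes)))


def scale_in(
    ctrl: FakeController,
    computes: List[str],
    migrate_vms: Callable[[str], bool],
    max_polls: int = 10,
) -> Tuple[List[str], List[str]]:
    """Run the scale-in sequence and return (scaled_in, recovered)."""
    ctrl.start(computes)

    # Migrate compute by compute; stop at the first failure, as in the
    # A/B/C example where C is never attempted once B fails.
    scaled_in: List[str] = []
    for node in computes:
        if not migrate_vms(node):
            break
        scaled_in.append(node)
    recovered = [node for node in computes if node not in scaled_in]

    if scaled_in:
        ctrl.tep_delete(scaled_in)
        # Poll until the controller reports DONE, with an upper bound.
        for _ in range(max_polls):
            if ctrl.end(scaled_in) == "DONE":
                break
        else:
            raise TimeoutError("scale-in did not reach DONE")

    if recovered:
        ctrl.recover(recovered)
    return scaled_in, recovered


# Replays the A, B, C example: A migrates, B fails, C is not attempted.
ctrl = FakeController()
result = scale_in(ctrl, ["A", "B", "C"], migrate_vms=lambda node: node == "A")
```

Bounding the number of polls is a choice made for this sketch; the spec only says to call scalein-computes-end until it returns DONE.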
diff --git a/docs/specs/index.rst b/docs/specs/index.rst

index 484b0b1d06b05de2818138d796abdc22e3238b5e..826f951b05d8848ad5dfdd5b7575e2c904e0052b 100644 (file)
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -21,3 +21,4 @@ Contents:
     service-recovery
     arputil-dpn-id-in-notifications
     itm-yang-cleanup
+   Support for compute node scale in and scale out functionality <compute-node-scalein-and-scaleout>
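The tombstone rule described in the spec (computes being scaled in must not be picked as the designated compute, and recovered computes become eligible again) could look roughly like the sketch below. It is not taken from the commit: the `Compute` record, `elect_designated`, and the "lowest name wins" policy are assumptions made purely for illustration; the spec does not say how a service picks among eligible computes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Compute:
    """Hypothetical view of a compute node as seen by a service."""
    name: str
    connected: bool = True
    tombstoned: bool = False


def elect_designated(computes: Iterable[Compute]) -> Optional[str]:
    """Return the name of a designated compute, or None if none is eligible.

    Disconnected and tombstoned computes are skipped. Choosing the lowest
    name is an arbitrary policy used only to keep the sketch deterministic.
    """
    eligible = sorted(c.name for c in computes if c.connected and not c.tombstoned)
    return eligible[0] if eligible else None


# A and B are being scaled in, so only C is eligible.
nodes = [Compute("A", tombstoned=True), Compute("B", tombstoned=True), Compute("C")]
chosen = elect_designated(nodes)

# If C is tombstoned as well, the service has no designated compute.
nodes[2].tombstoned = True
none_left = elect_designated(nodes)

# Recovering A (tombstoned flag set back to false) makes it electable again.
nodes[0].tombstoned = False
after_recover = elect_designated(nodes)
```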