+++ /dev/null
-*** Settings ***
-Documentation Suite for controlled installation of ${FEATURE_ONCT}
-...
-... Copyright (c) 2016 Cisco Systems, Inc. and others. All rights reserved.
-...
-... This program and the accompanying materials are made available under the
-... terms of the Eclipse Public License v1.0 which accompanies this distribution,
-... and is available at http://www.eclipse.org/legal/epl-v10.html
-...
-...
-... This suite requires odl-netconf-ssh feature to be already installed,
-... otherwise SSH bundle refresh will cause connection to drop and karaf command "fails".
-...
-... Operation of clustered netconf topology relies on two key services.
-... The netconf topology manager application, which runs on the member
-... which owns "topology-manager" entity (of "netconf-topoogy" type);
-... And config datastore shard for network-topology module,
-... which is controlled by the Leader of the config topology shard.
-... The Leader is providing the desired state (concerning Netconf connectors),
-... the Owner consumes the state, performs necessary actions and updated operational view.
-... In this suite, the common name for the Owner and the Leader is Manager.
-...
-... In a typical cluster High Availability testing scenario,
-... one cluster member is selected, killed (or isolated), and later re-started (re-joined).
-... For Netconf cluster topology testing, there will be scenarios tragetting
-... the Owner, and other scenarios targeting the Leader.
-...
-... But both Owner and Leader selection is overned by the same RAFT algorithm,
-... which relies on message ordering, so there are two typical cases.
-... Either one member becomes both Owner and Leader,
-... or the two Managers are located at random.
-...
-... As the targeted scenarios require the two Managers to reside on different members,
-... neither of the two case is beneficial for testing.
-...
-... There are APIs in place which should allow relocation of Leader,
-... but there are no system tests for them yet.
-... TODO: Study those APIs and create the missing system tests.
-...
-... This suite helps with the Manager placement situation
-... by performing feature installation in runtime, aplying the following strategy:
-...
-... A N-node cluster is started (without ${FEATURE_ONCT} installed),
-... and it is verified one node has become the Leader of topology config shard.
-... As ${FEATURE_ONCT} is installed on the (N-1) follower members
-... (but not on the Leader yet), it is expected one of the members
-... becomes Owner of topology-manager entity.
-... After verifying that, ${FEATURE_ONCT} is installed on the Leader.
-... If neither Owner nor Leader has moved, the desired placement has been created.
-...
-... More specifically, this suite assumes the cluster has been started,
-... it has been stabilized, and ${FEATURE_ONCT} is not installed anywhere.
-... After successful run of this suite, the feature is installed on each member,
-... and the Owner is verified to be placed on different member than the Leader.
-...
-... Note that stress tests may cause Akka delays, which may move the Managers around.
-Suite Setup Setup_Everything
-Suite Teardown Teardown_Everything
-Test Setup SetupUtils.Setup_Test_With_Logging_And_Without_Fast_Failing
-Test Teardown SetupUtils.Teardown_Test_Show_Bugs_If_Test_Failed
-Default Tags clustering netconf critical
-Resource ${CURDIR}/../../../libraries/CarPeople.robot
-Resource ${CURDIR}/../../../libraries/ClusterManagement.robot
-Resource ${CURDIR}/../../../libraries/SetupUtils.robot
-Resource ${CURDIR}/../../../libraries/WaitForFailure.robot
-
-*** Variables ***
-${FEATURE_ONCT} odl-netconf-clustered-topology # the feature name is mentioned multiple times, this is to prevent typos
-${OWNER_ELECTION_TIMEOUT} 180s # very large value to allow for -all- jobs with many feature installations taking up time
-
-*** Test Cases ***
-Locate_Leader
- [Documentation] Set suite variables based on where the Leader is.
- ... As this test may get executed just after cluster restart, WUKS is used to give ODL chance to elect Leaders.
- BuiltIn.Comment FIXME: Migrate Set_Variables_For_Shard to ClusterManagement.robot
- BuiltIn.Wait_Until_Keyword_Succeeds 3m 15s CarPeople.Set_Variables_For_Shard shard_name=topology shard_type=config
-
-Install_Feature_On_Followers
- [Documentation] Perform feature installation on follower members, one by one.
- ... As first connection attempt may fail (coincidence with ssh bundle refresh), WUKS is used.
- # Make sure this works, alternative is to perform the installation in parallel.
- BuiltIn.Wait_Until_Keyword_Succeeds 3x 1s ClusterManagement.Install_Feature_On_List_Or_All feature_name=${FEATURE_ONCT} member_index_list=${topology_follower_indices} timeout=60s
-
-Locate_Owner
- [Documentation] Wait for Owner to appear, store its index to suite variable.
- BuiltIn.Wait_Until_Keyword_Succeeds ${OWNER_ELECTION_TIMEOUT} 3s Single_Locate_Owner_Attempt member_index_list=${topology_follower_indices}
-
-Install_Feature_On_Leader
- [Documentation] Perform feature installation on the Leader member.
- ... This seem to be failing, so use TRACE log.
- ClusterManagement.Install_Feature_On_Member feature_name=${FEATURE_ONCT} member_index=${topology_leader_index} timeout=60s
-
-Verify_Managers_Are_Stationary
- [Documentation] Keep checking that Managers do not move for a while.
- WaitForFailure.Verify_Keyword_Does_Not_Fail_Within_Timeout ${OWNER_ELECTION_TIMEOUT} 1s Check_Manager_Positions
-
-*** Keywords ***
-Setup_Everything
- [Documentation] Initialize libraries and set suite variables.
- SetupUtils.Setup_Utils_For_Setup_And_Teardown
- ClusterManagement.ClusterManagement_Setup
-
-Teardown_Everything
- [Documentation] Teardown the test infrastructure, perform cleanup and release all resources.
- RequestsLibrary.Delete_All_Sessions
-
-Single_Locate_Owner_Attempt
- [Arguments] ${member_index_list}=${EMPTY}
- [Documentation] Performs actions on given (or all) members, one by one:
- ... For the first member listed: Get the actual owner, check candidates, store owner to suite variable.
- ... (If the list has less then one item, this Keyword will fail.)
- ... For other nodes: Get actual owner, check candidates, compare to the first listed member results.
- ... TODO: Move to an appropriate Resource.
- BuiltIn.Comment FIXME: Work with sorted candidte list instead of candidate list length.
- ${index_list} = ClusterManagement.List_Indices_Or_All ${member_index_list}
- ${require_candidate_list} = BuiltIn.Create_List @{index_list}
- ${first_index_listed} = Collections.Remove_From_List ${index_list} ${0}
- # Now ${index_list} contains only the rest of indices.
- ${netconf_manager_owner_index} ${candidates} = ClusterManagement.Get_Owner_And_Candidates_For_Type_And_Id type=topology-netconf id=/general-entity:entity[general-entity:name='topology-manager'] member_index=${first_index_listed} require_candidate_list=${require_candidate_list}
- BuiltIn.Set_Suite_Variable \${netconf_manager_owner_index}
- : FOR ${index} IN @{index_list}
- \ ${new_owner} ${new_candidates} = ClusterManagement.Get_Owner_And_Candidates_For_Type_And_Id type=topology-netconf id=/general-entity:entity[general-entity:name='topology-manager'] member_index=${index}
- \ ... require_candidate_list=${require_candidate_list}
- \ BuiltIn.Should_Be_Equal ${new_owner} ${netconf_manager_owner_index} Member-${index} owner ${new_owner} is not ${netconf_manager_owner_index}
-
-Check_Manager_Positions
- [Documentation] For each Manager, locate its current position and check it is the one stored in suite variable.
- ${new_leader} ${followers} = ClusterManagement.Get_Leader_And_Followers_For_Shard shard_name=topology shard_type=config
- BuiltIn.Should_Be_Equal ${topology_leader_index} ${new_leader}
- ${new_owner} ${candidates} = ClusterManagement.Get_Owner_And_Candidates_For_Type_And_Id type=topology-netconf id=/general-entity:entity[general-entity:name='topology-manager'] member_index=${topology_first_follower_index}
- BuiltIn.Should_Be_Equal ${netconf_manager_owner_index} ${new_owner}
*** Settings ***
-Documentation Suite for High Availability testing config topology shard Leader under stress.
+Documentation Suite for High Availability testing config topology shard leader under stress.
...
... Copyright (c) 2016 Cisco Systems, Inc. and others. All rights reserved.
...
...
...
... This is close analogue of topology_owner_ha.robot, see Documentation there.
-... The difference is that here the requests are sent towards Owner,
-... and the Leader node is rebooted.
+... The difference is that here the requests are sent towards the entity-ownership shard leader,
+... and the topology shard leader node is rebooted.
...
... No real clustering Bugs are expected to be discovered by this suite,
... except maybe some Restconf ones.
Library OperatingSystem
Library SSHLibrary timeout=10s
Library String # for Get_Regexp_Matches
+Resource ${CURDIR}/../../../libraries/ClusterAdmin.robot
Resource ${CURDIR}/../../../libraries/ClusterManagement.robot
Resource ${CURDIR}/../../../libraries/KarafKeywords.robot
Resource ${CURDIR}/../../../libraries/NetconfKeywords.robot
-Resource ${CURDIR}/../../../libraries/RemeoteBash.robot
+Resource ${CURDIR}/../../../libraries/RemoteBash.robot
Resource ${CURDIR}/../../../libraries/SetupUtils.robot
Resource ${CURDIR}/../../../libraries/SSHKeywords.robot
Resource ${CURDIR}/../../../libraries/TemplatedRequests.robot
*** Test Cases ***
Locate_Managers
- [Documentation] Detect location of Leader and Owner and store related data into suite variables.
+ [Documentation] Detect location of the topology (config) and entity-ownership (operational) leaders and store related data into suite variables.
... This cannot be part of Suite Setup, as Utils.Get_Index_From_List_Of_Dictionaries calls BuiltIn.Set_Test_Variable.
... WUKS are used, as location failures are probably due to booting process, not bugs.
${topology_config_leader_index} ${candidates} = BuiltIn.Wait_Until_Keyword_Succeeds 3x 2s ClusterManagement.Get_Leader_And_Followers_For_Shard shard_name=topology
BuiltIn.Set_Suite_Variable \${topology_config_leader_ip}
${topology_config_leader_http_session} = Resolve_Http_Session_For_Member ${topology_config_leader_index}
BuiltIn.Set_Suite_Variable \${topology_config_leader_http_session}
- ${netconf_manager_owner_index} ${candidates} = BuiltIn.Wait_Until_Keyword_Succeeds 3x 2s ClusterManagement.Get_Owner_And_Candidates_For_Type_And_Id type=topology-netconf
- ... id=/general-entity:entity[general-entity:name='topology-manager'] member_index=1
- BuiltIn.Set_Suite_Variable \${netconf_manager_owner_index}
- ${netconf_manager_owner_ip} = ClusterManagement.Resolve_Ip_Address_For_Member ${netconf_manager_owner_index}
- BuiltIn.Set_Suite_Variable \${netconf_manager_owner_ip}
- ${netconf_manager_owner_http_session} = Resolve_Http_Session_For_Member ${netconf_manager_owner_index}
- BuiltIn.Set_Suite_Variable \${netconf_manager_owner_http_session}
+ ${entity_ownership_leader_index} = Change_Entity_Ownership_Leader_If_Needed ${topology_config_leader_index}
+ BuiltIn.Set_Suite_Variable \${entity_ownership_leader_index}
+ ${entity_ownership_leader_ip} = ClusterManagement.Resolve_Ip_Address_For_Member ${entity_ownership_leader_index}
+ BuiltIn.Set_Suite_Variable \${entity_ownership_leader_ip}
+ ${entity_ownership_leader_http_session} = Resolve_Http_Session_For_Member ${entity_ownership_leader_index}
+ BuiltIn.Set_Suite_Variable \${entity_ownership_leader_http_session}
Start_Testtool
[Documentation] Deploy and start test tool on its separate SSH session.
${log_filename} = Utils.Get_Log_File_Name configurer
BuiltIn.Set_Suite_Variable \${log_filename}
# TODO: Should things like restconf port/user/password be set from Variables?
- ${command} = BuiltIn.Set_Variable python configurer.py --odladdress ${netconf_manager_owner_ip} --deviceaddress ${TOOLS_SYSTEM_IP} --devices ${DEVICE_SET_SIZE} --disconndelay ${CONFIGURED_DEVICES_LIMIT} --basename ${DEVICE_BASE_NAME} --connsleep ${CONNECTION_SLEEP} &> "${log_filename}"
+ ${command} = BuiltIn.Set_Variable python configurer.py --odladdress ${entity_ownership_leader_ip} --deviceaddress ${TOOLS_SYSTEM_IP} --devices ${DEVICE_SET_SIZE} --disconndelay ${CONFIGURED_DEVICES_LIMIT} --basename ${DEVICE_BASE_NAME} --connsleep ${CONNECTION_SLEEP} &> "${log_filename}"
SSHLibrary.Write ${command}
${status} ${text} = BuiltIn.Run_Keyword_And_Ignore_Error SSHLibrary.Read_Until_Prompt
BuiltIn.Log ${text}
BuiltIn.Wait_Until_Keyword_Succeeds ${timeout} 1s Check_Config_Items_Lower_Bound
Reboot_Topology_Leader
- [Documentation] Kill and restart member where topology shard Leader was, including removal of persisted data.
+ [Documentation] Kill and restart the member where the topology shard leader was, including removal of persisted data.
... After cluster sync, sleep additional time to ensure manager processes requests with the rebooted member fully rejoined.
[Tags] @{TAGS_NONCRITICAL} # To avoid long WUKS list expanded in log.html
ClusterManagement.Kill_Single_Member ${topology_config_leader_index}
Stop_Configurer
[Documentation] Write ctrl+c, download the log, read its contents and match expected patterns.
- RemeoteBash.Write_Bare_Ctrl_C
+ RemoteBash.Write_Bare_Ctrl_C
${output} = SSHLibrary.Read_Until_Prompt
BuiltIn.Log ${output}
SSHLibrary.Get_File ${log_filename}
Get_Config_Device_Count
[Documentation] Count number of items in config netconf topology matching ${DEVICE_BASE_NAME}
- ${item_data} = TemplatedRequests.Get_As_Json_From_Uri ${CONFIG_API}/network-topology:network-topology/topology/topology-netconf session=${netconf_manager_owner_http_session}
+ ${item_data} = TemplatedRequests.Get_As_Json_From_Uri ${CONFIG_API}/network-topology:network-topology/topology/topology-netconf session=${entity_ownership_leader_http_session}
BuiltIn.Run_Keyword_And_Return Count_Substring_Occurence substring=${DEVICE_BASE_NAME} main_string=${item_data}
Get_Operational_Device_Count
[Documentation] Count number of items in operational netconf topology matching ${DEVICE_BASE_NAME}
- ${item_data} = TemplatedRequests.Get_As_Json_From_Uri ${OPERATIONAL_API}/network-topology:network-topology/topology/topology-netconf session=${netconf_manager_owner_http_session}
+ ${item_data} = TemplatedRequests.Get_As_Json_From_Uri ${OPERATIONAL_API}/network-topology:network-topology/topology/topology-netconf session=${entity_ownership_leader_http_session}
BuiltIn.Run_Keyword_And_Return Count_Substring_Occurence substring=${DEVICE_BASE_NAME} main_string=${item_data}
Check_Config_Items_Lower_Bound
[Arguments] ${coefficient}=1.0
[Documentation] Return number of seconds typical for given scale variables.
BuiltIn.Run_Keyword_And_Return BuiltIn.Evaluate ${coefficient} * ${CONNECTION_SLEEP} * ${CONFIGURED_DEVICES_LIMIT}
+
+Change_Entity_Ownership_Leader_If_Needed
+ [Arguments] ${topology_config_leader_idx}
+ [Documentation] Move the entity-ownership (operational) shard leader if it is on the same node as the topology (config) shard leader.
+ ${entity_ownership_leader_index_old} ${candidates} = BuiltIn.Wait_Until_Keyword_Succeeds 3x 2s ClusterManagement.Get_Leader_And_Followers_For_Shard shard_name=entity-ownership
+ ... shard_type=operational
+ BuiltIn.Return_From_Keyword_If ${topology_config_leader_idx} != ${entity_ownership_leader_index_old} ${entity_ownership_leader_index_old}
+ ${idx}= Collections.Get_From_List ${candidates} 0
+ ClusterAdmin.Make_Leader_Local ${idx} entity-ownership operational
+ ${entity_ownership_leader_index} ${candidates} = BuiltIn.Wait_Until_Keyword_Succeeds 60s 3s ClusterManagement.Verify_Shard_Leader_Elected entity-ownership
+ ... operational ${True} ${entity_ownership_leader_index_old} verify_restconf=False
+ BuiltIn.Return_From_Keyword ${entity_ownership_leader_index}
...
... This suite uses a Python utility to continuously configure/deconfigure
... device connections against devices simulated by testtool.
-... The utility sends requests to the member which is Leader for topology config shard.
+... The utility sends requests to the member which is the leader of the topology config shard.
...
... To avoid excessive resource consumption, the utility deconfigures old devices.
... In a stationary state, number of config items oscillates between
... ${CONFIGURED_DEVICES_LIMIT} and 1 + ${CONFIGURED_DEVICES_LIMIT}.
...
... The only tested HA event so far is reboot of the member
-... which is Owner of netconf topology-manager entity.
-... This suite assumes the Owner and the Leader are not co-located.
+... which is the leader of the entity-ownership (operational) shard.
+... This suite assumes the entity-ownership (operational) shard leader and
+... the topology (config) shard leader are not co-located.
...
... Number of devices is configurable, wait times are computed from that,
... as it takes some time to initialize connections.
-... Ideally, the utility should go through half of devices during Owner downtime.
+... Ideally, the utility should go through half of the devices during the entity-ownership leader downtime.
...
... If there is a period when netconf manager ignores deletions in config datastore,
... the devices created previously could "leak", meaning the number of
Library OperatingSystem
Library SSHLibrary timeout=10s
Library String # for Get_Regexp_Matches
+Resource ${CURDIR}/../../../libraries/ClusterAdmin.robot
Resource ${CURDIR}/../../../libraries/ClusterManagement.robot
Resource ${CURDIR}/../../../libraries/KarafKeywords.robot
Resource ${CURDIR}/../../../libraries/NetconfKeywords.robot
-Resource ${CURDIR}/../../../libraries/RemeoteBash.robot
+Resource ${CURDIR}/../../../libraries/RemoteBash.robot
Resource ${CURDIR}/../../../libraries/SetupUtils.robot
Resource ${CURDIR}/../../../libraries/SSHKeywords.robot
Resource ${CURDIR}/../../../libraries/TemplatedRequests.robot
@{TAGS_NONCRITICAL} clustering netconf
*** Test Cases ***
-Locate_Managers
- [Documentation] Detect location of Leader and Owner and store related data into suite variables.
+Setup_Leaders_Location
+ [Documentation] Detect location of the topology (config) and entity-ownership (operational) leaders and store related data into suite variables.
... This cannot be part of Suite Setup, as Utils.Get_Index_From_List_Of_Dictionaries calls BuiltIn.Set_Test_Variable.
... WUKS are used, as location failures are probably due to booting process, not bugs.
${topology_config_leader_index} ${candidates} = BuiltIn.Wait_Until_Keyword_Succeeds 3x 2s ClusterManagement.Get_Leader_And_Followers_For_Shard shard_name=topology
BuiltIn.Set_Suite_Variable \${topology_config_leader_ip}
${topology_config_leader_http_session} = Resolve_Http_Session_For_Member ${topology_config_leader_index}
BuiltIn.Set_Suite_Variable \${topology_config_leader_http_session}
- ${netconf_manager_owner_index} ${candidates} = BuiltIn.Wait_Until_Keyword_Succeeds 3x 2s ClusterManagement.Get_Owner_And_Candidates_For_Type_And_Id type=topology-netconf
- ... id=/general-entity:entity[general-entity:name='topology-manager'] member_index=1
- BuiltIn.Set_Suite_Variable \${netconf_manager_owner_index}
- ${netconf_manager_owner_ip} = ClusterManagement.Resolve_Ip_Address_For_Member ${netconf_manager_owner_index}
- BuiltIn.Set_Suite_Variable \${netconf_manager_owner_ip}
- ${netconf_manager_owner_http_session} = Resolve_Http_Session_For_Member ${netconf_manager_owner_index}
- BuiltIn.Set_Suite_Variable \${netconf_manager_owner_http_session}
+ ${entity_ownership_leader_index} = Change_Entity_Ownership_Leader_If_Needed ${topology_config_leader_index}
+ BuiltIn.Set_Suite_Variable \${entity_ownership_leader_index}
+ ${entity_ownership_leader_ip} = ClusterManagement.Resolve_Ip_Address_For_Member ${entity_ownership_leader_index}
+ BuiltIn.Set_Suite_Variable \${entity_ownership_leader_ip}
+ ${entity_ownership_leader_http_session} = Resolve_Http_Session_For_Member ${entity_ownership_leader_index}
+ BuiltIn.Set_Suite_Variable \${entity_ownership_leader_http_session}
Start_Testtool
[Documentation] Deploy and start test tool on its separate SSH session.
${timeout} = Get_Typical_Time
BuiltIn.Wait_Until_Keyword_Succeeds ${timeout} 1s Check_Config_Items_Lower_Bound
-Reboot_Manager_Owner
- [Documentation] Kill and restart member where netconf topology manager was, including removal of persisted data.
- ... After cluster sync, sleep additional time to ensure manager processes requests with the rebooted member fully rejoined.
+Reboot_Entity_Ownership_Leader
+ [Documentation] Kill and restart the member where the entity-ownership shard leader was, including removal of persisted data.
+ ... After cluster sync, sleep additional time to ensure the entity-ownership shard processes requests with the rebooted member fully rejoined.
[Tags] @{TAGS_NONCRITICAL} # To avoid long WUKS list expanded in log.html
- ClusterManagement.Kill_Single_Member ${netconf_manager_owner_index}
- ${owner_list} = BuiltIn.Create_List ${netconf_manager_owner_index}
- ClusterManagement.Start_Single_Member ${netconf_manager_owner_index}
+ ClusterManagement.Kill_Single_Member ${entity_ownership_leader_index}
+ ${owner_list} = BuiltIn.Create_List ${entity_ownership_leader_index}
+ ClusterManagement.Start_Single_Member ${entity_ownership_leader_index}
BuiltIn.Comment FIXME: Replace sleep with WUKS when it becomes clear what to wait for.
${sleep_time} = Get_Typical_Time coefficient=3.0
BuiltIn.Sleep ${sleep_time}
Stop_Configurer
[Documentation] Write ctrl+c, download the log, read its contents and match expected patterns.
- RemeoteBash.Write_Bare_Ctrl_C
+ RemoteBash.Write_Bare_Ctrl_C
${output} = SSHLibrary.Read_Until_Prompt
BuiltIn.Log ${output}
SSHLibrary.Get_File ${log_filename}
[Arguments] ${coefficient}=1.0
[Documentation] Return number of seconds typical for given scale variables.
BuiltIn.Run_Keyword_And_Return BuiltIn.Evaluate ${coefficient} * ${CONNECTION_SLEEP} * ${CONFIGURED_DEVICES_LIMIT}
+
+Change_Entity_Ownership_Leader_If_Needed
+ [Arguments] ${topology_config_leader_idx}
+ [Documentation] Move the entity-ownership (operational) shard leader if it is on the same node as the topology (config) shard leader.
+ ... TODO: move keyword to a common resource, e.g. ShardStability
+ ${entity_ownership_leader_index_old} ${candidates} = BuiltIn.Wait_Until_Keyword_Succeeds 3x 2s ClusterManagement.Get_Leader_And_Followers_For_Shard shard_name=entity-ownership
+ ... shard_type=operational
+ BuiltIn.Return_From_Keyword_If ${topology_config_leader_idx} != ${entity_ownership_leader_index_old} ${entity_ownership_leader_index_old}
+ ${idx}= Collections.Get_From_List ${candidates} 0
+ ClusterAdmin.Make_Leader_Local ${idx} entity-ownership operational
+ ${entity_ownership_leader_index} ${candidates} = BuiltIn.Wait_Until_Keyword_Succeeds 60s 3s ClusterManagement.Verify_Shard_Leader_Elected entity-ownership
+ ... operational ${True} ${entity_ownership_leader_index_old} verify_restconf=False
+ BuiltIn.Return_From_Keyword ${entity_ownership_leader_index}
# Place the suites in run order:
-# Install feature in controlled way.
-integration/test/csit/suites/netconf/clusteringscale/staggered_install.robot
-
# Make sure ODL is ready.
integration/test/csit/suites/netconf/ready
# Reset in order to run more suites.
integration/test/csit/suites/test/cluster_reset.robot
-integration/test/csit/suites/netconf/clusteringscale/staggered_install.robot
integration/test/csit/suites/netconf/ready
# More suites.
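The wait times used throughout these suites all derive from the same scale formula, seen in the `Get_Typical_Time` keyword: a coefficient times `${CONNECTION_SLEEP}` times `${CONFIGURED_DEVICES_LIMIT}`. A minimal Python sketch of that computation follows; the variable names mirror the Robot suite variables, but the concrete values here are illustrative assumptions, not the suite defaults.

```python
# Sketch of the Get_Typical_Time computation used by the HA suites above.
# CONNECTION_SLEEP and CONFIGURED_DEVICES_LIMIT mirror the Robot variables;
# the values below are illustrative assumptions, not the suite defaults.

CONNECTION_SLEEP = 5           # seconds the configurer sleeps between requests
CONFIGURED_DEVICES_LIMIT = 10  # devices kept configured at any one time

def get_typical_time(coefficient=1.0):
    """Return a wait time (seconds) scaled to the configured device churn."""
    return coefficient * CONNECTION_SLEEP * CONFIGURED_DEVICES_LIMIT

print(get_typical_time())      # 50.0  - baseline wait for device churn
print(get_typical_time(3.0))   # 150.0 - the post-reboot sleep uses coefficient=3.0
```

This mirrors why the post-reboot sleep (coefficient 3.0) is three times the baseline wait: the rebooted member needs enough time to rejoin while the configurer keeps cycling through devices.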