*** Settings ***
Documentation    Suite for High Availability testing of the netconf topology owner under stress.
...
...    Copyright (c) 2016 Cisco Systems, Inc. and others. All rights reserved.
...
...    This program and the accompanying materials are made available under the
...    terms of the Eclipse Public License v1.0 which accompanies this distribution,
...    and is available at http://www.eclipse.org/legal/epl-v10.html
...
...    Suite topology_leader_ha.robot is derived from this suite.
...    Please keep the logic in the suites as similar as possible.
...
...    This suite uses a Python utility to continuously configure/deconfigure
...    device connections against devices simulated by testtool.
...    The utility sends requests to the member which is the leader of the topology config shard.
...
...    To avoid excessive resource consumption, the utility deconfigures old devices.
...    In a stationary state, the number of config items oscillates between
...    ${CONFIGURED_DEVICES_LIMIT} and 1 + ${CONFIGURED_DEVICES_LIMIT}.
...
...    The only tested HA event so far is a reboot of the member
...    which is the leader of the entity-ownership operational shard.
...    This suite assumes the entity-ownership operational shard leader and
...    the topology config shard leader are not co-located.
...
...    The number of devices is configurable; wait times are computed from it,
...    as it takes some time to initialize connections.
...    Ideally, the utility should go through half of the devices during entity-ownership leader downtime.
...
...    If there is a period when the netconf manager ignores deletions in the config datastore,
...    the devices created previously could "leak", meaning the number of
...    netconf topology items could be higher than 1 + ${CONFIGURED_DEVICES_LIMIT}.
...
...    One check for correctness is the final number of devices in the operational netconf topology.
...    Another check is performed on the utility output.
...
...    Performance can be estimated by the total number of requests processed,
...    but this suite does not perform such a computation.
...
...    TODO: After stopping the utility, wait to see that mount has succeeded on the devices.
Suite Setup    Setup_Everything
Suite Teardown    Teardown_Everything
Test Setup    SetupUtils.Setup_Test_With_Logging_And_Without_Fast_Failing
Test Teardown    ${DEFAULT_TEARDOWN_KEYWORD}
Default Tags    @{TAGS_CRITICAL}
Library    OperatingSystem
Library    SSHLibrary    timeout=10s
Library    String    # for Get_Regexp_Matches
Resource    ${CURDIR}/../../../libraries/ClusterAdmin.robot
Resource    ${CURDIR}/../../../libraries/ClusterManagement.robot
Resource    ${CURDIR}/../../../libraries/KarafKeywords.robot
Resource    ${CURDIR}/../../../libraries/NetconfKeywords.robot
Resource    ${CURDIR}/../../../libraries/RemoteBash.robot
Resource    ${CURDIR}/../../../libraries/SetupUtils.robot
Resource    ${CURDIR}/../../../libraries/SSHKeywords.robot
Resource    ${CURDIR}/../../../libraries/TemplatedRequests.robot
Resource    ${CURDIR}/../../../libraries/Utils.robot
Variables    ${CURDIR}/../../../variables/Variables.py

*** Variables ***
${CONFIGURED_DEVICES_LIMIT}    20
${CONNECTION_SLEEP}    1.2
${DEFAULT_TEARDOWN_KEYWORD}    SetupUtils.Teardown_Test_Show_Bugs_If_Test_Failed
${DEVICE_BASE_NAME}    netconf-test-device
@{TAGS_CRITICAL}    critical    @{TAGS_NONCRITICAL}
@{TAGS_NONCRITICAL}    clustering    netconf

*** Test Cases ***
Setup_Leaders_Location
    [Documentation]    Detect the location of the topology (config) and entity-ownership (operational) leaders and store related data in suite variables.
    ...    This cannot be part of Suite Setup, as Utils.Get_Index_From_List_Of_Dictionaries calls BuiltIn.Set_Test_Variable.
    ...    WUKS (Wait_Until_Keyword_Succeeds) is used, as location failures are probably due to the booting process, not bugs.
    ${topology_config_leader_index}    ${candidates} =    BuiltIn.Wait_Until_Keyword_Succeeds    3x    2s    ClusterManagement.Get_Leader_And_Followers_For_Shard    shard_name=topology
    ...    shard_type=config
    BuiltIn.Set_Suite_Variable    \${topology_config_leader_index}
    ${topology_config_leader_ip} =    ClusterManagement.Resolve_Ip_Address_For_Member    ${topology_config_leader_index}
    BuiltIn.Set_Suite_Variable    \${topology_config_leader_ip}
    ${topology_config_leader_http_session} =    Resolve_Http_Session_For_Member    ${topology_config_leader_index}
    BuiltIn.Set_Suite_Variable    \${topology_config_leader_http_session}
    ${entity_ownership_leader_index} =    Change_Entity_Ownership_Leader_If_Needed    ${topology_config_leader_index}
    BuiltIn.Set_Suite_Variable    \${entity_ownership_leader_index}
    ${entity_ownership_leader_ip} =    ClusterManagement.Resolve_Ip_Address_For_Member    ${entity_ownership_leader_index}
    BuiltIn.Set_Suite_Variable    \${entity_ownership_leader_ip}
    ${entity_ownership_leader_http_session} =    Resolve_Http_Session_For_Member    ${entity_ownership_leader_index}
    BuiltIn.Set_Suite_Variable    \${entity_ownership_leader_http_session}

Start_Testtool
    [Documentation]    Deploy and start testtool on its own separate SSH session.
    SSHLibrary.Switch_Connection    ${testtool_connection_index}
    NetconfKeywords.Install_And_Start_Testtool    device-count=${DEVICE_SET_SIZE}    schemas=${CURDIR}/../../../variables/netconf/CRUD/schemas
    # TODO: Introduce NetconfKeywords.Safe_Install_And_Start_Testtool to avoid teardown manipulation.
    [Teardown]    BuiltIn.Run_Keywords    SSHLibrary.Switch_Connection    ${configurer_connection_index}
    ...    AND    ${DEFAULT_TEARDOWN_KEYWORD}

Start_Configurer
    [Documentation]    Launch the Python utility (redirecting its output to a log file) and verify it does not stop by itself.
    ${log_filename} =    Utils.Get_Log_File_Name    configurer
    BuiltIn.Set_Suite_Variable    \${log_filename}
    # TODO: Should things like restconf port/user/password be set from Variables?
    ${command} =    BuiltIn.Set_Variable    python configurer.py --odladdress ${topology_config_leader_ip} --deviceaddress ${TOOLS_SYSTEM_IP} --devices ${DEVICE_SET_SIZE} --disconndelay ${CONFIGURED_DEVICES_LIMIT} --basename ${DEVICE_BASE_NAME} --connsleep ${CONNECTION_SLEEP} &> "${log_filename}"
    SSHLibrary.Write    ${command}
    ${status}    ${text} =    BuiltIn.Run_Keyword_And_Ignore_Error    SSHLibrary.Read_Until_Prompt
    BuiltIn.Run_Keyword_If    "${status}" != "FAIL"    BuiltIn.Fail    Prompt happened, see Log.
    # Session is kept active.

Wait_For_Config_Items
    [Documentation]    Make sure configurer is in the phase when old devices are being deconfigured; or fail on timeout.
    ${timeout} =    Get_Typical_Time
    BuiltIn.Wait_Until_Keyword_Succeeds    ${timeout}    1s    Check_Config_Items_Lower_Bound

Reboot_Entity_Ownership_Leader
    [Documentation]    Kill and restart the member where the entity-ownership shard leader was, including removal of persisted data.
    ...    After cluster sync, sleep additional time to ensure the entity-ownership shard processes requests with the rebooted member fully rejoined.
    [Tags]    @{TAGS_NONCRITICAL}    # To avoid long WUKS list expanded in log.html
    ClusterManagement.Kill_Single_Member    ${entity_ownership_leader_index}
    ${owner_list} =    BuiltIn.Create_List    ${entity_ownership_leader_index}
    ClusterManagement.Start_Single_Member    ${entity_ownership_leader_index}
    BuiltIn.Comment    FIXME: Replace sleep with WUKS when it becomes clear what to wait for.
    ${sleep_time} =    Get_Typical_Time    coefficient=3.0
    BuiltIn.Sleep    ${sleep_time}

Stop_Configurer
    [Documentation]    Write ctrl+c, download the log, read its contents and match expected patterns.
    RemoteBash.Write_Bare_Ctrl_C
    ${output} =    SSHLibrary.Read_Until_Prompt
    BuiltIn.Log    ${output}
    SSHLibrary.Get_File    ${log_filename}
    ${output} =    OperatingSystem.Get_File    ${log_filename}
    ${list_any_matches} =    String.Get_Regexp_Matches    ${output}    delete|put
    ${number_any_matches} =    BuiltIn.Get_Length    ${list_any_matches}
    BuiltIn.Should_Be_Equal    ${2}    ${number_any_matches}    Unexpected status seen: ${output}
    ${list_strict_matches} =    String.Get_Regexp_Matches    ${output}    delete:200|put:201
    ${number_strict_matches} =    BuiltIn.Get_Length    ${list_strict_matches}
    BuiltIn.Should_Be_Equal    ${2}    ${number_strict_matches}    Expected status not seen: ${output}

Check_For_Connector_Leak
    [Documentation]    Check that the number of items in the operational netconf topology is not higher than expected.
    # FIXME: Are separate keywords necessary?
    Check_Operational_Items_Upper_Bound

*** Keywords ***
Setup_Everything
    [Documentation]    Initialize libraries and set suite variables.
    ClusterManagement.ClusterManagement_Setup
    SetupUtils.Setup_Utils_For_Setup_And_Teardown
    NetconfKeywords.Setup_Netconf_Keywords    create_session_for_templated_requests=False
    ${testtool_connection_index} =    SSHKeywords.Open_Connection_To_Tools_System
    BuiltIn.Set_Suite_Variable    \${testtool_connection_index}
    ${configurer_connection_index} =    SSHKeywords.Open_Connection_To_Tools_System
    BuiltIn.Set_Suite_Variable    \${configurer_connection_index}
    SSHKeywords.Require_Python
    SSHKeywords.Assure_Library_Counter
    SSHLibrary.Put_File    ${CURDIR}/../../../../tools/netconf_tools/configurer.py
    SSHLibrary.Put_File    ${CURDIR}/../../../libraries/AuthStandalone.py

Teardown_Everything
    [Documentation]    Tear down the test infrastructure, perform cleanup and release all resources.
    SSHLibrary.Switch_Connection    ${testtool_connection_index}
    NetconfKeywords.Stop_Testtool
    RequestsLibrary.Delete_All_Sessions

Count_Substring_Occurence
    [Arguments]    ${substring}    ${main_string}
    [Documentation]    Apply the length_of_split method for counting how many times ${substring} occurs within ${main_string}.
    ...    The method is reliable only if triple-double quotes are not present in either argument.
    BuiltIn.Comment    TODO: Migrate this keyword into an appropriate Resource.
    BuiltIn.Run_Keyword_And_Return    BuiltIn.Evaluate    len("""${main_string}""".split("""${substring}""")) - 1
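    # Illustration only (the strings are hypothetical, not taken from a real run): with
    # substring "dev" and main_string "dev-1, dev-2", Python evaluates
    # "dev-1, dev-2".split("dev") to ['', '-1, ', '-2'], a list of 3 parts, so the keyword returns 2.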

Get_Config_Device_Count
    [Documentation]    Count the number of items in the config netconf topology matching ${DEVICE_BASE_NAME}
    ${item_data} =    TemplatedRequests.Get_As_Json_From_Uri    ${CONFIG_API}/network-topology:network-topology/topology/topology-netconf    session=${topology_config_leader_http_session}
    BuiltIn.Run_Keyword_And_Return    Count_Substring_Occurence    substring=${DEVICE_BASE_NAME}    main_string=${item_data}

Get_Operational_Device_Count
    [Documentation]    Count the number of items in the operational netconf topology matching ${DEVICE_BASE_NAME}
    ${item_data} =    TemplatedRequests.Get_As_Json_From_Uri    ${OPERATIONAL_API}/network-topology:network-topology/topology/topology-netconf    session=${topology_config_leader_http_session}
    BuiltIn.Run_Keyword_And_Return    Count_Substring_Occurence    substring=${DEVICE_BASE_NAME}    main_string=${item_data}

Check_Config_Items_Lower_Bound
    [Documentation]    Count items matching ${DEVICE_BASE_NAME}, fail if fewer than ${CONFIGURED_DEVICES_LIMIT}
    ${device_count} =    Get_Config_Device_Count
    BuiltIn.Run_Keyword_If    ${device_count} < ${CONFIGURED_DEVICES_LIMIT}    BuiltIn.Fail    Found ${device_count} config items, should be at least ${CONFIGURED_DEVICES_LIMIT}

Check_Operational_Items_Upper_Bound
    [Documentation]    Count items matching ${DEVICE_BASE_NAME}, fail if more than 1 + ${CONFIGURED_DEVICES_LIMIT}
    ${device_count} =    Get_Operational_Device_Count
    BuiltIn.Run_Keyword_If    ${device_count} > 1 + ${CONFIGURED_DEVICES_LIMIT}    BuiltIn.Fail    Found ${device_count} operational items, should be at most 1 + ${CONFIGURED_DEVICES_LIMIT}

Get_Typical_Time
    [Arguments]    ${coefficient}=1.0
    [Documentation]    Return the number of seconds typical for the given scale variables.
    BuiltIn.Run_Keyword_And_Return    BuiltIn.Evaluate    ${coefficient} * ${CONNECTION_SLEEP} * ${CONFIGURED_DEVICES_LIMIT}
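    # Worked example, assuming the defaults defined in this suite (CONNECTION_SLEEP = 1.2,
    # CONFIGURED_DEVICES_LIMIT = 20): coefficient=1.0 gives about 1.0 * 1.2 * 20 = 24 seconds,
    # and coefficient=3.0 (used for the sleep after the reboot) gives about 72 seconds.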

Change_Entity_Ownership_Leader_If_Needed
    [Arguments]    ${topology_config_leader_idx}
    [Documentation]    Move the entity-ownership (operational) shard leader if it is on the same node as the topology (config) shard leader.
    ...    TODO: move keyword to a common resource, e.g. ShardStability
    ${entity_ownership_leader_index_old}    ${candidates} =    BuiltIn.Wait_Until_Keyword_Succeeds    3x    2s    ClusterManagement.Get_Leader_And_Followers_For_Shard    shard_name=entity-ownership
    ...    shard_type=operational
    BuiltIn.Return_From_Keyword_If    ${topology_config_leader_idx} != ${entity_ownership_leader_index_old}    ${entity_ownership_leader_index_old}
    ${idx} =    Collections.Get_From_List    ${candidates}    0
    ClusterAdmin.Make_Leader_Local    ${idx}    entity-ownership    operational
    ${entity_ownership_leader_index}    ${candidates} =    BuiltIn.Wait_Until_Keyword_Succeeds    60s    3s    ClusterManagement.Verify_Shard_Leader_Elected    entity-ownership
    ...    operational    ${True}    ${entity_ownership_leader_index_old}    verify_restconf=False
    BuiltIn.Return_From_Keyword    ${entity_ownership_leader_index}