Netconf configuration stress HA suites
integration/test.git: csit/suites/netconf/clusteringscale/topology_owner_ha.robot
*** Settings ***
Documentation     Suite for High Availability testing of the netconf topology owner under stress.
...
...               Copyright (c) 2016 Cisco Systems, Inc. and others. All rights reserved.
...
...               This program and the accompanying materials are made available under the
...               terms of the Eclipse Public License v1.0 which accompanies this distribution,
...               and is available at http://www.eclipse.org/legal/epl-v10.html
...
...
...               Suite topology_leader_ha.robot is derived from this suite.
...               Please keep the logic in the suites as similar as possible.
...
...               This suite uses a Python utility to continuously configure and deconfigure
...               device connections against devices simulated by testtool.
...               The utility sends requests to the member which is the Leader of the topology config shard.
...
...               To avoid excessive resource consumption, the utility deconfigures old devices.
...               In a steady state, the number of config items oscillates between
...               ${CONFIGURED_DEVICES_LIMIT} and 1 + ${CONFIGURED_DEVICES_LIMIT}.
...
...               The only HA event tested so far is a reboot of the member
...               which is the Owner of the netconf topology-manager entity.
...               This suite assumes the Owner and the Leader are not co-located.
...
...               The number of devices is configurable. Wait times are computed from the device limit
...               and the per-connection sleep (see Get_Typical_Time), as it takes some time to initialize connections.
...               Ideally, the utility should go through half of the devices during Owner downtime.
...
...               If there is a period when the netconf manager ignores deletions in the config datastore,
...               previously created devices could "leak", meaning the number of
...               netconf topology items could be higher than 1 + ${CONFIGURED_DEVICES_LIMIT}.
...
...               One correctness check is the final number of devices in the operational netconf topology.
...               Another check is performed on the utility output.
...
...               Performance could be estimated from the total number of requests processed,
...               but this suite does not perform such a computation.
...
...               TODO: After stopping the utility, wait until the mount has succeeded on the devices.
Suite Setup       Setup_Everything
Suite Teardown    Teardown_Everything
Test Setup        SetupUtils.Setup_Test_With_Logging_And_Without_Fast_Failing
Test Teardown     ${DEFAULT_TEARDOWN_KEYWORD}
Default Tags      @{TAGS_CRITICAL}
Library           OperatingSystem
Library           SSHLibrary    timeout=10s
Library           String    # for Get_Regexp_Matches
Resource          ${CURDIR}/../../../libraries/ClusterManagement.robot
Resource          ${CURDIR}/../../../libraries/KarafKeywords.robot
Resource          ${CURDIR}/../../../libraries/NetconfKeywords.robot
Resource          ${CURDIR}/../../../libraries/SetupUtils.robot
Resource          ${CURDIR}/../../../libraries/SSHKeywords.robot
Resource          ${CURDIR}/../../../libraries/TemplatedRequests.robot
Resource          ${CURDIR}/../../../libraries/Utils.robot
Variables         ${CURDIR}/../../../variables/Variables.py

*** Variables ***
${CONFIGURED_DEVICES_LIMIT}    20
${CONNECTION_SLEEP}    1.2
${DEFAULT_TEARDOWN_KEYWORD}    SetupUtils.Teardown_Test_Show_Bugs_If_Test_Failed
${DEVICE_BASE_NAME}    netconf-test-device
${DEVICE_SET_SIZE}    30
@{TAGS_CRITICAL}    critical    @{TAGS_NONCRITICAL}
@{TAGS_NONCRITICAL}    clustering    netconf

*** Test Cases ***
Locate_Managers
    [Documentation]    Detect the location of the Leader and the Owner and store related data into suite variables.
    ...    This cannot be part of Suite Setup, as Utils.Get_Index_From_List_Of_Dictionaries calls BuiltIn.Set_Test_Variable.
    ...    Wait_Until_Keyword_Succeeds (WUKS) is used, as location failures are probably due to the booting process, not bugs.
    ${topology_config_leader_index}    ${followers} =    BuiltIn.Wait_Until_Keyword_Succeeds    3x    2s    ClusterManagement.Get_Leader_And_Followers_For_Shard    shard_name=topology
    ...    shard_type=config
    BuiltIn.Set_Suite_Variable    \${topology_config_leader_index}
    ${topology_config_leader_ip} =    ClusterManagement.Resolve_Ip_Address_For_Member    ${topology_config_leader_index}
    BuiltIn.Set_Suite_Variable    \${topology_config_leader_ip}
    ${topology_config_leader_http_session} =    ClusterManagement.Resolve_Http_Session_For_Member    ${topology_config_leader_index}
    BuiltIn.Set_Suite_Variable    \${topology_config_leader_http_session}
    ${netconf_manager_owner_index}    ${candidates} =    BuiltIn.Wait_Until_Keyword_Succeeds    3x    2s    ClusterManagement.Get_Owner_And_Candidates_For_Type_And_Id    type=topology-netconf
    ...    id=/general-entity:entity[general-entity:name='topology-manager']    member_index=1
    BuiltIn.Set_Suite_Variable    \${netconf_manager_owner_index}
    ${netconf_manager_owner_ip} =    ClusterManagement.Resolve_Ip_Address_For_Member    ${netconf_manager_owner_index}
    BuiltIn.Set_Suite_Variable    \${netconf_manager_owner_ip}
    ${netconf_manager_owner_http_session} =    ClusterManagement.Resolve_Http_Session_For_Member    ${netconf_manager_owner_index}
    BuiltIn.Set_Suite_Variable    \${netconf_manager_owner_http_session}

Start_Testtool
    [Documentation]    Deploy and start testtool in its own SSH session.
    SSHLibrary.Switch_Connection    ${testtool_connection_index}
    NetconfKeywords.Install_And_Start_Testtool    device-count=${DEVICE_SET_SIZE}    schemas=${CURDIR}/../../../variables/netconf/CRUD/schemas
    # TODO: Introduce NetconfKeywords.Safe_Install_And_Start_Testtool to avoid teardown manipulation.
    [Teardown]    BuiltIn.Run_Keywords    SSHLibrary.Switch_Connection    ${configurer_connection_index}
    ...    AND    ${DEFAULT_TEARDOWN_KEYWORD}

Start_Configurer
    [Documentation]    Launch the Python utility (redirecting its output to a log file) and verify it does not stop by itself.
    ${log_filename} =    Utils.Get_Log_File_Name    configurer
    BuiltIn.Set_Suite_Variable    \${log_filename}
    # TODO: Should things like restconf port/user/password be set from Variables?
    ${command} =    BuiltIn.Set_Variable    python configurer.py --odladdress ${topology_config_leader_ip} --deviceaddress ${TOOLS_SYSTEM_IP} --devices ${DEVICE_SET_SIZE} --disconndelay ${CONFIGURED_DEVICES_LIMIT} --basename ${DEVICE_BASE_NAME} --connsleep ${CONNECTION_SLEEP} &> "${log_filename}"
    SSHLibrary.Write    ${command}
    ${status}    ${text} =    BuiltIn.Run_Keyword_And_Ignore_Error    SSHLibrary.Read_Until_Prompt
    BuiltIn.Log    ${text}
    BuiltIn.Run_Keyword_If    "${status}" != "FAIL"    BuiltIn.Fail    Prompt happened, see Log.
    # Session is kept active.

Wait_For_Config_Items
    [Documentation]    Wait until the configurer reaches the phase in which old devices are being deconfigured; fail on timeout.
    ${timeout} =    Get_Typical_Time
    BuiltIn.Wait_Until_Keyword_Succeeds    ${timeout}    1s    Check_Config_Items_Lower_Bound

Reboot_Manager_Owner
    [Documentation]    Kill and restart the member where the netconf topology manager was, including removal of persisted data.
    ...    After cluster sync, sleep additional time to ensure the manager processes requests with the rebooted member fully rejoined.
    [Tags]    @{TAGS_NONCRITICAL}    # To avoid long WUKS list expanded in log.html
    ClusterManagement.Kill_Single_Member    ${netconf_manager_owner_index}
    # TODO: Introduce ClusterManagement.Clean_Journals_And_Snapshots_On_Single_Member
    ${owner_list} =    BuiltIn.Create_List    ${netconf_manager_owner_index}
    ClusterManagement.Clean_Journals_And_Snapshots_On_List_Or_All    ${owner_list}
    ClusterManagement.Start_Single_Member    ${netconf_manager_owner_index}
    BuiltIn.Comment    FIXME: Replace sleep with WUKS when it becomes clear what to wait for.
    ${sleep_time} =    Get_Typical_Time    coefficient=3.0
    BuiltIn.Sleep    ${sleep_time}

Stop_Configurer
    [Documentation]    Send Ctrl+C, download the log, read its contents and check that exactly two delete/put status entries appear,
    ...    each of them being delete:200 or put:201.
    Utils.Write_Bare_Ctrl_C
    ${output} =    SSHLibrary.Read_Until_Prompt
    BuiltIn.Log    ${output}
    SSHLibrary.Get_File    ${log_filename}
    ${output} =    OperatingSystem.Get_File    ${log_filename}
    ${list_any_matches} =    String.Get_Regexp_Matches    ${output}    delete|put
    ${number_any_matches} =    BuiltIn.Get_Length    ${list_any_matches}
    BuiltIn.Should_Be_Equal    ${2}    ${number_any_matches}    Unexpected status seen: ${output}
    ${list_strict_matches} =    String.Get_Regexp_Matches    ${output}    delete:200|put:201
    ${number_strict_matches} =    BuiltIn.Get_Length    ${list_strict_matches}
    BuiltIn.Should_Be_Equal    ${2}    ${number_strict_matches}    Expected status not seen: ${output}

Check_For_Connector_Leak
    [Documentation]    Check that the number of items in the operational netconf topology is not higher than expected.
    # FIXME: Are separate keywords necessary?
    Check_Operational_Items_Upper_Bound

*** Keywords ***
Setup_Everything
    [Documentation]    Initialize libraries and set suite variables.
    ClusterManagement.ClusterManagement_Setup
    SetupUtils.Setup_Utils_For_Setup_And_Teardown
    NetconfKeywords.Setup_Netconf_Keywords    create_session_for_templated_requests=False
    ${testtool_connection_index} =    SSHKeywords.Open_Connection_To_Tools_System
    BuiltIn.Set_Suite_Variable    \${testtool_connection_index}
    ${configurer_connection_index} =    SSHKeywords.Open_Connection_To_Tools_System
    BuiltIn.Set_Suite_Variable    \${configurer_connection_index}
    SSHKeywords.Require_Python
    SSHKeywords.Assure_Library_Counter
    SSHLibrary.Put_File    ${CURDIR}/../../../../tools/netconf_tools/configurer.py
    SSHLibrary.Put_File    ${CURDIR}/../../../libraries/AuthStandalone.py

Teardown_Everything
    [Documentation]    Tear down the test infrastructure: stop testtool and delete all HTTP sessions.
    SSHLibrary.Switch_Connection    ${testtool_connection_index}
    NetconfKeywords.Stop_Testtool
    RequestsLibrary.Delete_All_Sessions

Count_Substring_Occurence
    [Arguments]    ${substring}    ${main_string}
    [Documentation]    Apply the length_of_split method to count how many times ${substring} occurs within ${main_string}.
    ...    The method is reliable only if triple-double quotes are not present in either argument.
    BuiltIn.Comment    TODO: Migrate this keyword into an appropriate Resource.
    BuiltIn.Run_Keyword_And_Return    BuiltIn.Evaluate    len("""${main_string}""".split("""${substring}""")) - 1

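# Hypothetical alternative to Count_Substring_Occurence, not part of the original suite and not called by any test.
# It assumes Robot Framework 2.9 or newer, where BuiltIn.Evaluate accepts the $variable syntax.
# With $main_string and $substring the values reach Python as string objects instead of being pasted
# into the expression text, so triple-double quotes in either argument cannot break the expression.
# For a non-empty substring, str.count returns the same number of non-overlapping occurrences
# as len(main_string.split(substring)) - 1.
Count_Substring_Occurence_Via_Count
    [Arguments]    ${substring}    ${main_string}
    [Documentation]    Return how many non-overlapping times the substring occurs within the main string.
    BuiltIn.Run_Keyword_And_Return    BuiltIn.Evaluate    $main_string.count($substring)
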
Get_Config_Device_Count
    [Documentation]    Count items in the config netconf topology matching ${DEVICE_BASE_NAME}.
    ${item_data} =    TemplatedRequests.Get_As_Json_From_Uri    ${CONFIG_API}/network-topology:network-topology/topology/topology-netconf    session=${topology_config_leader_http_session}
    BuiltIn.Run_Keyword_And_Return    Count_Substring_Occurence    substring=${DEVICE_BASE_NAME}    main_string=${item_data}

Get_Operational_Device_Count
    [Documentation]    Count items in the operational netconf topology matching ${DEVICE_BASE_NAME}.
    ${item_data} =    TemplatedRequests.Get_As_Json_From_Uri    ${OPERATIONAL_API}/network-topology:network-topology/topology/topology-netconf    session=${topology_config_leader_http_session}
    BuiltIn.Run_Keyword_And_Return    Count_Substring_Occurence    substring=${DEVICE_BASE_NAME}    main_string=${item_data}

Check_Config_Items_Lower_Bound
    [Documentation]    Count config items matching ${DEVICE_BASE_NAME}, fail if fewer than ${CONFIGURED_DEVICES_LIMIT}.
    ${device_count} =    Get_Config_Device_Count
    BuiltIn.Run_Keyword_If    ${device_count} < ${CONFIGURED_DEVICES_LIMIT}    BuiltIn.Fail    Found ${device_count} config items, should be at least ${CONFIGURED_DEVICES_LIMIT}

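# Hypothetical keyword, not part of the original suite and not called by any test.
# The suite Documentation states that in a steady state the number of config items stays between
# CONFIGURED_DEVICES_LIMIT and 1 + CONFIGURED_DEVICES_LIMIT. Check_Config_Items_Lower_Bound covers the
# lower bound; this sketch checks the upper bound on the config side, mirroring
# Check_Operational_Items_Upper_Bound.
Check_Config_Items_Upper_Bound
    [Documentation]    Count config items matching ${DEVICE_BASE_NAME}, fail if more than 1 + ${CONFIGURED_DEVICES_LIMIT}.
    ${device_count} =    Get_Config_Device_Count
    BuiltIn.Run_Keyword_If    ${device_count} > 1 + ${CONFIGURED_DEVICES_LIMIT}    BuiltIn.Fail    Found ${device_count} config items, should be at most 1 + ${CONFIGURED_DEVICES_LIMIT}
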
Check_Operational_Items_Upper_Bound
    [Documentation]    Count operational items matching ${DEVICE_BASE_NAME}, fail if more than 1 + ${CONFIGURED_DEVICES_LIMIT}.
    ${device_count} =    Get_Operational_Device_Count
    BuiltIn.Run_Keyword_If    ${device_count} > 1 + ${CONFIGURED_DEVICES_LIMIT}    BuiltIn.Fail    Found ${device_count} operational items, should be at most 1 + ${CONFIGURED_DEVICES_LIMIT}

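# Hypothetical helper, not part of the original suite and not called by any test. It sketches one
# condition that Reboot_Manager_Owner could poll with Wait_Until_Keyword_Succeeds instead of its fixed sleep.
# It assumes the rebooted member has rejoined once it is registered again for the topology-manager
# entity, as either the owner or a candidate. Whether that is enough for the netconf manager to process
# requests is an assumption, not something this suite establishes.
# The entity query is copied from Locate_Managers. The $variable syntax of Should_Be_True
# (Robot Framework 2.9 or newer) compares values returned by the same ClusterManagement keyword
# as Python objects, so both sides of each comparison have the same type.
# Possible use: BuiltIn.Wait_Until_Keyword_Succeeds    ${timeout}    2s    Check_Rebooted_Owner_Is_Registered
Check_Rebooted_Owner_Is_Registered
    [Documentation]    Fail unless the previous Owner member is now the owner or a candidate of the topology-manager entity.
    ${owner}    ${candidates} =    ClusterManagement.Get_Owner_And_Candidates_For_Type_And_Id    type=topology-netconf
    ...    id=/general-entity:entity[general-entity:name='topology-manager']    member_index=1
    BuiltIn.Should_Be_True    $owner == $netconf_manager_owner_index or $netconf_manager_owner_index in $candidates
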
Get_Typical_Time
    [Arguments]    ${coefficient}=1.0
    [Documentation]    Return the number of seconds typical for the given scale variables,
    ...    computed as coefficient * CONNECTION_SLEEP * CONFIGURED_DEVICES_LIMIT.
    BuiltIn.Run_Keyword_And_Return    BuiltIn.Evaluate    ${coefficient} * ${CONNECTION_SLEEP} * ${CONFIGURED_DEVICES_LIMIT}
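
# Worked example (arithmetic only) of Get_Typical_Time with the default values from the Variables table,
# CONNECTION_SLEEP = 1.2 and CONFIGURED_DEVICES_LIMIT = 20:
#     Wait_For_Config_Items, coefficient=1.0:    1.0 * 1.2 * 20 = 24 seconds of WUKS timeout
#     Reboot_Manager_Owner, coefficient=3.0:     3.0 * 1.2 * 20 = 72 seconds of sleep
# BuiltIn.Evaluate returns a Python float here, which both BuiltIn.Sleep and
# BuiltIn.Wait_Until_Keyword_Succeeds accept as a number of seconds.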