NETVIRT-1519: MIP entry duplicated in FIB
Issue:
VRRP is configured for a MIP-IP on two VMs. After a cluster reboot, the
VRRP-standby VM becomes master for a fraction of a second and sends
many GARP requests for the MIP-IP to the controller. When the ARP
NotificationHandler in the controller receives these GARPs,
LearntVpnIpToPort is not yet populated. Hence a new adjacency is added
to the secondary VM's VpnInterface without deleting the adjacency on
the original master VM's VpnInterface. When the cluster reboot
completes, ARP responses are received from the original master VM, but
LearntVpnIpToPort holds the MIP-IP with the secondary VM's port name.
The MIP-IP is then removed from the secondary VM's VpnInterface
adjacencies and the FIB is updated accordingly. The route with the
secondary VM as nexthop is withdrawn from the DC-GW, and traffic
starts dropping.
Fixes:
To know which port held the MIP-IP before the cluster reboot, the
MIP-IP info is now persisted. The VpnIpToPort config datastore is used
to store it. After a cluster reboot, when the flood of GARPs is
received, the handler checks whether an entry is already present in
VpnIpToPort. If an entry is present and the GARP comes from a
different VM interface, the oldPort entry is removed.
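The check described above can be sketched roughly as follows. This is
an illustration only, not the code in this change: MipOwnerTable and
its methods are hypothetical names, and an in-memory map stands in for
the VpnIpToPort config datastore.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the persisted VpnIpToPort entries.
final class MipOwnerTable {
    // Key: vpnName + "/" + MIP-IP. Value: port name currently holding the MIP.
    private final Map<String, String> ownerByMip = new HashMap<>();

    /**
     * Records that a GARP for mipIp was received on portName.
     * Returns the previous port name when it differs from portName, so the
     * caller can remove the stale adjacency; returns empty otherwise.
     */
    Optional<String> learn(String vpnName, String mipIp, String portName) {
        String key = vpnName + "/" + mipIp;
        String oldPort = ownerByMip.put(key, portName);
        if (oldPort != null && !oldPort.equals(portName)) {
            return Optional.of(oldPort);
        }
        return Optional.empty();
    }

    /** Returns the port currently recorded for the MIP, if any. */
    Optional<String> ownerOf(String vpnName, String mipIp) {
        return Optional.ofNullable(ownerByMip.get(vpnName + "/" + mipIp));
    }
}
```

In this sketch the caller would delete the adjacency on the returned
old port before adding the new one, so that only one VpnInterface
carries the MIP at a time.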
During cluster reboot, due to an ELAN bug in its pipelines, both the
master and slave VRRP VNFs go into split brain: both own the same MIP
and respond to ARP requests as though each were the legitimate owner.
To keep this from damaging the L3VPN MIP FIB entry, a quiescent period
("quiet period") of 300 seconds (5 minutes) is introduced in the ARP
NotificationHandler, starting as soon as the handler is constructed.
During this period, re-learning of existing MIPs (i.e., MIPs that were
learnt before the reboot) is not permitted. This gives the VNFs time
to resolve their split brain and settle on one master. After the
quiescent period is over, re-learning of all these existing MIPs is
allowed.
Note that the quiescent period applies only to existing MIPs (those
learnt prior to the cluster reboot/cluster upgrade) and not to new
MIPs. New MIPs can be learned immediately after the cluster reboot
completes.
A boot-delay-arp-learning parameter is introduced in VpnConfig for use
by the controller orchestrator. It controls how long after bootup, in
seconds, ARP learning is kept quiescent.
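A minimal sketch of the quiet-period gate, assuming the delay is read
from boot-delay-arp-learning and the start time is captured when the
handler is constructed. ArpLearningGate and mayLearn are hypothetical
names used only for illustration; the real handler may be structured
differently.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical gate deciding whether a MIP may be (re-)learnt right now.
final class ArpLearningGate {
    private final Instant quietUntil;

    // startedAt: when the handler was constructed.
    // bootDelayArpLearningSeconds: value of boot-delay-arp-learning.
    ArpLearningGate(Instant startedAt, long bootDelayArpLearningSeconds) {
        this.quietUntil = startedAt.plus(Duration.ofSeconds(bootDelayArpLearningSeconds));
    }

    /**
     * New MIPs are always learnt. MIPs already known before the reboot
     * are only re-learnt once the quiet period has elapsed.
     */
    boolean mayLearn(boolean mipKnownBeforeReboot, Instant now) {
        if (!mipKnownBeforeReboot) {
            return true;
        }
        return !now.isBefore(quietUntil);
    }
}
```

With a delay of 300 seconds, a GARP for an existing MIP arriving 299
seconds after construction is ignored by this gate, while one for a
new MIP is accepted at any time.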
Change-Id: I2985480050cb0b8ee8434cced0074abb7a05a5cd
Signed-off-by: Anil Kumar Gujele <anilkumar.g@altencalsoftlabs.com>