1 .. contents:: Table of Contents
4 ================================================
5 VNI based L2 switching, L3 forwarding and NATing
6 ================================================
8 https://git.opendaylight.org/gerrit/#/q/topic:vni-based-l2-l3-nat
**Important**: All gerrit links raised for this feature will have the topic name **vni-based-l2-l3-nat**
12 This feature attempts to realize the use of VxLAN VNI (Virtual Network Identifier) for VxLAN
13 tenant traffic flowing on the cloud data-network. This is applicable to L2 switching, L3
14 forwarding and NATing for all VxLAN based provider networks. In doing so, it eliminates the
presence of ``LPort tags``, ``ELAN tags`` and ``MPLS labels`` on the wire and instead replaces
them with the VNIs supplied by the tenant's OpenStack.
18 This will be selectively done for the use-cases covered by this spec and hence, its
implementation won't completely remove the usage of the above entities. The usage of ``LPort tags``
and ``ELAN tags`` within the OVS datapath of the hypervisor (not on the wire) will be retained, as
eliminating them completely is a large redesign that can be pursued incrementally later.
This spec is the first step in the direction of enforcing datapath semantics that use tenant-supplied
VNI values on VxLAN type networks created by tenants in OpenStack Neutron.
26 **Note**: The existing L3 BGPVPN control-path and data-path semantics will continue to use L3
27 labels on the wire as well as inside the OVS datapaths of the hypervisor to realize both intra-dc
28 and inter-dc connectivity.
34 OpenDaylight NetVirt service today supports the following types of networks:
Amongst these, the VxLAN-based overlay is supported only for traffic within the DataCenter. External
network access over the DC-Gateway is supported via VLAN or GRE type external networks.
For the rest of the traffic over the DC-Gateway, the only supported overlay is GRE.
Today, for VxLAN networks created by the tenant, MPLS labels are generated and used by the L3
forwarding service. Such labels are re-used for inter-DC use-cases with BGPVPN as well. This neither
honors the tenant-supplied VNIs nor is in accordance with the expected datapath semantics from an
orchestration point of view.
49 **This spec attempts to change the datapath semantics by enforcing the VNIs** (unique for every VxLAN
50 enabled network in the cloud) **as dictated by the tenant's OpenStack configuration for L2**
51 **switching, L3 forwarding and NATing**.
This implementation will remove the reliance on using the following (on the wire) within the DataCenter:
56 * Labels for L3 forwarding
57 * LPort tags for L2 switching
More specifically, the traffic from the source VM will be routed in the source OVS by the L3VPN / ELAN
pipeline. After that, the packet will travel as a switched packet in the VxLAN underlay within the
DC, carrying the VNI in the VxLAN header instead of an MPLS label / LPort tag. In the destination
OVS, the packet will be collected and sent to the destination VM through the existing ELAN pipeline.
In the nodes themselves, the LPort tag will continue to be used when pushing the packet from the
ELAN / L3VPN pipeline towards the VM, since the ACL service continues to use ``LPort tags``.
Similarly, ``ELAN tags`` will continue to be used for handling L2 broadcast packets:
70 * locally generated in the OVS datapath
71 * remotely received from another OVS datapath via internal VxLAN tunnels
73 LPort tag uses 8 bits and ELAN tag uses 21 bits in the metadata. The existing use of both in the
74 metadata will remain unaffected.
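
As a purely illustrative sketch, the snippet below shows how an 8-bit LPort tag and a 21-bit ELAN tag
can coexist in disjoint fields of the 64-bit OpenFlow metadata, which is why retaining them inside the
OVS datapath does not conflict with carrying the VNI on the wire. The class name, offsets and helper
method are hypothetical and do not reproduce the actual Genius/NetVirt metadata layout.

.. code-block:: java

   /**
    * Illustrative only: packs an 8-bit LPort tag and a 21-bit ELAN tag into the
    * 64-bit OpenFlow metadata. The offsets are hypothetical, not the real pipeline layout.
    */
   public final class MetadataSketch {
       private static final int LPORT_TAG_OFFSET = 40;                         // hypothetical offset
       private static final int ELAN_TAG_OFFSET = 16;                          // hypothetical offset
       private static final long LPORT_TAG_MASK = 0xFFL << LPORT_TAG_OFFSET;   // 8 bits
       private static final long ELAN_TAG_MASK = 0x1FFFFFL << ELAN_TAG_OFFSET; // 21 bits

       /** Returns the metadata with both tags written into their (disjoint) fields. */
       public static long writeTags(long metadata, int lportTag, int elanTag) {
           metadata = (metadata & ~LPORT_TAG_MASK)
                   | (((long) lportTag << LPORT_TAG_OFFSET) & LPORT_TAG_MASK);
           metadata = (metadata & ~ELAN_TAG_MASK)
                   | (((long) elanTag << ELAN_TAG_OFFSET) & ELAN_TAG_MASK);
           return metadata;
       }

       private MetadataSketch() {
       }
   }
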
78 Since VNIs are provisioned only for VxLAN based underlays, this feature has in its scope the
79 use-cases pertaining to **intra-DC connectivity over internal VxLAN tunnels only**.
81 On the cloud data network wire, all the VxLAN traffic for basic L2 switching within a VxLAN
82 network and L3 forwarding across VxLAN-type networks using routers will use tenant supplied VNI
83 values for such VXLAN networks.
85 Inter-DC connectivity over external VxLAN tunnels is covered by the EVPN_RT5_ spec.
90 * Complete removal of use of ``LPort tags`` everywhere in ODL: Use of ``LPort tags`` within the OVS
91 Datapath of a hypervisor, for streaming traffic to the right virtual endpoint on that hypervisor
92 (note: not on the wire) will be retained
93 * Complete removal of use of ``ELAN tags`` everywhere in ODL: Use of ``ELAN tags`` within the OVS
94 Datapath to handle local/remote L2 broadcasts (note: not on the wire) will be retained
95 * Complete removal of use of ``MPLS labels`` everywhere in ODL: Use of ``MPLS labels`` for
96 realizing an L3 BGPVPN (regardless of type of networks put into such BGPVPN that may include
97 networks of type VxLAN) both on the wire and within the OVS Datapaths will be retained.
98 * Addressing or testing IPv6 use-cases
* Intra DC NAT use-case where no explicit Internet VPN is created for VxLAN based external provider
networks: detailed further in the Intra DC subsection of the NAT section below.
Complete removal of use of ``LPort tags``, ``ELAN tags`` and ``MPLS labels`` for VxLAN-type
networks has large-scale design/pipeline implications and thus needs to be attempted as future
initiatives via respective specs.
108 This feature involves amendments/testing pertaining to the following:
110 L2 switching use cases
111 ++++++++++++++++++++++
113 #. L2 Unicast frames exchanged within an OVS datapath
114 #. L2 Unicast frames exchanged over OVS datapaths that are on different hypervisors
115 #. L2 Broadcast frames transmitted within an OVS datapath
116 #. L2 Broadcast frames received from remote OVS datapaths
118 L3 forwarding use cases
119 +++++++++++++++++++++++
#. Router realized using VNIs for networks attached to a new router (with the networks having
pre-created VMs).
#. Router realized using VNIs for networks attached to a new router (with new VMs booted later on
the networks).
125 #. Router updated with one or more extra route(s) to an existing VM.
#. Router updated to remove one or more previously added extra routes.
132 The provider network types for external networks supported today are:
134 * External VLAN Provider Networks (transparent Internet VPN)
135 * External Flat Networks (transparent Internet VPN)
136 * Tenant-orchestrated Internet VPN of type GRE (actually MPLSOverGRE)
138 Following are the SNAT/DNAT use-cases applicable to the network types listed above:
140 #. SNAT functionality.
141 #. DNAT functionality.
142 #. DNAT to DNAT functionality (Intra DC)
144 * FIP VM to FIP VM on same hypervisor
145 * FIP VM to FIP VM on different hypervisors
147 #. SNAT to DNAT functionality (Intra DC)
149 * Non-FIP VM to FIP VM on the same NAPT hypervisor
150 * Non-FIP VM to FIP VM on the same hypervisor, but NAPT on different hypervisor
151 * Non-FIP VM to FIP VM on different hypervisors (with NAPT on FIP VM hypervisor)
152 * Non-FIP VM to FIP VM on different hypervisors (with NAPT on Non-FIP VM hypervisor)
The following components within the OpenDaylight Controller need to be enhanced:
162 * VPN Engine (VPN Manager, VPN Interface Manager and VPN Subnet Route Handler)
179 There are no explicit pipeline changes for this use-case.
Instead of setting the destination LPort tag, the destination network VNI will be set in the
``tun_id`` field in ``L2_DMAC_FILTER_TABLE`` (table 51) while egressing the packet on the tunnel port.
190 The modifications in flows and groups on the ingress OVS are illustrated below:
cookie=0x8000000, duration=65.484s, table=0, n_packets=23, n_bytes=2016, priority=4,in_port=6 actions=write_metadata:0x30000000000/0xffffff0000000001,goto_table:17
196 cookie=0x6900000, duration=63.106s, table=17, n_packets=23, n_bytes=2016, priority=1,metadata=0x30000000000/0xffffff0000000000 actions=write_metadata:0x2000030000000000/0xfffffffffffffffe,goto_table:40
197 cookie=0x6900000, duration=64.135s, table=40, n_packets=4, n_bytes=392, priority=61010,ip,dl_src=fa:16:3e:86:59:fd,nw_src=12.1.0.4 actions=ct(table=41,zone=5002)
198 cookie=0x6900000, duration=5112.542s, table=41, n_packets=21, n_bytes=2058, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,17)
199 cookie=0x8040000, duration=62.125s, table=17, n_packets=15, n_bytes=854, priority=6,metadata=0x6000030000000000/0xffffff0000000000 actions=write_metadata:0x700003138a000000/0xfffffffffffffffe,goto_table:48
200 cookie=0x8500000, duration=5113.124s, table=48, n_packets=24, n_bytes=3044, priority=0 actions=resubmit(,49),resubmit(,50)
201 cookie=0x805138a, duration=62.163s, table=50, n_packets=15, n_bytes=854, priority=20,metadata=0x3138a000000/0xfffffffff000000,dl_src=fa:16:3e:86:59:fd actions=goto_table:51
202 cookie=0x803138a, duration=62.163s, table=51, n_packets=6, n_bytes=476, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:31:fb:91 actions=set_field:**0x710**->tun_id,output:1
207 On the egress OVS, for the packets coming in via the internal VxLAN tunnel (OVS - OVS),
208 ``INTERNAL_TUNNEL_TABLE`` currently matches on destination LPort tag for unicast packets. Since
209 the incoming packets will now contain the network VNI in the VxLAN header, the
210 ``INTERNAL_TUNNEL_TABLE`` will match on this VNI, set the ELAN tag in the metadata and forward
211 the packet to ``L2_DMAC_FILTER_TABLE`` so as to reach the destination VM via the ELAN pipeline.
213 The modifications in flows and groups on the egress OVS are illustrated below:
216 :emphasize-lines: 2-7
218 cookie=0x8000001, duration=5136.996s, table=0, n_packets=12601, n_bytes=899766, priority=5,in_port=1,actions=write_metadata:0x10000000001/0xfffff0000000001,goto_table:36
219 cookie=0x9000004, duration=1145.594s, table=36, n_packets=15, n_bytes=476, priority=5,**tun_id=0x710,actions=write_metadata:0x138a000001/0xfffffffff000000,goto_table:51**
220 cookie=0x803138a, duration=62.163s, table=51, n_packets=9, n_bytes=576, priority=20,metadata=0x138a000001/0xffff000000,dl_dst=fa:16:3e:86:59:fd actions=load:0x300->NXM_NX_REG6[],resubmit(,220)
cookie=0x6900000, duration=63.122s, table=220, n_packets=9, n_bytes=1160, priority=6,reg6=0x300 actions=load:0x70000300->NXM_NX_REG6[],write_metadata:0x7000030000000000/0xfffffffffffffffe,goto_table:251
222 cookie=0x6900000, duration=65.479s, table=251, n_packets=8, n_bytes=392, priority=61010,ip,dl_dst=fa:16:3e:86:59:fd,nw_dst=12.1.0.4 actions=ct(table=252,zone=5002)
223 cookie=0x6900000, duration=5112.299s, table=252, n_packets=19, n_bytes=1862, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,220)
cookie=0x8000007, duration=63.123s, table=220, n_packets=8, n_bytes=1160, priority=7,reg6=0x70000300 actions=output:6
233 The ARP broadcast by the VM will be a (local + remote) broadcast.
235 For the local broadcast on the VM's OVS itself, the packet will continue to get flooded to all the
236 VM ports by setting the destination LPort tag in the local broadcast group. Hence, there are no
237 explicit pipeline changes for when a packet is transmitted within the source OVS via a local
240 The changes in pipeline for the remote broadcast are illustrated below:
Instead of setting the ELAN tag, the network VNI will be set in the ``tun_id`` field as part of the
bucket actions in the remote broadcast group while egressing the packet on the tunnel port.
248 The modifications in flows and groups on the ingress OVS are illustrated below:
cookie=0x8000000, duration=65.484s, table=0, n_packets=23, n_bytes=2016, priority=4,in_port=6 actions=write_metadata:0x30000000000/0xffffff0000000001,goto_table:17
254 cookie=0x6900000, duration=63.106s, table=17, n_packets=23, n_bytes=2016, priority=1,metadata=0x30000000000/0xffffff0000000000 actions=write_metadata:0x2000030000000000/0xfffffffffffffffe,goto_table:40
255 cookie=0x6900000, duration=64.135s, table=40, n_packets=4, n_bytes=392, priority=61010,ip,dl_src=fa:16:3e:86:59:fd,nw_src=12.1.0.4 actions=ct(table=41,zone=5002)
256 cookie=0x6900000, duration=5112.542s, table=41, n_packets=21, n_bytes=2058, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,17)
257 cookie=0x8040000, duration=62.125s, table=17, n_packets=15, n_bytes=854, priority=6,metadata=0x6000030000000000/0xffffff0000000000 actions=write_metadata:0x700003138a000000/0xfffffffffffffffe,goto_table:48
258 cookie=0x8500000, duration=5113.124s, table=48, n_packets=24, n_bytes=3044, priority=0 actions=resubmit(,49),resubmit(,50)
259 cookie=0x805138a, duration=62.163s, table=50, n_packets=15, n_bytes=854, priority=20,metadata=0x3138a000000/0xfffffffff000000,dl_src=fa:16:3e:86:59:fd actions=goto_table:51
260 cookie=0x8030000, duration=5112.911s, table=51, n_packets=18, n_bytes=2568, priority=0 actions=goto_table:52
261 cookie=0x870138a, duration=62.163s, table=52, n_packets=9, n_bytes=378, priority=5,metadata=0x138a000000/0xffff000001 actions=write_actions(group:210004)
263 group_id=210004,type=all,bucket=actions=group:210003,bucket=actions=set_field:**0x710**->tun_id,output:1
268 On the egress OVS, for the packets coming in via the internal VxLAN tunnel (OVS - OVS),
269 ``INTERNAL_TUNNEL_TABLE`` currently matches on ELAN tag for broadcast packets. Since the
270 incoming packets will now contain the network VNI in the VxLAN header, the
271 ``INTERNAL_TUNNEL_TABLE`` will match on this VNI, set the ELAN tag in the metadata and forward
272 the packet to ``L2_DMAC_FILTER_TABLE`` to be broadcasted via the local broadcast groups
273 traversing the ELAN pipeline.
275 The ``TUNNEL_INGRESS_BIT`` being set in the ``CLASSIFIER_TABLE`` (table 0) ensures that the
276 packet is always sent to the local broadcast group only and hence, remains within the OVS. This
is necessary to avoid a switching loop back to the source OVS.
279 The modifications in flows and groups on the egress OVS are illustrated below:
282 :emphasize-lines: 2-12
284 cookie=0x8000001, duration=5136.996s, table=0, n_packets=12601, n_bytes=899766, priority=5,in_port=1,actions=write_metadata:0x10000000001/0xfffff0000000001,goto_table:36
285 cookie=0x9000004, duration=1145.594s, table=36, n_packets=15, n_bytes=476, priority=5,**tun_id=0x710,actions=write_metadata:0x138a000001/0xfffffffff000000,goto_table:51**
286 cookie=0x8030000, duration=5137.609s, table=51, n_packets=9, n_bytes=1293, priority=0 actions=goto_table:52
287 cookie=0x870138a, duration=1145.592s, table=52, n_packets=0, n_bytes=0, priority=5,metadata=0x138a000001/0xffff000001 actions=apply_actions(group:210003)
289 group_id=210003,type=all,bucket=actions=set_field:0x4->tun_id,resubmit(,55)
291 cookie=0x8800004, duration=1145.594s, table=55, n_packets=9, n_bytes=378, priority=9,tun_id=0x4,actions=load:0x400->NXM_NX_REG6[],resubmit(,220)
cookie=0x6900000, duration=63.122s, table=220, n_packets=9, n_bytes=1160, priority=6,reg6=0x300 actions=load:0x70000300->NXM_NX_REG6[],write_metadata:0x7000030000000000/0xfffffffffffffffe,goto_table:251
293 cookie=0x6900000, duration=65.479s, table=251, n_packets=8, n_bytes=392, priority=61010,ip,dl_dst=fa:16:3e:86:59:fd,nw_dst=12.1.0.4 actions=ct(table=252,zone=5002)
294 cookie=0x6900000, duration=5112.299s, table=252, n_packets=19, n_bytes=1862, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,220)
cookie=0x8000007, duration=63.123s, table=220, n_packets=8, n_bytes=1160, priority=7,reg6=0x70000300 actions=output:6
298 The ARP response will be a unicast packet, and as indicated above, for unicast packets, there
299 are no explicit pipeline changes.
305 Between VMs on a single OVS
306 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
308 There are no explicit pipeline changes for this use-case.
The destination LPort tag will continue to be set in the nexthop group, since it is used by the ACL
service when the ``EGRESS_DISPATCHER_TABLE`` sends the packet to the ``EGRESS_ACL_TABLE``.
313 Between VMs on two different OVS
314 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
316 L3 forwarding between VMs on two different hypervisors is asymmetric forwarding since the traffic
317 is routed in the source OVS datapath while it is switched over the wire and then all the way to
318 the destination VM on the destination OVS datapath.
320 VM sourcing the traffic (Ingress OVS)
321 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``L3_FIB_TABLE`` will set the destination network VNI in the ``tun_id`` field instead of the MPLS label.
329 CLASSIFIER_TABLE => DISPATCHER_TABLE => INGRESS_ACL_TABLE =>
330 DISPATCHER_TABLE => L3_GW_MAC_TABLE =>
331 L3_FIB_TABLE (set destination MAC, **set tunnel-ID as destination network VNI**)
332 => Output to tunnel port
334 The modifications in flows and groups on the ingress OVS are illustrated below:
339 cookie=0x8000000, duration=128.140s, table=0, n_packets=25, n_bytes=2716, priority=4,in_port=5 actions=write_metadata:0x50000000000/0xffffff0000000001,goto_table:17
340 cookie=0x8000000, duration=4876.599s, table=17, n_packets=0, n_bytes=0, priority=0,metadata=0x5000000000000000/0xf000000000000000 actions=write_metadata:0x6000000000000000/0xf000000000000000,goto_table:80
341 cookie=0x1030000, duration=4876.563s, table=80, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17)
342 cookie=0x6900000, duration=123.870s, table=17, n_packets=25, n_bytes=2716, priority=1,metadata=0x50000000000/0xffffff0000000000 actions=write_metadata:0x2000050000000000/0xfffffffffffffffe,goto_table:40
343 cookie=0x6900000, duration=126.056s, table=40, n_packets=15, n_bytes=1470, priority=61010,ip,dl_src=fa:16:3e:63:ea:0c,nw_src=10.1.0.4 actions=ct(table=41,zone=5001)
344 cookie=0x6900000, duration=4877.057s, table=41, n_packets=17, n_bytes=1666, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,17)
345 cookie=0x6800001, duration=123.485s, table=17, n_packets=28, n_bytes=3584, priority=2,metadata=0x2000050000000000/0xffffff0000000000 actions=write_metadata:0x5000050000000000/0xfffffffffffffffe,goto_table:60
346 cookie=0x6800000, duration=3566.900s, table=60, n_packets=24, n_bytes=2184, priority=0 actions=resubmit(,17)
347 cookie=0x8000001, duration=123.456s, table=17, n_packets=17, n_bytes=1554, priority=5,metadata=0x5000050000000000/0xffffff0000000000 actions=write_metadata:0x60000500000222e0/0xfffffffffffffffe,goto_table:19
348 cookie=0x8000009, duration=124.815s, table=19, n_packets=15, n_bytes=1470, priority=20,metadata=0x222e0/0xfffffffe,dl_dst=fa:16:3e:51:da:ee actions=goto_table:21
349 cookie=0x8000003, duration=125.568s, table=21, n_packets=9, n_bytes=882, priority=42,ip,metadata=0x222e0/0xfffffffe,nw_dst=12.1.0.3 actions=**set_field:0x710->tun_id**,set_field:fa:16:3e:31:fb:91->eth_dst,output:1
351 VM receiving the traffic (Egress OVS)
352 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
354 On the egress OVS, for the packets coming in via the VxLAN tunnel, ``INTERNAL_TUNNEL_TABLE``
355 currently matches on MPLS label and sends it to the nexthop group to be taken to the destination
356 VM via ``EGRESS_ACL_TABLE``.
357 Since the incoming packets will now contain network VNI in the VxLAN header, the ``INTERNAL_TUNNEL_TABLE``
358 will match on the VNI, set the ELAN tag in the metadata and forward the packet to
359 ``L2_DMAC_FILTER_TABLE``, from where it will be taken to the destination VM via the ELAN pipeline.
364 CLASSIFIER_TABLE => INTERNAL_TUNNEL_TABLE (Match on network VNI, set ELAN tag in the metadata)
365 => L2_DMAC_FILTER_TABLE (Match on destination MAC) => EGRESS_DISPATCHER_TABLE
366 => EGRESS_ACL_TABLE => Output to destination VM port
368 The modifications in flows and groups on the egress OVS are illustrated below:
371 :emphasize-lines: 2-7
cookie=0x8000001, duration=4918.647s, table=0, n_packets=12292, n_bytes=877616, priority=5,in_port=1 actions=write_metadata:0x10000000001/0xfffff0000000001,goto_table:36
374 cookie=0x9000004, duration=927.245s, table=36, n_packets=8234, n_bytes=52679, priority=5,**tun_id=0x710,actions=write_metadata:0x138a000001/0xfffffffff000000,goto_table:51**
375 cookie=0x803138a, duration=62.163s, table=51, n_packets=9, n_bytes=576, priority=20,metadata=0x138a000001/0xffff000000,dl_dst=fa:16:3e:86:59:fd actions=load:0x300->NXM_NX_REG6[],resubmit(,220)
cookie=0x6900000, duration=63.122s, table=220, n_packets=9, n_bytes=1160, priority=6,reg6=0x300 actions=load:0x70000300->NXM_NX_REG6[],write_metadata:0x7000030000000000/0xfffffffffffffffe,goto_table:251
377 cookie=0x6900000, duration=65.479s, table=251, n_packets=8, n_bytes=392, priority=61010,ip,dl_dst=fa:16:3e:86:59:fd,nw_dst=12.1.0.4 actions=ct(table=252,zone=5002)
378 cookie=0x6900000, duration=5112.299s, table=252, n_packets=19, n_bytes=1862, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,220)
cookie=0x8000007, duration=63.123s, table=220, n_packets=8, n_bytes=1160, priority=7,reg6=0x70000300 actions=output:6
384 For NAT, we need VNIs to be used in two scenarios:
* When a packet is forwarded from the non-NAPT to the NAPT hypervisor (VNI per router)
387 * Between hypervisors (intra DC) over Internet VPN (VNI per Internet VPN)
389 Hence, a pool titled ``opendaylight-vni-ranges``, non-overlapping with the OpenStack Neutron
390 vni_ranges configuration, needs to be configured by the OpenDaylight Controller Administrator.
392 This ``opendaylight-vni-ranges`` pool will be used to carve out a unique VNI per router to be then
393 used in the datapath for traffic forwarding from non-NAPT to NAPT switch for this router.
395 Similarly, for MPLSOverGRE based external networks, the ``opendaylight-vni-ranges`` pool will be
396 used to carve out a unique VNI per Internet VPN (GRE-provider-type) to be then used in the
397 datapath for traffic forwarding for ``SNAT-to-DNAT`` and ``DNAT-to-DNAT`` cases within the
DataCenter. Only one external network can be associated with an Internet VPN today, and this spec
doesn't attempt to address that limitation.
401 A NeutronVPN configuration API will be exposed to the administrator to configure the lower and
402 higher limit for this pool.
If the administrator doesn't configure this explicitly, then the pool will be created with default
values (lower limit 70000, upper limit 99999) during handling of the first NAT session.
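
The snippet below is a minimal, self-contained sketch of how such a pool could be carved into
per-router (or per-Internet-VPN) VNIs. It is not the actual implementation (which is expected to go
through the Genius ID Manager); the class and method names are hypothetical and only illustrate the
range parsing and the stable per-key allocation semantics described above.

.. code-block:: java

   import java.util.HashMap;
   import java.util.Map;

   /** Hypothetical sketch of the opendaylight-vni-ranges pool (not the real ID Manager based code). */
   public class OpendaylightVniPool {
       private final long low;
       private final long high;
       private long next;
       private final Map<String, Long> allocations = new HashMap<>();

       /** rangeSpec follows the "low:high" format of the opendaylight-vni-ranges leaf, e.g. "70000:99999". */
       public OpendaylightVniPool(String rangeSpec) {
           String[] parts = rangeSpec.split(":");
           this.low = Long.parseLong(parts[0]);
           this.high = Long.parseLong(parts[1]);
           this.next = low;
       }

       /** Returns a stable VNI for the given key (router UUID or Internet VPN UUID). */
       public synchronized long allocateVni(String key) {
           Long existing = allocations.get(key);
           if (existing != null) {
               return existing;
           }
           if (next > high) {
               throw new IllegalStateException("opendaylight-vni-ranges pool exhausted");
           }
           long vni = next++;
           allocations.put(key, vni);
           return vni;
       }

       /** Forgets the allocation when the router / Internet VPN is deleted (no reuse in this sketch). */
       public synchronized void releaseVni(String key) {
           allocations.remove(key);
       }
   }
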
**FIB Manager changes**: For external networks of type GRE, it is required to use the
``Internet VPN VNI`` for intra-DC communication, but we still require ``MPLS labels`` to reach
SNAT/DNAT VMs from external entities via MPLSOverGRE. Hence, we will make use of the ``l3vni``
attribute added to the fibEntries container as part of the EVPN_RT5_ spec. NAT will populate both
411 ``label`` and ``l3vni`` values for fibEntries created for floating-ips and external-fixed-ips with
412 external network of type GRE. This ``l3vni`` value will be used while programming remote FIB flow
entries (on all the switches which are part of the same VRF). The MPLS label will still be used
to advertise prefixes to BGP and, in the ``L3_LFIB_TABLE``, to take the packet to the
``INBOUND_NAPT_TABLE`` and ``PDNAT_TABLE`` respectively.
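
A hedged sketch of the resulting behaviour is shown below: a NAT fibEntry for a GRE-based Internet VPN
carries both values, the ``l3vni`` being used for the ``tun_id`` of the remote FIB flows and the
``label`` being used for the BGP advertisement and the ``L3_LFIB_TABLE`` match. The class and method
names are hypothetical; only the label/l3vni duality comes from this spec.

.. code-block:: java

   /** Hypothetical holder mirroring the label/l3vni attributes of a NAT fibEntry (GRE external network). */
   public class NatFibEntrySketch {
       private final String prefix;       // floating IP or external-fixed-ip, e.g. "172.160.0.200/32"
       private final long mplsLabel;      // used towards the DC-Gateway (MPLSOverGRE) and in L3_LFIB_TABLE
       private final long internetVpnVni; // the l3vni, used intra-DC over internal VxLAN tunnels

       public NatFibEntrySketch(String prefix, long mplsLabel, long internetVpnVni) {
           this.prefix = prefix;
           this.mplsLabel = mplsLabel;
           this.internetVpnVni = internetVpnVni;
       }

       /** Value programmed into tun_id of the remote FIB flow on every switch of the VRF. */
       public long tunnelIdForRemoteFibFlow() {
           return internetVpnVni;
       }

       /** Value advertised to BGP and matched in L3_LFIB_TABLE for traffic arriving over MPLSOverGRE. */
       public long labelForBgpAndLfib() {
           return mplsLabel;
       }

       public String getPrefix() {
           return prefix;
       }
   }
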
For SNAT/DNAT use-cases, we have the following provider network types for External Networks:
419 #. VLAN - not VNI based
420 #. Flat - not VNI based
421 #. VxLAN - VNI based (covered by the EVPN_RT5_ spec)
422 #. GRE - not VNI based (will continue to use ``MPLS labels``)
430 * From a VM on a NAPT switch to reach Internet, and reverse traffic reaching back to the VM
432 There are no explicit pipeline changes.
434 * From a VM on a non-NAPT switch to reach Internet, and reverse traffic reaching back to the VM
On the non-NAPT switch, the ``PSNAT_TABLE`` (table 26) will set the ``tun_id`` field to the
``Router Based VNI`` allocated from the pool and send the packet to the group to reach the NAPT switch.
On the NAPT switch, the ``INTERNAL_TUNNEL_TABLE`` (table 36) will match on the ``tun_id`` field
carrying the ``Router Based VNI`` and send the packet to the ``OUTBOUND_NAPT_TABLE`` (table 46) for
SNAT translation before the traffic is taken out to the Internet.
448 cookie=0x8000006, duration=2797.179s, table=26, n_packets=47, n_bytes=3196, priority=5,ip,metadata=0x23a50/0xfffffffe actions=**set_field:0x710->tun_id**,group:202501
450 group_id=202501,type=all,bucket=actions=output:1
457 cookie=0x8000001, duration=4918.647s, table=0, n_packets=12292, n_bytes=877616, priority=5,in_port=1,actions=write_metadata:0x10000000001/0xfffff0000000001,goto_table:36
458 cookie=0x9000004, duration=927.245s, table=36, n_packets=8234, n_bytes=52679, priority=10,ip,**tun_id=0x710**,actions=write_metadata:0x23a50/0xfffffffe,goto_table:46
As part of the response from the NAPT switch, the packet will be taken back to the non-NAPT switch
after SNAT reverse translation using the destination VM's Network VNI.
There is no NAT-specific explicit pipeline change for DNAT traffic towards the DC-Gateway.
472 * VLAN Provider External Networks: VNI is not applicable on the external VLAN Provider network.
473 However, the Router VNI will be used for datapath traffic from non-NAPT switch to NAPT-switch
474 over the internal VxLAN tunnel.
476 * VxLAN Provider External Networks:
478 + **Explicit creation of Internet VPN**: An L3VNI, mandatorily falling within the
479 ``opendaylight-vni-ranges``, will be provided by the Cloud admin (or tenant). This VNI will be
480 used uniformly for all packet transfer over the VxLAN wire for this Internet VPN (uniformly
481 meaning all the traffic on Internal or External VXLAN Tunnel, except the non-NAPT to NAPT
communication). This use-case is covered by the EVPN_RT5_ spec.
+ **No explicit creation of Internet VPN**: A transparent Internet VPN having the same UUID as the
corresponding external network is created implicitly, and the VNI configured for
this external network should be used on the VxLAN wire. This use-case is **out of scope** from
the perspective of this spec, as indicated in the `Out of Scope`_ section.
* GRE Provider External Networks: an ``Internet VPN VNI`` will be carved out per Internet VPN from the
``opendaylight-vni-ranges`` pool and used on the wire (a VNI-selection sketch follows this list).
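
The following sketch summarises, per external provider network type, which VNI this spec expects in
the ``tun_id`` field for NAT traffic inside the DataCenter. It is illustrative only; the enum and
method names are hypothetical.

.. code-block:: java

   /** Hypothetical summary of tun_id selection for NAT traffic inside the DC. */
   public final class NatVniSelectionSketch {

       public enum ExternalNetworkType { VLAN, FLAT, VXLAN, GRE }

       /** Non-NAPT switch to NAPT switch: always the per-router VNI from opendaylight-vni-ranges. */
       public static long forNonNaptToNaptHop(long routerVni) {
           return routerVni;
       }

       /**
        * SNAT-to-DNAT / DNAT-to-DNAT between hypervisors:
        * - VXLAN external networks (explicit Internet VPN): the Internet VPN L3VNI (EVPN_RT5 spec);
        * - GRE external networks: the Internet VPN VNI carved from opendaylight-vni-ranges;
        * - VLAN / FLAT external networks: no VNI on the external segment (not VNI based).
        */
       public static long forIntraDcNatTraffic(ExternalNetworkType type, long internetVpnVni) {
           switch (type) {
               case VXLAN:
               case GRE:
                   return internetVpnVni;
               default:
                   throw new IllegalArgumentException(type + " external networks are not VNI based");
           }
       }

       private NatVniSelectionSketch() {
       }
   }
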
495 * FIP VM to FIP VM on different hypervisors
497 After DNAT translation on the first hypervisor ``DNAT-OVS-1``, the traffic will be sent to the
498 ``L3_FIB_TABLE`` (table=21) in order to reach the floating IP VM on the second hypervisor
``DNAT-OVS-2``. Here, the ``tun_id`` action field will be set to the ``Internet VPN VNI`` value.
506 cookie=0x8000003, duration=518.567s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x222e8/0xfffffffe,nw_dst=172.160.0.200 actions=**set_field:0x11178->tun_id**,output:9
511 :emphasize-lines: 1-2, 4
cookie=0x9011177, duration=411685.075s, table=36, n_packets=2, n_bytes=196, priority=**6**,**tun_id=0x11178** actions=resubmit(,25)
514 cookie=0x9011179, duration=478573.171s, table=36, n_packets=2, n_bytes=140, priority=5,**tun_id=0x11178**,actions=goto_table:44
516 cookie=0x8000004, duration=408145.805s, table=25, n_packets=600, n_bytes=58064, priority=10,ip,nw_dst=172.160.0.100,**eth_dst=fa:16:3e:e6:e3:c6** actions=set_field:10.0.0.5->ip_dst,write_metadata:0x222e0/0xfffffffe,goto_table:27
cookie=0x8000004, duration=408145.805s, table=25, n_packets=600, n_bytes=58064, priority=10,ip actions=goto_table:44
First, the ``INTERNAL_TUNNEL_TABLE`` (table=36) will take the packet to the ``PDNAT_TABLE``
(table 25) for an exact FIP match.
- In case of a successful FIP match, ``PDNAT_TABLE`` will further match on the floating IP MAC.
This is done as a security measure since, in DNAT use-cases, the packet can land on the
hypervisor directly from the external world; a second match criterion is therefore preferable.
527 - In case of no match, the packet will be redirected to the SNAT pipeline towards the
``INBOUND_NAPT_TABLE`` (table=44). This is the use-case where ``DNAT-OVS-2`` also acts as the NAPT switch.
In summary, on a given NAPT switch, if both DNAT and SNAT are configured, the incoming traffic
will first be sent to the ``PDNAT_TABLE`` and, if no FIP match is found, it will be
forwarded to the ``INBOUND_NAPT_TABLE`` for SNAT translation (a flow-programming sketch follows this
list).
535 As part of the response, the ``Internet VPN VNI`` will be used as ``tun_id`` to reach floating
536 IP VM on ``DNAT-OVS-1``.
538 * FIP VM to FIP VM on same hypervisor
The pipeline changes will be similar to those for different hypervisors, the only difference being
that the ``INTERNAL_TUNNEL_TABLE`` will never be hit in this case.
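
The flow dumps above correspond to the following flow-programming logic on a combined DNAT/SNAT (NAPT)
switch. This is a conceptual sketch only: ``installFlow`` and the table constants are hypothetical
stand-ins for the actual OpenFlow programming utilities.

.. code-block:: java

   /** Conceptual sketch of the PDNAT-first, SNAT-fallback dispatch on a NAPT switch. */
   public class NaptDispatchSketch {

       static final short INTERNAL_TUNNEL_TABLE = 36;
       static final short PDNAT_TABLE = 25;
       static final short INBOUND_NAPT_TABLE = 44;

       /** Hypothetical helper that would install an OpenFlow rule on the given switch. */
       void installFlow(String dpnId, short table, int priority, String match, String actions) {
           // Placeholder: the real implementation goes through the MD-SAL flow programming utilities.
       }

       void programNaptSwitch(String dpnId, long internetVpnVni, String floatingIp, String floatingIpMac) {
           // 1. Packets arriving with the Internet VPN VNI are tried against DNAT first.
           installFlow(dpnId, INTERNAL_TUNNEL_TABLE, 6,
                   "tun_id=" + internetVpnVni, "resubmit(," + PDNAT_TABLE + ")");

           // 2. Exact floating IP + floating IP MAC match: DNAT translation in the DNAT pipeline.
           installFlow(dpnId, PDNAT_TABLE, 10,
                   "ip,nw_dst=" + floatingIp + ",eth_dst=" + floatingIpMac,
                   "dnat-translate-and-continue");

           // 3. No FIP match: fall back to the SNAT pipeline for inbound NAPT translation.
           installFlow(dpnId, PDNAT_TABLE, 5, "ip", "goto_table:" + INBOUND_NAPT_TABLE);
       }
   }
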
547 * Non-FIP VM to FIP VM on different hypervisors (with NAPT elected as the FIP VM hypervisor)
The packet will be sent from the non-FIP VM to the NAPT hypervisor (for SNAT translation) using the
``Router VNI`` (as described in the `SNAT`_ section). As part of the response from the
NAPT switch, after SNAT reverse translation, the packet is forwarded to the non-FIP VM using the
destination VM's Network VNI.
554 * Non-FIP VM to FIP VM on the same NAPT hypervisor
556 There are no explicit pipeline changes for this use-case.
558 * Non-FIP VM to FIP VM on the same hypervisor, but a different hypervisor elected as NAPT switch
The packet will be sent from the non-FIP VM to the NAPT hypervisor (for SNAT translation) using the
``Router VNI`` (as described in the `SNAT`_ section). On the NAPT switch, the
``INTERNAL_TUNNEL_TABLE`` will match on the ``Router VNI`` in the ``tun_id`` field and send the
packet to the ``OUTBOUND_NAPT_TABLE`` for SNAT translation (as described in the `SNAT`_ section).
571 cookie=0x8000005, duration=5073.829s, table=36, n_packets=61, n_bytes=4610, priority=10,ip,**tun_id=0x11170**,actions=write_metadata:0x222e0/0xfffffffe,goto_table:46
The packet will later be sent back to the FIP VM hypervisor from the ``L3_FIB_TABLE`` with the
``tun_id`` field set to the ``Internet VPN VNI``.
579 cookie=0x8000003, duration=518.567s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x222e8/0xfffffffe,nw_dst=172.160.0.200 actions=**set_field:0x11178->tun_id**,output:9
581 + `FIP VM hypervisor`
583 On reaching the FIP VM Hypervisor, the packet will be sent for DNAT translation. The
584 ``INTERNAL_TUNNEL_TABLE`` will match on the ``Internet VPN VNI`` in the ``tun_id`` field and
585 send the packet to ``PDNAT_TABLE``.
588 :emphasize-lines: 1-2
590 cookie=0x9011177, duration=411685.075s, table=36, n_packets=2, n_bytes=196, priority=**6**,**tun_id=0x11178**,actions=resubmit(,25)
591 cookie=0x8000004, duration=408145.805s, table=25, n_packets=600, n_bytes=58064, priority=10,ip,nw_dst=172.160.0.100,**eth_dst=fa:16:3e:e6:e3:c6** actions=set_field:10.0.0.5->ip_dst,write_metadata:0x222e0/0xfffffffe,goto_table:27
593 Upon FIP VM response, DNAT reverse translation happens and traffic is sent back to the NAPT
switch for SNAT translation. The ``L3_FIB_TABLE`` will be set with the ``Internet VPN VNI`` in the
``tun_id`` field.
600 cookie=0x8000003, duration=95.300s, table=21, n_packets=2, n_bytes=140, priority=42,ip,metadata=0x222ea/0xfffffffe,nw_dst=172.160.0.3 actions=**set_field:0x11178->tun_id**,output:5
On the NAPT hypervisor, the ``INTERNAL_TUNNEL_TABLE`` will match on the ``Internet VPN VNI`` in
the ``tun_id`` field and send the packet to the ``INBOUND_NAPT_TABLE`` for SNAT reverse
translation (external fixed IP to VM IP). The packet will then be sent back to the non-FIP VM
using the destination VM's Network VNI.
609 * Non-FIP VM to FIP VM on different hypervisors (with NAPT elected as the non-FIP VM hypervisor)
After SNAT translation, the ``Internet VPN VNI`` will be used to reach the FIP VM. On the FIP VM
hypervisor, the ``INTERNAL_TUNNEL_TABLE`` will take the packet to the ``PDNAT_TABLE`` to match on the
``Internet VPN VNI`` in the ``tun_id`` field for DNAT translation.
Upon response from the FIP VM, DNAT reverse translation happens and the ``Internet VPN VNI`` is used
to reach back to the non-FIP VM.
622 * ``opendaylight-vni-ranges`` and ``enforce-openstack-semantics`` leaf elements will be added to
623 neutronvpn-config container in ``neutronvpn-config.yang``:
625 + ``opendaylight-vni-ranges`` will be introduced to accept inputs for the VNI range pool from
626 the configurator via the corresponding exposed REST API. In case this is not defined, the
627 default value defined in ``netvirt-neutronvpn-config.xml`` will be used to create this pool.
629 + ``enforce-openstack-semantics`` will be introduced to have the flexibility to enable
630 or disable OpenStack semantics in the dataplane for this feature. It will be defaulted to
631 true, meaning these semantics will be enforced by default. In case it is set to false, the
632 dataplane will continue to be programmed with LPort tags / ELAN tags for switching and with
633 labels for routing use-cases. Once this feature gets stabilized and the semantics are in place
to use VNIs on the wire for BGPVPN based forwarding too, this config can be permanently
removed if deemed fit (a gating sketch follows the YANG snippet below).
638 :caption: neutronvpn-config.yang
639 :emphasize-lines: 5-12
641 container neutronvpn-config {
645 leaf opendaylight-vni-ranges {
647 default "70000:99999";
649 leaf enforce-openstack-semantics {
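
A hedged sketch of how the ``enforce-openstack-semantics`` flag could gate the value written into
``tun_id`` is given below. The helper names and the ``SegmentType`` enum are hypothetical; only the
decision itself (segmentation-ID when enforced, legacy LPort tag / label otherwise) comes from this
spec.

.. code-block:: java

   /** Hypothetical illustration of the enforce-openstack-semantics switch. */
   public class TunnelKeySelectorSketch {

       public enum SegmentType { VXLAN, VLAN, FLAT, GRE }

       private final boolean enforceOpenstackSemantics; // from neutronvpn-config, defaults to true

       public TunnelKeySelectorSketch(boolean enforceOpenstackSemantics) {
           this.enforceOpenstackSemantics = enforceOpenstackSemantics;
       }

       /**
        * Returns the value to program as tun_id on the internal VxLAN tunnel:
        * the network VNI (segmentation-id) when OpenStack semantics are enforced,
        * otherwise the legacy value (LPort tag for L2 switching, MPLS label for L3 forwarding).
        */
       public long selectTunnelKey(SegmentType segmentType, long segmentationId, long legacyValue) {
           if (enforceOpenstackSemantics && segmentType == SegmentType.VXLAN) {
               return segmentationId;
           }
           return legacyValue;
       }
   }
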
* Provider network-type and provider segmentation-ID need to be propagated to the FIB Manager so that
flows can be programmed based on them. Hence:
658 + A new grouping ``network-attributes`` will be introduced in ``neutronvpn.yang`` to hold
659 network type and segmentation ID. This grouping will replace the leaf-node
660 ``network-id`` in ``subnetmaps`` MD-SAL configuration datastore:
663 :caption: neutronvpn.yang
664 :emphasize-lines: 1-27
666 grouping network-attributes {
669 description "UUID representing the network";
679 leaf segmentation-id {
681 description "Optional. Isolated segment on the physical network.
682 If segment-type is vlan, this ID is a vlan identifier.
683 If segment-type is vxlan, this ID is a vni.
684 If segment-type is flat/gre, this ID is set to 0";
688 container subnetmaps {
691 uses network-attributes;
+ These attributes will be propagated to the VPN Manager module, upon addition of a router-interface
or addition of a subnet to a BGPVPN, via the ``subnet-added-to-vpn`` notification
modelled in ``neutronvpn.yang``. Hence, the following node will be added:
699 :caption: neutronvpn.yang
702 notification subnet-added-to-vpn {
703 description "new subnet added to vpn";
706 uses network-attributes;
+ VpnSubnetRouteHandler will act on these notifications and store these attributes in the
``subnet-op-data`` MD-SAL operational datastore as described below. The FIB Manager will
retrieve the ``subnetID`` from the primary adjacency of the concerned VPN interface. This
``subnetID`` will be used as the key to retrieve the ``network-attributes`` from the
``subnet-op-data`` datastore (a lookup sketch follows the snippet below):
716 :caption: odl-l3vpn.yang
717 :emphasize-lines: 1-10
721 revision-date "2015-06-02";
724 container subnet-op-data {
727 uses nvpn:network-attributes;
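
A minimal sketch of the resulting lookup path in the FIB Manager is shown below. The data access
helpers are hypothetical placeholders for the actual MD-SAL reads; only the chain (VPN interface
primary adjacency -> ``subnetID`` -> ``subnet-op-data`` -> ``network-attributes``) is taken from this
spec.

.. code-block:: java

   import java.util.Optional;

   /** Hypothetical sketch of resolving network-attributes for a VPN interface. */
   public class NetworkAttributesLookupSketch {

       /** Simplified view of the network-attributes grouping. */
       public static class NetworkAttributes {
           public final String networkId;
           public final String networkType;  // e.g. "vxlan", "vlan", "flat", "gre"
           public final long segmentationId; // the VNI for vxlan networks

           public NetworkAttributes(String networkId, String networkType, long segmentationId) {
               this.networkId = networkId;
               this.networkType = networkType;
               this.segmentationId = segmentationId;
           }
       }

       /** Placeholder for reading the primary adjacency's subnetID of a VPN interface. */
       Optional<String> getSubnetIdFromPrimaryAdjacency(String vpnInterfaceName) {
           return Optional.empty(); // real code reads the adjacency from the operational datastore
       }

       /** Placeholder for reading the subnet-op-data entry (operational datastore) keyed by subnetID. */
       Optional<NetworkAttributes> readSubnetOpData(String subnetId) {
           return Optional.empty(); // real code reads odl-l3vpn:subnet-op-data
       }

       /** Returns the VNI to put on the wire for the given VPN interface, if it is a VxLAN network. */
       public Optional<Long> resolveWireVni(String vpnInterfaceName) {
           return getSubnetIdFromPrimaryAdjacency(vpnInterfaceName)
                   .flatMap(this::readSubnetOpData)
                   .filter(attrs -> "vxlan".equalsIgnoreCase(attrs.networkType))
                   .map(attrs -> attrs.segmentationId);
       }
   }
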
730 * ``subnetID`` and ``nat-prefix`` leaf elements will be added to ``prefix-to-interface``
731 container in ``odl-l3vpn.yang``:
+ For NAT use-cases where the VRF entry is not always associated with a VPN interface (e.g. for
NAT entries such as floating IPs and router-gateway-IPs for external VLAN / flat networks),
735 ``subnetID`` leaf element will be added to make it possible to retrieve the
736 ``network-attributes``.
738 + To distinguish a non-NAT prefix from a NAT prefix, ``nat-prefix`` leaf element will be
739 added. This is a boolean attribute indicating whether the prefix is a NAT prefix (meaning a
740 floating IP, or an external-fixed-ip of a router-gateway). The VRFEntry corresponding to
741 the NAT prefix entries here may carry both the ``MPLS label`` and the ``Internet VPN VNI``.
742 For SNAT-to-DNAT within the datacenter, where the Internet VPN contains an MPLSOverGRE
743 based external network, this VRF entry will publish the ``MPLS label`` to BGP while the
744 ``Internet VPN VNI`` (also known as ``L3VNI``) will be used to carry intra-DC traffic on
745 the external segment within the datacenter.
748 :caption: odl-l3vpn.yang
749 :emphasize-lines: 10-16
751 container prefix-to-interface {
755 leaf vpn-id {type uint32;}
* We have to make sure that we do not accept configuration of VxLAN type provider networks without
a ``segmentation-ID`` available in them, since it is used to represent the VNI on the wire
and in the flows/groups (a validation sketch follows).
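
A small validation sketch (hypothetical class and method names) of that check:

.. code-block:: java

   /** Hypothetical guard rejecting VxLAN provider networks that carry no segmentation-ID. */
   public final class VxlanNetworkValidatorSketch {

       public static void validate(String networkType, Long segmentationId) {
           if ("vxlan".equalsIgnoreCase(networkType)
                   && (segmentationId == null || segmentationId == 0L)) {
               throw new IllegalArgumentException(
                       "VxLAN provider network must carry a segmentation-ID (VNI) to be used on the wire");
           }
       }

       private VxlanNetworkValidatorSketch() {
       }
   }
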
778 Clustering considerations
779 -------------------------
780 No specific additional clustering considerations to be adhered to.
783 Other Infra considerations
784 --------------------------
788 Security considerations
789 -----------------------
793 Scale and Performance Impact
794 ----------------------------
817 odl-netvirt-openstack
821 No new changes to the existing REST APIs.
825 No new CLI is being added.
834 Abhinav Gupta <abhinav.gupta@ericsson.com>
835 Vivekanandan Narasimhan <n.vivekanandan@ericsson.com>
838 Chetan Arakere Gowdru <chetan.arakere@altencalsoftlabs.com>
839 Karthikeyan Krishnan <karthikeyan.k@altencalsoftlabs.com>
840 Yugandhar Sarraju <yugandhar.s@altencalsoftlabs.com>
845 Trello card: https://trello.com/c/PfARbEmU/84-enforce-vni-on-the-wire-for-l2-switching-l3-forwarding-and-nating-on-vxlan-overlay-networks
847 #. Code changes to alter the pipeline and e2e testing of the use-cases mentioned.
853 This doesn't add any new dependencies.
Appropriate UTs will be added for the new code coming in, once the framework is in place.
865 There won't be any Integration tests provided for this feature.
No new test cases need to be added; existing ones should continue to succeed.
873 This will require changes to the Developer Guide.
875 Developer Guide needs to capture how this feature modifies the existing Netvirt L3 forwarding
876 service implementation.
882 * http://docs.opendaylight.org/en/latest/documentation.html
883 * https://wiki.opendaylight.org/view/Genius:Carbon_Release_Plan
884 * `EVPN_RT5 <https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement-03>`_