I have written a few blogs on VxLAN earlier.
Since then VxLAN has changed a lot, and I am planning to write a series of articles on VxLAN and EVPN on Nexus devices. For this, I have set up a GNS3 lab with Cisco NxOSv (Nexus 9000v).
Let's start digging into VxLAN and explore it from scratch.
What is an Underlay?
VxLAN introduces many new terms that you need to understand before its overall operation makes sense, and the underlay network is one of them. In VxLAN you can think of the IGP (Interior Gateway Protocol) plus the multicast network as the underlay. Before setting up VxLAN we need to (i) provide a way for the VTEPs to reach each other and (ii) identify a method for the VTEPs to learn about unknown hosts. For the first point we can use any IGP; in this case I am using OSPF. For the second point we need a method that supports the flood and learn technique: if a node doesn't know the destination, it should be able to flood the ARP/BUM (Broadcast, Unknown unicast and Multicast) traffic and learn the MAC addresses back from the responses. The network between the VTEPs is a routed network, so it does not forward any kind of layer-2 flooding, and without a shared broadcast segment the common way to flood in this design is multicast. So we resort to a multicast routing protocol like PIM. In this case, I have used PIM SM (sparse mode).
What is an Overlay?
An overlay network is a network built on top of an existing, well-connected network to provide specialized characteristics that the underlay network doesn't. For example, in the case of VxLAN the underlying IP routed network cannot provide layer-2 features such as flood and learn. So if we need to connect two geographically separated data centers, our only options are a BGP-based ISP or an MPLS-based network, and either way we (i) lose control of the traffic and (ii) add extra latency from IP routing, which limits modern DCs that require low-latency networks. The overlay network addresses these problems for us.
How Is the Connection Initiated?
Suppose we have two devices (VTEPs, or VxLAN Tunnel Endpoints) configured for VxLAN. Once IP connectivity and the multicast network are set up between them, they start to form the overlay network. Below is the packet exchange at the beginning of the connection.
I have kept the network as simple as I can: two Nexus 9000v routers connected by a direct link, with OSPF and PIM running over that link (Ethernet1/1). I have enabled a packet capture on this link to show you the kind of packets exchanged between the two VxLAN nodes.
Let's see what we found in the packet capture.
We already know that once connectivity is up in this network there will be an OSPF packet exchange, but I know you are not reading this blog for OSPF. The key concept to understand from the packet captures is the formation of the multicast tree. So let's check which PIM packets are exchanged and how both routers join the multicast group (the mcast-group configured under interface nve1).
Here are the three important packets that we need to understand.
STEP-1: NxOS-1 sends a PIM Join to the RP (rendezvous point), which is NxOS-2. This means the router wants to join the multicast group and receive all the traffic sent to that group.
STEP-2: NxOS-2 also expresses its interest in joining the same multicast group.
STEP-3: The non-RP router, NxOS-1, sends a PIM Register message to the RP, informing it that it has a source sending traffic to the multicast group.
With these three packets, both routers have shown interest in joining the multicast group, and the non-RP router has told the RP that it has a stream for the group waiting to be delivered, so the RP can forward that stream to any receiver that joined the group. In other words, we have built a group of routers that want to receive the traffic sent to this multicast group, and this group is used to send and receive flooded traffic such as ARP, as shown below.
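The end result of STEP-1 and STEP-2 can be pictured with a toy model of group membership. This is a sketch under heavy assumptions, not PIM itself: it ignores the RP, RPF checks, Register encapsulation, SPT switchover and timers, and the class name `GroupMembership` and the group address 239.1.1.1 are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class GroupMembership:
    """Toy model of which routers have joined which multicast group.

    Assumption: this deliberately ignores real PIM-SM details (RP, RPF
    checks, Register encapsulation, SPT switchover, timers). It only
    captures the end result of the joins: a set of routers per group.
    """

    def __init__(self):
        self.members = defaultdict(set)  # group address -> set of router names

    def join(self, router, group):
        # A (*,G) join adds the router to the group's distribution tree.
        self.members[group].add(router)

    def deliver(self, sender, group):
        # Traffic sent to the group reaches every joined router except the sender.
        return sorted(r for r in self.members[group] if r != sender)


GROUP = "239.1.1.1"  # hypothetical group address chosen for illustration
tree = GroupMembership()
tree.join("NxOS-1", GROUP)
tree.join("NxOS-2", GROUP)
print(tree.deliver("NxOS-1", GROUP))  # ['NxOS-2']
print(tree.deliver("NxOS-2", GROUP))  # ['NxOS-1']
```

With more VTEPs joined, `deliver` would return every other member, which is exactly the property flood and learn relies on.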
Flood and learn:
Above we have seen the multicast setup, which ensures that all VxLAN routers join a multicast group, so any traffic a router sends to that group is received by all of them. How do we use this property? Data plane learning relies on the well-known, tried and tested flood and learn method, the same one we use with ARP or other BUM traffic on an Ethernet segment to find the destination MAC. Here we use a similar approach, but with multicast instead of broadcast. A router that needs to send traffic to an unknown destination encapsulates the ARP packet coming from the source with VxLAN headers (VxLAN + UDP, plus the outer IP and Ethernet headers) and sends it to the multicast group configured for the VNI, so every neighbor that has joined the group receives it. That is why, in the Wireshark capture above, the encapsulated ARP is sent to the multicast group address as its destination.
This multicast packet is received by every router that joined the group, decapsulated, and forwarded to the LAN that corresponds to the VNI in the VxLAN header. The VNI maps to a VLAN on the router (VNI 5000 to VLAN 1000 in this lab).
The response is unicast back to the router where the ARP originated. This is the "learn" part: each receiving VTEP records the inner source MAC address against the outer source IP address (the sending VTEP), so later traffic to that MAC can be unicast directly instead of being flooded.
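The flood and learn decision on a VTEP can be sketched as a small MAC table. This is a simplified model under stated assumptions: a single VNI, no MAC aging, and "flooding" represented by returning the multicast group address instead of sending a packet. The `Vtep` class, the VTEP IPs, the MAC addresses and the group 239.1.1.1 are all hypothetical.

```python
from dataclasses import dataclass, field

BROADCAST = "ffff.ffff.ffff"


@dataclass
class Vtep:
    """Toy VTEP doing flood-and-learn MAC learning for a single VNI.

    Assumptions: one VNI, no MAC aging, and "flooding" is modeled by
    returning the multicast group address rather than sending a packet.
    """
    ip: str
    mcast_group: str
    mac_table: dict = field(default_factory=dict)  # remote MAC -> remote VTEP IP

    def destination_for(self, dst_mac: str) -> str:
        # Known unicast MAC: send straight to the VTEP that owns it.
        if dst_mac != BROADCAST and dst_mac in self.mac_table:
            return self.mac_table[dst_mac]
        # Broadcast or unknown unicast: flood to the VNI's multicast group.
        return self.mcast_group

    def learn(self, outer_src_ip: str, inner_src_mac: str) -> None:
        # On decapsulation, map the inner source MAC to the sending VTEP's IP.
        self.mac_table[inner_src_mac] = outer_src_ip


# Hypothetical addresses for illustration only.
GROUP = "239.1.1.1"
PC1_MAC, PC2_MAC = "0050.7966.6800", "0050.7966.6801"
vtep1 = Vtep(ip="10.0.0.1", mcast_group=GROUP)
vtep2 = Vtep(ip="10.0.0.2", mcast_group=GROUP)

# PC1 broadcasts an ARP request: VTEP1 floods it to the group.
print(vtep1.destination_for(BROADCAST))  # 239.1.1.1
vtep2.learn(outer_src_ip=vtep1.ip, inner_src_mac=PC1_MAC)

# PC2's ARP reply is unicast: VTEP2 already knows PC1 sits behind VTEP1.
print(vtep2.destination_for(PC1_MAC))  # 10.0.0.1
vtep1.learn(outer_src_ip=vtep2.ip, inner_src_mac=PC2_MAC)
print(vtep1.destination_for(PC2_MAC))  # 10.0.0.2
```

Only the first packet toward an unknown MAC is flooded; once both VTEPs have learned from the request and the reply, the conversation is pure unicast between the two VTEP addresses.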
Headers in a VxLAN packet
You may have already noticed these headers in the previous Wireshark captures; here is a look at a VxLAN-encapsulated packet.
The UDP header is very important: its destination port (4789, the IANA-assigned VxLAN port) is what tells the receiving router that the packet carries VxLAN. The VxLAN header carries the 24-bit VNI (VxLAN Network Identifier), which the router needs to determine the destination VLAN of the packet.
Now that we have covered a good set of basics, let's check the configuration on both switches, NxOS-1 and NxOS-2.
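The header layout described above can be sketched with Python's `struct` module, following the 8-byte VxLAN header format in RFC 7348 (8 bits of flags with the I flag 0x08 meaning "VNI is valid", 24 reserved bits, the 24-bit VNI, then 8 reserved bits). The function names are hypothetical, and the 50-byte overhead figure assumes an IPv4 underlay with no 802.1Q tag on the outer Ethernet header.

```python
import struct
from typing import Optional

VXLAN_UDP_PORT = 4789  # IANA-assigned VxLAN destination port (RFC 7348)
I_FLAG = 0x08          # "VNI is valid" flag in the first header byte


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VxLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", I_FLAG << 24, vni << 8)


def parse_vni(udp_dst_port: int, payload: bytes) -> Optional[int]:
    """Return the VNI if this UDP payload looks like VxLAN, else None."""
    if udp_dst_port != VXLAN_UDP_PORT or len(payload) < 8:
        return None
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & I_FLAG:
        return None
    return vni_word >> 8


header = build_vxlan_header(5000)  # VNI 5000, as in this lab
print(header.hex())                # 0800000000138800
print(parse_vni(4789, header))     # 5000

# Outer Ethernet (14) + outer IPv4 (20) + UDP (8) + VxLAN (8) bytes per frame.
print(14 + 20 + 8 + 8)             # 50
```

This is why the UDP port check comes first: without port 4789, the receiver has no reason to interpret the next 8 bytes as a VxLAN header at all.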
NxOS-1 Switch:
!
feature ospf
feature pim
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
!
ip pim rp-address 126.96.36.199 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
!
vlan 1000
vlan 1000
  vn-segment 5000
!
interface nve1
  no shutdown
  source-interface loopback1
  member vni 5000 mcast-group 220.127.116.11
!
interface Ethernet1/1
  ip address 10.10.10.1/24
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
  no shutdown
!
interface Ethernet1/3
  switchport
  switchport access vlan 1000
  no shutdown
!
interface loopback1
  ip address 18.104.22.168/32
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
!
router ospf 1
  router-id 22.214.171.124
!

NxOS-2 Switch:
!
feature ospf
feature pim
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
!
ip pim rp-address 126.96.36.199 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
!
vlan 1,1000
vlan 1000
  vn-segment 5000
!
interface nve1
  no shutdown
  source-interface loopback1
  member vni 5000 mcast-group 220.127.116.11
!
interface Ethernet1/1
  ip address 10.10.10.2/24
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
  no shutdown
!
interface Ethernet1/3
  switchport
  switchport access vlan 1000
  no shutdown
!
interface loopback1
  ip address 18.104.22.168/32
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
!
interface loopback2
  ip address 22.214.171.124/32
  ip router ospf 10 area 0.0.0.0
  ip pim sparse-mode
!
router ospf 1
  router-id 126.96.36.199
!
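One sanity check worth doing on a config like this: in a PIM sparse-mode underlay, the VNI's `mcast-group` needs an RP, so it should fall inside the RP's `group-list` and outside the SSM range (SSM groups do not use an RP). The sketch below assumes the group-list is the whole multicast range 224.0.0.0/4 and the SSM range is the NX-OS default 232.0.0.0/8; the function name and the candidate addresses are hypothetical.

```python
import ipaddress

# Assumed ranges: RP group-list covering all multicast, and the default SSM range.
ASM_RANGE = ipaddress.ip_network("224.0.0.0/4")  # ip pim rp-address ... group-list
SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")  # ip pim ssm range


def usable_for_flood_and_learn(group: str) -> bool:
    """True if the group is covered by the RP's group-list and is not SSM."""
    addr = ipaddress.ip_address(group)
    return addr in ASM_RANGE and addr not in SSM_RANGE


for candidate in ("239.1.1.1", "232.1.1.1", "10.1.1.1"):
    print(candidate, usable_for_flood_and_learn(candidate))
# 239.1.1.1 True
# 232.1.1.1 False
# 10.1.1.1 False
```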
NxOS-1# show nve vni
Codes: CP - Control Plane
       DP - Data Plane
       UC - Unconfigured
       SA - Suppress ARP
       SU - Suppress Unknown Unicast
       Xconn - Crossconnect
       MS-IR - Multisite Ingress Replication

Interface VNI      Multicast-group   State Mode Type [BD/VRF]      Flags
--------- -------- ----------------- ----- ---- ------------------ -----
nve1      5000     188.8.131.52      Up    DP   L2

NxOS-2# show nve vni
Codes: CP - Control Plane
       DP - Data Plane
       UC - Unconfigured
       SA - Suppress ARP
       SU - Suppress Unknown Unicast
       Xconn - Crossconnect
       MS-IR - Multisite Ingress Replication

Interface VNI      Multicast-group   State Mode Type [BD/VRF]      Flags
--------- -------- ----------------- ----- ---- ------------------ -----
nve1      5000     184.108.40.206    Up    DP   L2

NxOS-1# show nve interface
Interface: nve1, State: Up, encapsulation: VXLAN
 VPC Capability: VPC-VIP-Only [not-notified]
 Local Router MAC: 0c56.3a00.1b08
 Host Learning Mode: Data-Plane
 Source-Interface: loopback1 (primary: 220.127.116.11, secondary: 0.0.0.0)

NxOS-2# show nve interface
Interface: nve1, State: Up, encapsulation: VXLAN
 VPC Capability: VPC-VIP-Only [not-notified]
 Local Router MAC: 0c31.7000.1b08
 Host Learning Mode: Data-Plane
 Source-Interface: loopback1 (primary: 18.104.22.168, secondary: 0.0.0.0)
The other verification step is a ping between the VPCS hosts. Let's check that as well.
PC1> ping 192.168.10.2
84 bytes from 192.168.10.2 icmp_seq=1 ttl=64 time=2.351 ms
84 bytes from 192.168.10.2 icmp_seq=2 ttl=64 time=2.502 ms
84 bytes from 192.168.10.2 icmp_seq=3 ttl=64 time=2.338 ms
84 bytes from 192.168.10.2 icmp_seq=4 ttl=64 time=6.568 ms
84 bytes from 192.168.10.2 icmp_seq=5 ttl=64 time=2.398 ms

PC2> ping 192.168.10.1
84 bytes from 192.168.10.1 icmp_seq=1 ttl=64 time=2.176 ms
84 bytes from 192.168.10.1 icmp_seq=2 ttl=64 time=2.254 ms
84 bytes from 192.168.10.1 icmp_seq=3 ttl=64 time=2.417 ms
84 bytes from 192.168.10.1 icmp_seq=4 ttl=64 time=5.774 ms
84 bytes from 192.168.10.1 icmp_seq=5 ttl=64 time=2.547 ms
So we made it: we can ping end to end within the same subnet, and it certainly feels like layer 2. This was my first blog on the Nexus 9000v, and I was not sure it would work, but I'm glad I completed it. Next, I plan to build on this lab and cover EVPN, which brings control plane learning: host MAC learning moves from flood and learn to BGP. Stay tuned for the exciting journey, and thanks for reading.