
PIM signaling

Introduction

Network topology

Underlay

Unicast overlay

Multicast overlay

Data-plane

Conclusion

Introduction

Large corporate customers sometimes need to forward multicast traffic not only within their networks in large offices and data centers but also between them. Service providers therefore have to deliver such packets between the clients' points of presence.

Among the most popular solutions for forwarding multicast traffic across MPLS networks are Rosen, mLDP, RSVP and head-end replication (the latter used, for example, in EVPN). In this article we discuss one of them – Rosen (named after one of the RFC authors).

When multicast traffic is forwarded between two PE (Provider Edge) routers in Rosen mode, a dedicated GRE tunnel is brought up and the multicast packets are sent inside it. It is worth noting that the underlay network can either support multicast natively or emulate it (for example, in DMVPN networks, i.e. when using 2547oDMVPN).

Signaling can rely on different protocols: PIM, mLDP or BGP. Let's find out what happens in the network when PIM SM is used for signaling.

Network topology

To keep things simple, let's look at an underlay network that does not use any additional tunneling protocols such as DMVPN.

The service provider MPLS network consists of the following devices: PE1, P1, P2 and PE2; each router's purpose is clear from its name. The Cisco CSR1000v virtual router with the latest software (IOS-XE 17.3) was chosen to emulate the provider equipment. The customer equipment is represented by Cisco 7200 series routers (IOS 15.2). One more note concerns the Sender and Receiver devices: in this article we used routers as the multicast sender and receiver; in a real network these would of course be bare-metal servers, workstations or media players.

Underlay

By the underlay network we mean the service provider's own infrastructure. The following protocols are used in the provider network: OSPF and LDP.

PE1#sho run | s r o
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0
 mpls ldp autoconfig
PE1#sho mpls ldp neighbor
    Peer LDP Ident: 1.1.1.1:0; Local LDP Ident 3.3.3.3:0
        TCP connection: 1.1.1.1.646 - 3.3.3.3.35398
        State: Oper; Msgs sent/rcvd: 70/72; Downstream
        Up time: 00:53:14
        LDP discovery sources:
          GigabitEthernet2, Src IP addr: 172.16.0.2
        Addresses bound to peer LDP Ident:
          172.16.1.1      172.16.0.2      1.1.1.1
PE1#sho mpl fo
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface
16         Pop Label  1.1.1.1/32       0             Gi2        172.16.0.2
17         16         2.2.2.2/32       0             Gi2        172.16.0.2
18         Pop Label  172.16.1.0/24    0             Gi2        172.16.0.2
19         17         172.16.2.0/24    0             Gi2        172.16.0.2
20         19         4.4.4.4/32       0             Gi2        172.16.0.2
21         No Label   192.168.0.0/24[V]   \
                                       0             aggregate/a
22         No Label   5.5.5.5/32[V]    0             Gi1        192.168.0.2
23         No Label   10.0.0.0/24[V]   1596          Gi1        192.168.0.2

The underlay network also supports multicast traffic; PIM in sparse mode is used for signaling. On some Cisco platforms multicast routing is disabled by default, so in addition to enabling PIM on all necessary interfaces one must also turn on multicast routing. Router PE1 was selected as the Rendezvous Point (RP), and this information is distributed through the network using the standard RP election protocol – BootStrap Router (BSR).
PE1(config)#ip multicast-routing distributed
PE1(config)#^Z
PE1#sho ip pim interface
Address          Interface                Ver/   Nbr    Query  DR         DR
                                          Mode   Count  Intvl  Prior
3.3.3.3          Loopback0                v2/S   0      30     1          3.3.3.3
172.16.0.1       GigabitEthernet2         v2/S   1      30     1          172.16.0.2
PE1#sho run | i candi
ip pim bsr-candidate Loopback0 0
ip pim rp-candidate Loopback0
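For completeness, here is a minimal sketch of the per-interface PIM configuration implied by the output above (only the interfaces listed by show ip pim interface are shown; treat it as an illustration rather than the captured configuration):

PE1(config)#interface Loopback0
PE1(config-if)#ip pim sparse-mode
PE1(config-if)#interface GigabitEthernet2
PE1(config-if)#ip pim sparse-mode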

Let's make sure that router PE2 has all the information required for multicast to work in the provider network: PIM neighbors are discovered and the RP address is known.

PE2#sho ip pim nei
PIM Neighbor Table
Mode: B - Bidir Capable, DR - Designated Router, N - Default DR Priority,
      P - Proxy Capable, S - State Refresh Capable, G - GenID Capable,
      L - DR Load-balancing Capable
Neighbor          Interface                Uptime/Expires    Ver   DR
Address                                                            Prio/Mode
172.16.2.1        GigabitEthernet2         00:45:52/00:01:36 v2    1 / S P G
PE2#sho ip pim rp map
PIM Group-to-RP Mappings
Group(s) 224.0.0.0/4
  RP 3.3.3.3 (?), v2
    Info source: 3.3.3.3 (?), via bootstrap, priority 0, holdtime 150
         Uptime: 00:45:18, expires: 00:01:36

We captured a dump on the P1-P2 link to find the message with which router PE1 announces itself as a candidate for the BSR and RP roles.

Unicast overlay

VRF A was created to process user traffic. MP-BGP is responsible for exchanging routing information between PE1 and PE2, while routes are exchanged with the customer routers CE1 and CE2 via OSPF.

PE1#sho run | s r b
router bgp 1
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 4.4.4.4 remote-as 1
 neighbor 4.4.4.4 update-source Loopback0
 !
 address-family ipv4
 exit-address-family
 !
 address-family vpnv4
  neighbor 4.4.4.4 activate
  neighbor 4.4.4.4 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf A
  redistribute ospf 2 match internal external 2
 exit-address-family
PE1#sho run | s r ospf 2
router ospf 2 vrf A
 capability vrf-lite
 redistribute bgp 1
 network 0.0.0.0 255.255.255.255 area 0

MPLS service (VPN) labels are distributed with the help of MP-BGP (the vpnv4 address family).

PE1#sho bgp vpnv4 unicast vrf A labels
   Network          Next Hop      In label/Out label
Route Distinguisher: 1:1 (A)
   5.5.5.5/32       192.168.0.2     22/nolabel
   6.6.6.6/32       4.4.4.4         nolabel/22
   10.0.0.0/24      192.168.0.2     23/nolabel
   10.0.1.0/24      4.4.4.4         nolabel/23
   192.168.0.0      0.0.0.0         21/nolabel(A)
   192.168.1.0      4.4.4.4         nolabel/21

It is easy to verify that the customer routers receive all the necessary prefixes.

CE1#sho ip ro
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       + - replicated route, % - next hop override
Gateway of last resort is not set
      5.0.0.0/32 is subnetted, 1 subnets
C        5.5.5.5 is directly connected, Loopback0
      6.0.0.0/32 is subnetted, 1 subnets
O E2     6.6.6.6 [110/1] via 192.168.0.1, 00:43:33, GigabitEthernet0/0
      10.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
C        10.0.0.0/24 is directly connected, GigabitEthernet1/0
L        10.0.0.1/32 is directly connected, GigabitEthernet1/0
O E2     10.0.1.0/24 [110/1] via 192.168.0.1, 00:43:33, GigabitEthernet0/0
      192.168.0.0/24 is variably subnetted, 2 subnets, 2 masks
C        192.168.0.0/24 is directly connected, GigabitEthernet0/0
L        192.168.0.2/32 is directly connected, GigabitEthernet0/0
O E2  192.168.1.0/24 [110/1] via 192.168.0.1, 00:43:33, GigabitEthernet0/0

Since the Sender and Receiver devices imitate hosts, no dynamic routing is used on them.
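For reference, such a host-like setup boils down to a single static default route pointing at the CE router (a sketch consistent with the routing table below; the actual Receiver configuration was not captured):

Receiver(config)#ip route 0.0.0.0 0.0.0.0 10.0.1.1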

Receiver#sho ip ro
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       + - replicated route, % - next hop override
Gateway of last resort is 10.0.1.1 to network 0.0.0.0
S*    0.0.0.0/0 [1/0] via 10.0.1.1
      10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        10.0.1.0/24 is directly connected, GigabitEthernet1/0
L        10.0.1.2/32 is directly connected, GigabitEthernet1/0

Hosts Sender and Receiver successfully exchange data.

Receiver#ping 10.0.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.0.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/32/44 ms

We intercepted one of these ICMP messages on the P1-P2 link.

We would like to draw the reader's attention to the fact that the packet carries two MPLS labels: a service label and a transport label.

Multicast overlay

We decided to place the Rendezvous Point function for VRF A on the customer equipment, so router CE1 announces itself as C-RP and C-BSR.

CE1#sho run | i candi
ip pim bsr-candidate Loopback0 0
ip pim rp-candidate Loopback0
CE1#sho ip pim rp map
PIM Group-to-RP Mappings
This system is a candidate RP (v2)
This system is the Bootstrap Router (v2)
Group(s) 224.0.0.0/4
  RP 5.5.5.5 (?), v2
    Info source: 5.5.5.5 (?), via bootstrap, priority 0, holdtime 150
         Uptime: 00:55:21, expires: 00:02:06

Multicast routing in the service provider network must be enabled not only globally but also for VRF A.

PE1(config)#ip multicast-routing vrf A distributed

But this is not all: each VRF must be configured with unique multicast groups for carrying user traffic, using the mdt default and mdt data commands (the second one is optional).

PE1#sho run vrf | s def
vrf definition a
 rd 1:1
 route-target export 1:1
 route-target import 1:1
 !
 address-family ipv4
  mdt default 239.1.1.1
  mdt data 239.1.2.0 0.0.0.255 threshold 1
  mdt data threshold 1
 exit-address-family

By default all traffic is forwarded using the group specified in the mdt default command; however, if the transmission rate of a stream exceeds the threshold set with the threshold option, that user traffic stream is moved to one of the data groups.

Once the default group is bound to the VRF, all PE routers join this group and try to discover PIM neighbors in the overlay network.
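For the overlay adjacency to come up, PE2 must carry the matching configuration: multicast routing for the VRF and the same default MDT group. Below is a sketch mirroring the PE1 settings shown above; PE2's configuration was not captured separately:

PE2(config)#ip multicast-routing vrf a distributed
PE2(config)#vrf definition a
PE2(config-vrf)#address-family ipv4
PE2(config-vrf-af)#mdt default 239.1.1.1
PE2(config-vrf-af)#mdt data 239.1.2.0 0.0.0.255 threshold 1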

Here is a very interesting and significant packet captured in the provider network. It is sent by router PE2 to the device performing the RP function in the underlay network. Since it is a unicast packet, an MPLS transport label is added to it in the usual way. The IPv4 packet carries a PIM Register message: router PE2 is trying to send some multicast packet towards the RP while no join has come from the RP side yet – everything is the same as with ordinary multicast. But what is the IP packet that PE2 wants to deliver to PE1? It is nothing less than multicast to group 239.1.1.1, which was defined as mdt default for VRF A. Continuing the decapsulation we find a GRE header and IPv4 once again: this is router PE2's attempt to discover PIM neighbors inside VRF A (judging by the destination address 224.0.0.13) using a PIM Hello message.
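To summarize the layering of this capture (a rough sketch reconstructed from the description above rather than a byte-accurate dissection; the innermost source is assumed to be PE2's loopback):

MPLS (transport label)
 IPv4  4.4.4.4 -> 3.3.3.3        PIM Register (unicast towards the underlay RP)
  IPv4  4.4.4.4 -> 239.1.1.1     GRE (the mdt default group of VRF A)
   IPv4  4.4.4.4 -> 224.0.0.13   PIM Hello inside VRF A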

As a result, all PIM routers inside VRF A are discovered and adjacencies with them are established.

PE2#sho ip pim vrf a neighbor
PIM Neighbor Table
Mode: B - Bidir Capable, DR - Designated Router, N - Default DR Priority,
      P - Proxy Capable, S - State Refresh Capable, G - GenID Capable,
      L - DR Load-balancing Capable
Neighbor          Interface                Uptime/Expires    Ver   DR
Address                                                            Prio/Mode
192.168.1.2       GigabitEthernet1         01:31:05/00:01:42 v2    1 / DR S P G
3.3.3.3           Tunnel1                  01:27:58/00:01:18 v2    1 / S P G

Which groups and sources has router PE2 joined now? Router PE2 is interested in receiving traffic for group 239.1.1.1, which is configured as mdt default.

PE2#sho ip mroute
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group,
       G - Received BGP C-Mroute, g - Sent BGP C-Mroute,
       N - Received BGP Shared-Tree Prune, n - BGP C-Mroute suppressed,
       Q - Received BGP S-A Route, q - Sent BGP S-A Route,
       V - RD & Vector, v - Vector, p - PIM Joins on route,
       x - VxLAN group, c - PFP-SA cache created entry,
       * - determined by Assert, # - iif-starg configured on rpf intf,
       e - encap-helper tunnel flag
Outgoing interface flags: H - Hardware switched, A - Assert winner, p - PIM Join
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode
(*, 239.1.1.1), 01:09:41/stopped, RP 3.3.3.3, flags: SJCFZ
  Incoming interface: GigabitEthernet2, RPF nbr 172.16.2.1
  Outgoing interface list:
    MVRF a, Forward/Sparse, 01:09:41/00:02:18
(3.3.3.3, 239.1.1.1), 01:09:41/00:01:13, flags: JTZ
  Incoming interface: GigabitEthernet2, RPF nbr 172.16.2.1
  Outgoing interface list:
    MVRF a, Forward/Sparse, 01:09:41/00:02:18
(4.4.4.4, 239.1.1.1), 01:09:41/00:03:13, flags: FT
  Incoming interface: Loopback0, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet2, Forward/Sparse, 01:09:41/00:02:38
(*, 224.0.1.40), 01:26:41/00:02:22, RP 0.0.0.0, flags: DCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Loopback0, Forward/Sparse, 01:26:39/00:02:22

At this point the network is ready to forward user multicast traffic.

Let's join group 239.7.7.7 via IGMP on the Receiver device.

Receiver#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
Receiver(config)#int gi1/0
Receiver(config-if)#ip igmp jo 239.7.7.7

In the picture below one can see the corresponding IGMP Membership Report message sent by the Receiver router. End hosts have no idea how multicast is implemented in the service provider network and simply use standard IGMP Membership Report messages.

Let's verify on router CE2 that the end host joined successfully.

CE2#sho ip igmp groups
IGMP Connected Group Membership
Group Address    Interface                Uptime    Expires   Last Reporter   Group Accounted
239.7.7.7        GigabitEthernet1/0       00:03:49  00:02:37  10.0.1.2
224.0.1.40       GigabitEthernet0/0       01:43:29  00:02:55  192.168.1.1
224.0.1.40       Loopback0                01:43:36  00:02:27  6.6.6.6

Let's make sure that all user joins were propagated by router CE2 towards the provider equipment using PIM.

PE2#sho ip mroute vrf a
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group,
       G - Received BGP C-Mroute, g - Sent BGP C-Mroute,
       N - Received BGP Shared-Tree Prune, n - BGP C-Mroute suppressed,
       Q - Received BGP S-A Route, q - Sent BGP S-A Route,
       V - RD & Vector, v - Vector, p - PIM Joins on route,
       x - VxLAN group, c - PFP-SA cache created entry,
       * - determined by Assert, # - iif-starg configured on rpf intf,
       e - encap-helper tunnel flag
Outgoing interface flags: H - Hardware switched, A - Assert winner, p - PIM Join
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode
(*, 239.7.7.7), 00:05:12/00:03:12, RP 5.5.5.5, flags: S
  Incoming interface: Tunnel1, RPF nbr 3.3.3.3
  Outgoing interface list:
    GigabitEthernet1, Forward/Sparse, 00:05:12/00:03:12
(*, 224.0.1.40), 01:45:32/00:02:31, RP 0.0.0.0, flags: DPL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list: Null

Although the CE2-PE2 link faces the service provider, no provider-specific technologies are used on it – only standard PIM.

Let's make sure that router CE1, which performs the Rendezvous Point function for the customer network, also received information about the groups being joined.

CE1#sho ip mroute
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group,
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode
(*, 239.7.7.7), 00:10:46/00:02:32, RP 5.5.5.5, flags: S
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet0/0, Forward/Sparse, 00:02:21/00:02:32
(*, 224.0.1.40), 01:50:51/00:02:48, RP 0.0.0.0, flags: DCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet0/0, Forward/Sparse, 01:50:49/00:02:48

Now let's look at the dump on the P1-P2 link once again and find the message with which the join is performed.

Here we see a standard PIM Join/Prune message sent by router PE2 in an IP packet with source address 4.4.4.4 and destination group address 224.0.0.13. However, this message belongs to the overlay network, so it is encapsulated into GRE and then into an IPv4 packet forwarded in the underlay. The destination address of the outer IP packet is 239.1.1.1, which corresponds to the mdt default group of VRF A. It is exactly this destination address that lets the backbone network tell apart multicast streams belonging to different VRFs.

Inside the PIM message itself we can also see two IP addresses: 3.3.3.3 and 5.5.5.5. Address 3.3.3.3 identifies router PE1, which is PE2's PIM neighbor. Address 5.5.5.5 is the RP address (CE1's loopback). Joining a group with PIM proceeds hop by hop from the receiver towards the Rendezvous Point (when no source is sending traffic to the group yet): CE2 sends a join to PE2, PE2 to PE1, PE1 to CE1. That means, for example, that on the CE2-PE2 link the upstream neighbor is PE2 (address 4.4.4.4), whereas the RP address (5.5.5.5) does not change.

Data-plane

All that remains is to verify that multicast between Sender and Receiver works.

Sender#ping 239.7.7.7 repeat 5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 239.7.7.7, timeout is 2 seconds:
Reply to request 0 from 10.0.1.2, 164 ms
Reply to request 1 from 10.0.1.2, 52 ms
Reply to request 2 from 10.0.1.2, 44 ms
Reply to request 3 from 10.0.1.2, 56 ms
Reply to request 4 from 10.0.1.2, 48 ms

In the service provider network, the ICMP Echo Request messages sent to the group address look as shown below.

ICMP Echo Reply messages are sent as unicast, so all standard rules of data transmission inside VRF are applied to these packets.

If the multicast stream rate exceeds the configured value, the underlay network switches from the mdt default group to one of the mdt data groups. At that moment a PIM Join message is sent in the underlay, meaning that PE2 joins the mdt data group towards PE1.
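One way to observe this switchover (a suggestion for verification; these outputs were not captured in our lab) is to check the data MDT groups currently advertised by the ingress PE and joined by the egress PE:

PE1#show ip pim vrf a mdt send
PE2#show ip pim vrf a mdt receive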

Conclusion

Multicast can be successfully forwarded inside a VRF across a service provider's MPLS network. Signaling can be done with different protocols: in this article we limited ourselves to PIM signaling, but we plan to describe BGP-based signaling soon.

The Rosen GRE based solution is rather popular, not least because of its practically ubiquitous support across network equipment.

The configuration of all routers is given in the archive.


DMVPN is a well-known solution for hub-and-spoke connectivity. Sometimes a need arises to bring some notion of multi-tenancy to a DMVPN network. Although it is possible to assign a DMVPN tunnel per VRF, such an approach does not scale well from an operational and management perspective. MPLS is much more suitable here, since it has the necessary capabilities as well as a proven reputation in the Enterprise and Service Provider worlds.

GRE is capable of encapsulating multiple types of payload, including MPLS labels, so the data plane poses no challenge. The control plane, however, might be a bit trickier. Both LDP and RSVP require neighbor relationships to be established prior to any data exchange; scalability is achieved by using multicast messages to find peer routers and exchange protocol parameters. Using unicast neighborships for LDP/RSVP over DMVPN would at the very least hinder the scalability of the solution, so we will not look further into that scenario. Multicast, however, poses a different set of challenges, as spokes can exchange multicast messages only with the hub, rendering spoke-to-spoke MPLS communication impossible.

There is a third option that is both scalable enough and capable of disseminating the required data between spokes – BGP labeled unicast. There are a few configuration options that allow spokes to become PEs (e.g. option 1 – the VPN-labeled packet is sent directly to the spoke, DMVPN Phase 2; option 2 – the hub is involved in VRF forwarding with DMVPN Phase 3 redirection); however, it might be worth having the flexibility to put the PE role on a router behind a spoke or a hub, making the DMVPN routers P routers.

It's time to build a lab and work our way up to the solution. Here is the topology we are going to look at throughout the 2547oDMVPN discussion:

Router roles are listed below:

  • R1, R5, R7 (and R6 later) – MPLS PE;
  • R3 – ISP for DMVPN;
  • R2 – DMVPN hub;
  • R4, R6 – DMVPN spokes.

Routing protocols:

  • R1-R2, R4-R5, R6-R7 – OSPF, local office IGP;
  • R3 – ISP OSPF, connectivity for DMVPN routers;
  • R1, R5, R7 (R6) – MP-BGP VPNv4 AF;
  • R2, R4, R6 – MP-BGP IPv4 AF;

Loopback0 serves the purpose of unique router identification, e.g. OSPF RID. Loopback1 is used as BGP LU source (exact reason is discussed later). Loopback2 emulates client networks in VRFs.

Now that we’ve dealt with an overview of the topology, let’s create some initial configuration, starting from PEs:

R1(config)#vrf definition A
R1(config-vrf)#rd 1:1
R1(config-vrf)#address-family ipv4
R1(config-vrf-af)#route-target export 1:1
R1(config-vrf-af)#route-target import 1:1
R1(config-if)#interface Loopback0
R1(config-if)#ip address 1.1.1.1 255.255.255.255
R1(config)#interface Loopback2
R1(config-if)#vrf forwarding A
R1(config-if)#ip address 1.1.1.1 255.255.255.255
R1(config)#interface FastEthernet0/0
R1(config-if)#ip address 192.168.12.1 255.255.255.0
R1(config-if)#mpls ldp router-id Loopback0
R1(config)#router ospf 1
R1(config-router)#mpls ldp autoconfig
R1(config-router)#router-id 1.1.1.1
R1(config-router)#network 0.0.0.0 255.255.255.255 area 0
R1(config)#router bgp 1
R1(config-router)#template peer-policy L3VPN
R1(config-router-ptmp)#send-community both
R1(config-router-ptmp)#exit-peer-policy
R1(config-router)#template peer-session SESSION
R1(config-router-stmp)#remote-as 1
R1(config-router-stmp)#update-source Loopback0
R1(config-router-stmp)#exit-peer-session
R1(config-router)#bgp router-id 1.1.1.1
R1(config-router)#no bgp default ipv4-unicast
R1(config-router)#neighbor 5.5.5.5 inherit peer-session SESSION
R1(config-router)#neighbor 7.7.7.7 inherit peer-session SESSION
R1(config-router)#address-family vpnv4
R1(config-router-af)#neighbor 5.5.5.5 activate
R1(config-router-af)#neighbor 5.5.5.5 send-community extended
R1(config-router-af)#neighbor 5.5.5.5 inherit peer-policy L3VPN
R1(config-router-af)#neighbor 7.7.7.7 activate
R1(config-router-af)#neighbor 7.7.7.7 send-community extended
R1(config-router-af)#neighbor 7.7.7.7 inherit peer-policy L3VPN
R1(config-router-af)#exit-address-family
R1(config-router)#address-family ipv4 vrf A
R1(config-router-af)#redistribute connected
R1(config-router-af)#exit-address-family
R5(config)#vrf definition A
R5(config-vrf)# rd 1:1
R5(config-vrf)# address-family ipv4
R5(config-vrf-af)#  route-target export 1:1
R5(config-vrf-af)#  route-target import 1:1
R5(config-vrf-af)# exit-address-family
R5(config-vrf)#interface Loopback0
R5(config-if)# ip address 5.5.5.5 255.255.255.255
R5(config-if)#interface Loopback2
R5(config-if)# vrf forwarding A
R5(config-if)# ip address 5.5.5.5 255.255.255.255
R5(config-if)#interface FastEthernet0/0
R5(config-if)# ip address 192.168.45.5 255.255.255.0
R5(config-if)#mpls ldp router-id Loopback0
R5(config)#router ospf 1
R5(config-router)# mpls ldp autoconfig
R5(config-router)# router-id 5.5.5.5
R5(config-router)# network 0.0.0.0 255.255.255.255 area 0
R5(config-router)#router bgp 1
R5(config-router)# template peer-policy L3VPN
R5(config-router-ptmp)#  send-community both
R5(config-router-ptmp)# exit-peer-policy
R5(config-router)# template peer-session SESSION
R5(config-router-stmp)#  remote-as 1
R5(config-router-stmp)#  update-source Loopback0
R5(config-router-stmp)# exit-peer-session
R5(config-router)# bgp router-id 5.5.5.5
R5(config-router)# no bgp default ipv4-unicast
R5(config-router)# neighbor 1.1.1.1 inherit peer-session SESSION
R5(config-router)# neighbor 7.7.7.7 inherit peer-session SESSION
R5(config-router)# address-family vpnv4
R5(config-router-af)#  neighbor 1.1.1.1 activate
R5(config-router-af)#  neighbor 1.1.1.1 send-community extended
R5(config-router-af)#  neighbor 1.1.1.1 inherit peer-policy L3VPN
R5(config-router-af)#  neighbor 7.7.7.7 activate
R5(config-router-af)#  neighbor 7.7.7.7 send-community extended
R5(config-router-af)#  neighbor 7.7.7.7 inherit peer-policy L3VPN
R5(config-router-af)# exit-address-family
R5(config-router)# address-family ipv4 vrf A
R5(config-router-af)#  redistribute connected
R5(config-router-af)# exit-address-family
R7(config)#vrf definition A
R7(config-vrf)# rd 1:1
R7(config-vrf)# address-family ipv4
R7(config-vrf-af)#  route-target export 1:1
R7(config-vrf-af)#  route-target import 1:1
R7(config-vrf-af)# exit-address-family
R7(config-vrf)#interface Loopback0
R7(config-if)# ip address 7.7.7.7 255.255.255.255
R7(config-if)#interface Loopback2
R7(config-if)# vrf forwarding A
R7(config-if)# ip address 7.7.7.7 255.255.255.255
R7(config-if)#interface FastEthernet0/0
R7(config-if)# ip address 192.168.67.7 255.255.255.0
R7(config-if)#mpls ldp router-id Loopback0
R7(config)#router ospf 1
R7(config-router)# mpls ldp autoconfig
R7(config-router)# router-id 7.7.7.7
R7(config-router)# network 0.0.0.0 255.255.255.255 area 0
R7(config-router)#router bgp 1
R7(config-router)# template peer-policy L3VPN
R7(config-router-ptmp)#  send-community both
R7(config-router-ptmp)# exit-peer-policy
R7(config-router)# template peer-session SESSION
R7(config-router-stmp)#  remote-as 1
R7(config-router-stmp)#  update-source Loopback0
R7(config-router-stmp)# exit-peer-session
R7(config-router)# bgp router-id 7.7.7.7
R7(config-router)# bgp log-neighbor-changes
R7(config-router)# no bgp default ipv4-unicast
R7(config-router)# neighbor 1.1.1.1 inherit peer-session SESSION
R7(config-router)# neighbor 5.5.5.5 inherit peer-session SESSION
R7(config-router)# address-family ipv4
R7(config-router-af)# exit-address-family
R7(config-router)# address-family vpnv4
R7(config-router-af)#  neighbor 1.1.1.1 activate
R7(config-router-af)#  neighbor 1.1.1.1 send-community extended
R7(config-router-af)#  neighbor 1.1.1.1 inherit peer-policy L3VPN
R7(config-router-af)#  neighbor 5.5.5.5 activate
R7(config-router-af)#  neighbor 5.5.5.5 send-community extended
R7(config-router-af)#  neighbor 5.5.5.5 inherit peer-policy L3VPN
R7(config-router-af)# exit-address-family
R7(config-router)# address-family ipv4 vrf A
R7(config-router-af)#  redistribute connected
R7(config-router-af)# exit-address-family

The next step would be connecting DMVPN routers to the corresponding local segments:

R2(config)#interface Loopback0
R2(config-if)# ip address 2.2.2.2 255.255.255.255
R2(config-if)#interface FastEthernet0/0
R2(config-if)# ip address 192.168.12.2 255.255.255.0
R2(config-if)#mpls ldp router-id Loopback0
R2(config)#router ospf 1
R2(config-router)# mpls ldp autoconfig
R2(config-router)# router-id 2.2.2.2
R2(config-router)# redistribute bgp 1 subnets
R2(config-router)# passive-interface default
R2(config-router)# no passive-interface FastEthernet0/0
R2(config-router)# no passive-interface Loopback0
R2(config-router)# network 0.0.0.0 255.255.255.255 area 0
R4(config)#interface Loopback0
R4(config-if)# ip address 4.4.4.4 255.255.255.255
R4(config-if)#interface FastEthernet0/0
R4(config-if)# ip address 192.168.45.4 255.255.255.0
R4(config-if)#mpls ldp router-id Loopback0
R4(config)#router ospf 1
R4(config-router)# mpls ldp autoconfig
R4(config-router)# router-id 4.4.4.4
R4(config-router)# redistribute bgp 1 subnets
R4(config-router)# passive-interface default
R4(config-router)# no passive-interface FastEthernet0/0
R4(config-router)# no passive-interface Loopback0
R4(config-router)# network 0.0.0.0 255.255.255.255 area 0
R6(config)#interface Loopback0
R6(config-if)# ip address 6.6.6.6 255.255.255.255
R6(config-if)#interface FastEthernet0/0
R6(config-if)# ip address 192.168.67.6 255.255.255.0
R6(config-if)#mpls ldp router-id Loopback0
R6(config)#router ospf 1
R6(config-router)# mpls ldp autoconfig
R6(config-router)# redistribute bgp 1 subnets
R6(config-router)# passive-interface default
R6(config-router)# no passive-interface FastEthernet0/0
R6(config-router)# no passive-interface Loopback0
R6(config-router)# network 0.0.0.0 255.255.255.255 area 0

Now to the last piece of preparation for BGP LU discussion – DMVPN configuration. Front-door VRF approach is used in this example to keep RIB clean.

R3(config)#interface Loopback0
R3(config-if)# ip address 3.3.3.3 255.255.255.255
R3(config-if)#interface FastEthernet0/1
R3(config-if)# ip address 192.168.34.3 255.255.255.0
R3(config-if)#interface FastEthernet1/0
R3(config-if)# ip address 192.168.23.3 255.255.255.0
R3(config-if)#interface FastEthernet1/1
R3(config-if)# ip address 192.168.36.3 255.255.255.0
R3(config-if)#router ospf 1
R3(config-router)# router-id 3.3.3.3
R3(config-router)# network 0.0.0.0 255.255.255.255 area 0
R2(config)#vrf definition FVRF
R2(config-vrf)# rd 1:1
R2(config-vrf)# address-family ipv4
R2(config-vrf-af)# exit-address-family
R2(config-vrf)#interface Tunnel0
R2(config-if)# ip address 192.168.0.2 255.255.255.0
R2(config-if)# no ip redirects
R2(config-if)# ip nhrp map multicast dynamic
R2(config-if)# ip nhrp network-id 1
R2(config-if)# ip nhrp redirect
R2(config-if)# tunnel source FastEthernet1/0
R2(config-if)# tunnel mode gre multipoint
R2(config-if)# tunnel vrf FVRF
R2(config-if)#interface FastEthernet1/0
R2(config-if)# vrf forwarding FVRF
R2(config-if)# ip address 192.168.23.2 255.255.255.0
R2(config-if)#router ospf 2 vrf FVRF
R2(config-router)# router-id 192.168.23.2
R2(config-router)# network 0.0.0.0 255.255.255.255 area 0
R4(config)#vrf definition FVRF
R4(config-vrf)# rd 1:1
R4(config-vrf)# address-family ipv4
R4(config-vrf-af)# exit-address-family
R4(config-vrf)#interface Tunnel0
R4(config-if)# ip address 192.168.0.4 255.255.255.0
R4(config-if)# no ip redirects
R4(config-if)# ip nhrp network-id 1
R4(config-if)# ip nhrp nhs 192.168.0.2 nbma 192.168.23.2 multicast
R4(config-if)# ip nhrp shortcut
R4(config-if)# tunnel source FastEthernet0/1
R4(config-if)# tunnel mode gre multipoint
R4(config-if)# tunnel vrf FVRF
R4(config-if)#interface FastEthernet0/1
R4(config-if)# vrf forwarding FVRF
R4(config-if)# ip address 192.168.34.4 255.255.255.0
R4(config-if)#router ospf 2 vrf FVRF
R4(config-router)# router-id 192.168.34.4
R4(config-router)# network 0.0.0.0 255.255.255.255 area 0
R6(config)#vrf definition FVRF
R6(config-vrf)# rd 1:1
R6(config-vrf)# address-family ipv4
R6(config-vrf-af)# exit-address-family
R6(config-vrf)#interface Tunnel0
R6(config-if)# ip address 192.168.0.6 255.255.255.0
R6(config-if)# no ip redirects
R6(config-if)# ip nhrp network-id 1
R6(config-if)# ip nhrp nhs 192.168.0.2 nbma 192.168.23.2 multicast
R6(config-if)# ip nhrp shortcut
R6(config-if)# tunnel source FastEthernet1/1
R6(config-if)# tunnel mode gre multipoint
R6(config-if)# tunnel vrf FVRF
R6(config-if)#interface FastEthernet1/1
R6(config-if)# vrf forwarding FVRF
R6(config-if)# ip address 192.168.36.6 255.255.255.0
R6(config-if)#router ospf 2 vrf FVRF
R6(config-router)# network 0.0.0.0 255.255.255.255 area 0

Now it’s time to discuss some business. In order to make L3VPN work end-to-end, LSP has to be in place. As we’ve discussed above, neither LDP nor RSVP are suited for the task so we are going to use MP-BGP that should carry out the following:

  • distribute IP prefixes of loopbacks on PEs;
  • distribute corresponding labels.

Seems to be easy enough. Let’s take a look at MPLS-enabled interfaces on R2:

R2#sho mpls interfaces 
Interface              IP            Tunnel   BGP Static Operational
FastEthernet0/0        Yes (ldp)     No       No  No     Yes  

Note that the Tunnel0 interface is not listed. We aim to enable just MPLS forwarding without starting LDP, so the "mpls ip" command does not satisfy our requirements. There is a lesser-known command that does exactly what we need:

R2(config)#interface Tunnel0
R2(config-if)#mpls bgp forwarding
R2#sho mpls interfaces
Interface              IP            Tunnel   BGP Static Operational
FastEthernet0/0        Yes (ldp)     No       No  No     Yes        
Tunnel0                No            No       Yes No     Yes  

After replicating the same command on the spokes, we can get down to the BGP configuration. For this discussion we use iBGP between the hub and the spokes; the hub acts as a route reflector and also listens for inbound BGP connections:

R2(config)#router bgp 1
R2(config-router)# bgp router-id 2.2.2.2
R2(config-router)# bgp listen range 192.168.0.0/24 peer-group DMVPN
R2(config-router)# no bgp default ipv4-unicast
R2(config-router)# neighbor DMVPN peer-group
R2(config-router)# neighbor DMVPN remote-as 1
R2(config-router)# neighbor DMVPN update-source Tunnel0
R2(config-router)# address-family ipv4
R2(config-router-af)#  network 1.1.1.1 mask 255.255.255.255
R2(config-router-af)#  neighbor DMVPN activate
R2(config-router-af)#  neighbor DMVPN route-reflector-client
R2(config-router-af)#  neighbor DMVPN send-label
R4(config)#router bgp 1
R4(config-router)# bgp router-id 4.4.4.4
R4(config-router)# no bgp default ipv4-unicast
R4(config-router)# neighbor 192.168.0.2 remote-as 1
R4(config-router)# neighbor 192.168.0.2 update-source Tunnel0
R4(config-router)# address-family ipv4
R4(config-router-af)#  network 5.5.5.5 mask 255.255.255.255
R4(config-router-af)#  neighbor 192.168.0.2 activate
R4(config-router-af)#  neighbor 192.168.0.2 send-label
R6(config)#router bgp 1
R6(config-router)# bgp router-id 6.6.6.6
R6(config-router)# no bgp default ipv4-unicast
R6(config-router)# neighbor 192.168.0.2 remote-as 1
R6(config-router)# neighbor 192.168.0.2 update-source Tunnel0
R6(config-router)# address-family ipv4
R6(config-router-af)#  network 7.7.7.7 mask 255.255.255.255
R6(config-router-af)#  neighbor 192.168.0.2 activate
R6(config-router-af)#  neighbor 192.168.0.2 send-label

Let’s check whether this configuration works on IP level:

R5#ping 1.1.1.1 so lo 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 5.5.5.5
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 48/57/64 ms

However, is it enough to have end-to-end connectivity for VRF?

R5#ping vrf A 1.1.1.1 so lo 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 5.5.5.5
.....
Success rate is 0 percent (0/5)

Unfortunately, there is a little bit more to it. The reason for this particular failure is the absence of a functional LSP:

R5#ping mpls ipv4 1.1.1.1/32 source 5.5.5.5
Sending 5, 100-byte MPLS Echos to 1.1.1.1/32,
     timeout is 2 seconds, send interval is 0 msec:
Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
  'L' - labeled output interface, 'B' - unlabeled output interface,
  'D' - DS Map mismatch, 'F' - no FEC mapping, 'f' - FEC mismatch,
  'M' - malformed request, 'm' - unsupported tlvs, 'N' - no label entry,
  'P' - no rx intf label prot, 'p' - premature termination of LSP,
  'R' - transit router, 'I' - unknown upstream index,
  'X' - unknown return code, 'x' - return code 0
Type escape sequence to abort.
BBBBB
Success rate is 0 percent (0/5)

Specifically, the failure point is again R4:

R4#sho mpls forwarding-table 1.1.1.1 32
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop    
Label      Label      or Tunnel Id     Switched      interface              
19         No Label   1.1.1.1/32       1125          Tu0        192.168.0.2

Is it really though? Shouldn’t R2 have sent R4 the prefix with a label?

R4#sho ip bgp labels 
   Network          Next Hop      In label/Out label
   1.1.1.1/32       192.168.12.1    nolabel/nolabel
   2.2.2.2/32       192.168.0.2       nolabel/imp-null
<output omitted>

Here is an important observation: 1.1.1.1/32 is sent unlabeled while 2.2.2.2/32 is perfectly fine. Also note the "inconsistent" next hop for 1.1.1.1/32 – it is R1's address instead of R2's. R2 imported the prefix into BGP while preserving the original next hop from the RIB, and since the R2-R4 session is iBGP, the next hop is kept as is. 2.2.2.2/32, on the other hand, has R2 as the next hop. This subtle difference is the key to R2's behaviour: BGP assigns a label to a prefix only if the BGP speaker is itself the next hop for that prefix.

Given some thought, it makes sense: R2 has to assign its own label to an update only if it is part of the LSP. If the next hop is not R2, a label injected by R2 cannot be guaranteed to be valid, as the packet may travel along another path that does not include R2.

In this case we would like labels to be assigned to the prefixes that R2 originates into BGP, while the reflected prefixes should stay unchanged:

R2(config)#router bgp 1
R2(config-router)#address-family ipv4
R2(config-router-af)#neighbor DMVPN next-hop-self ?
  all  Enable next-hop-self for both eBGP and iBGP received paths
  <cr>
R2(config-router-af)#neighbor DMVPN next-hop-self

The defaults are tricky here. The command we have issued makes R2 change the next hop for locally originated and eBGP prefixes but not for iBGP ones; the keyword "all" would rewrite the next hop for reflected iBGP prefixes as well.
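For reference, the variant that would also rewrite the next hop of reflected iBGP paths (which is not what we want here, since reflected routes should keep their original next hops) would be:

R2(config-router-af)#neighbor DMVPN next-hop-self all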

Does it solve the issue?

R5#ping mpls ipv4 1.1.1.1/32 source 5.5.5.5
<output omitted>
Type escape sequence to abort.
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 56/65/72 ms
R5#
R5#ping vrf A 1.1.1.1 so lo 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 5.5.5.5
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 52/80/108 ms
R5#
R5#ping mpls ipv4 7.7.7.7/32 source 5.5.5.5
<output omitted>
Type escape sequence to abort.
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 56/60/64 ms
R5#
R5#ping vrf A 7.7.7.7 so lo 2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 7.7.7.7, timeout is 2 seconds:
Packet sent with a source address of 5.5.5.5
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 48/79/104 ms
R5#
R5#traceroute 7.7.7.7 so lo 0
Type escape sequence to abort.
Tracing the route to 7.7.7.7
VRF info: (vrf in name/id, vrf out name/id)
1    192.168.45.4 [MPLS: Label 22 Exp 0] 88 msec 80 msec 80 msec
2    192.168.0.6 [MPLS: Label 16 Exp 0] 52 msec 64 msec 52 msec
3    192.168.67.7 40 msec 64 msec 64 msec

We've achieved L3VPN across the DMVPN cloud. However, if you have been following the configuration on your own so far, you might have noticed a warning that IOS issued when we configured BGP LU:

R4(config-router-af)#neighbor 192.168.0.2 send-label
%BGP: For distributing MPLS labels to IBGP peers, the update source should be set to a loopback interface

Quite an interesting message to find in the log. Although it looks alarming, the communication between the PEs still works. So what is the issue here? Consider the topology below:

For the sake of this example, R1 establishes a BGP session from its f0/0 interface to R3's loopback, mimicking our DMVPN deployment. Although the control plane would work perfectly fine, the data plane, specifically the LSP, would be broken, and the reason for that is PHP – penultimate hop popping. R1's source interface resides in a subnet connected to R2, so the latter would advertise implicit-null for that subnet towards R3. The following events would then occur:

  • R3 prepends VPN label (some value, e.g. 16) to a packet;
  • R3 prepends transport label (implicit-null) to the packet;
  • R3 forwards the MPLS frame towards R2 with VPN label being on top of the stack.

There are two possible outcomes on R2: it either has no idea what to do with the frame (no match for the VPN label in the LFIB) or it steers the frame in a wrong direction (an accidental match between the VPN label and an LFIB entry). Either way, R1 has no chance of receiving the frame, which leaves the LSP broken. Note that explicit-null would not remedy the situation, as it is not used for forwarding.

What does this have to do with our case anyway? The issue above is about PEs being configured incorrectly. However, if we introduced the PE role on any of the DMVPN routers, we would face the same challenge, since BGP is not sourced from a loopback. One might suggest migrating the BGP configuration entirely to loopbacks; however, that would bring the LSPs down, since nobody would be left to advertise labels for the loopbacks themselves – usually that is the job of LDP/RSVP. There is an easier approach though: use loopbacks only for the VPNv4 AF sessions and advertise those loopbacks via BGP LU. Let's check it out:

R6(config)#vrf definition A
R6(config-vrf)# rd 2:2
R6(config-vrf)# address-family ipv4
R6(config-vrf-af)#  route-target export 1:1
R6(config-vrf-af)#  route-target import 1:1
R6(config-vrf-af)# exit-address-family
R6(config-vrf)#interface Loopback1
R6(config-if)# ip address 100.0.0.6 255.255.255.255
R6(config-if)#interface Loopback2
R6(config-if)# vrf forwarding A
R6(config-if)# ip address 6.6.6.6 255.255.255.255
R6(config-if)#router bgp 1
R6(config-router)# template peer-policy L3VPN
R6(config-router-ptmp)#  send-community both
R6(config-router-ptmp)# exit-peer-policy
R6(config-router)# template peer-session SESSION
R6(config-router-stmp)#  remote-as 1
R6(config-router-stmp)#  update-source Loopback1
R6(config-router-stmp)# exit-peer-session
R6(config-router)# neighbor 1.1.1.1 inherit peer-session SESSION
R6(config-router)# neighbor 5.5.5.5 inherit peer-session SESSION
R6(config-router)# neighbor 7.7.7.7 inherit peer-session SESSION
R6(config-router)# address-family ipv4
R6(config-router-af)#  network 100.0.0.6 mask 255.255.255.255
R6(config-router-af)# exit-address-family
R6(config-router)# address-family vpnv4
R6(config-router-af)#  neighbor 1.1.1.1 activate
R6(config-router-af)#  neighbor 1.1.1.1 send-community extended
R6(config-router-af)#  neighbor 1.1.1.1 inherit peer-policy L3VPN
R6(config-router-af)#  neighbor 5.5.5.5 activate
R6(config-router-af)#  neighbor 5.5.5.5 send-community extended
R6(config-router-af)#  neighbor 5.5.5.5 inherit peer-policy L3VPN
R6(config-router-af)#  neighbor 7.7.7.7 activate
R6(config-router-af)#  neighbor 7.7.7.7 send-community extended
R6(config-router-af)#  neighbor 7.7.7.7 inherit peer-policy L3VPN
R6(config-router-af)# exit-address-family
R6(config-router)# address-family ipv4 vrf A
R6(config-router-af)#  redistribute connected
R6(config-router-af)# exit-address-family

Now that R6 is part of the L3VPN overlay, it's time to see whether reachability is in place:

R6#tclsh                                                      
R6(tcl)#foreach x {1.1.1.1 5.5.5.5 7.7.7.7} {ping vrf A $x so lo 2}
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 6.6.6.6
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 28/59/84 ms
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 5.5.5.5, timeout is 2 seconds:
Packet sent with a source address of 6.6.6.6
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/32/44 ms
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 7.7.7.7, timeout is 2 seconds:
Packet sent with a source address of 6.6.6.6
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/9/16 ms

Now the connectivity is established as expected. In my opinion, the caveats of such a configuration are the following:

  1. mere existence of BGP IPv4 labeled unicast;
  2. BGP label assignment only in case the speaker is next-hop for the prefix;
  3. PHP being able to break LSP.

In this article we’ve discussed MPLS L3VPN over DMVPN (2547oDMVPN) leveraging iBGP LU to distribute MPLS labels for the underlay. An important distinction from the documented solutions is the ability to put PEs behind the spokes or hubs while maintaining direct spoke-to-spoke communication within DMVPN. As for eBGP configuration, I would leave it as an exercise for a curious reader.

The initial version of this article can be found in Iaroslav's blog.


OSPF is a link-state IGP that avoids loops by building a forwarding tree with the Dijkstra algorithm within a single area. The behaviour between areas, however, resembles distance-vector IGPs that simply send a prefix along with a cost; that is why OSPF is sometimes referred to as a hybrid IGP (LS and DV combined). The inter-area loop prevention mechanism is quite simple, though: all areas connect to area 0, the backbone (A0), which is responsible for passing prefixes between the other areas; there is no direct route exchange between non-backbone areas.

There is a well-known yak in the wild called OSPF Virtual Link. Everyone knows it's bad design, everyone agrees it should not be used unless absolutely necessary; yet there is little if any information on why exactly it is not a good idea, beyond the extra complexity it introduces. We are engineers who are not afraid of challenges, so let's look for a better argument.

As always, there is nothing better than a lab and solid access to google.com. I used GNS3 with Cisco 7200 images to emulate the whole topology, which can be found below:

 

Each router has a Loopback0 for the OSPF RID and other infrastructure purposes in its area; the ABRs allocate their loopbacks to A0. The addressing follows the scheme 192.168.xy.x|y/24 for Rx and Ry (e.g. 192.168.12.1 on R1's f0/1 interface). Besides plain OSPF, there is also a virtual link (VL) between R1 and R3, just for R1 to feel cozy in a warm backbone area.
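For reference, the virtual link itself is configured on both of its endpoints across the transit area (a minimal sketch; the transit area number and the router IDs are assumptions based on the RID and addressing scheme above):

R1(config)#router ospf 1
R1(config-router)#area 1 virtual-link 3.3.3.3
R3(config)#router ospf 1
R3(config-router)#area 1 virtual-link 1.1.1.1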

For those of you who are rightfully keen on testing connectivity, please, feel free to use TCL and check pings everywhere; I’m going to focus on R1-R5 connectivity. That being said, let’s check it:

R1#ping 5.5.5.5 so lo 0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 5.5.5.5, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/30/40 ms
R1#traceroute 5.5.5.5 so lo 0 numeric
Type escape sequence to abort.
Tracing the route to 5.5.5.5
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.14.4 16 msec 24 msec 20 msec
  2 192.168.45.5 48 msec 16 msec 24 msec

So far so good: the connectivity is there and traffic follows the correct path even in the presence of the VL. Now for the magic part of this article: a résumé-generating config.

R1(config)#router os 1
R1(config-router)#no capability transit

Transit area capability is a feature introduced in OSPFv2 that allows traffic following a virtual link to take a more optimal path if one exists. It compares the prefix received via the virtual link with the LSA3; if there is an exact prefix match and the LSA3 describes a better path, the path via the LSA3 is chosen. OSPFv1 originally required traffic to take the same scenic route as the virtual link, emulating an area 0 point-to-point circuit. If you want to get more intimate with transit areas and virtual links, I suggest reading this article by Petr Lapukhov. If you feel confident with the basics of VLs, go ahead and jump straight into verification:

R1#traceroute 5.5.5.5 so lo 0 n
Type escape sequence to abort.
Tracing the route to 5.5.5.5
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.12.2 44 msec 16 msec 20 msec
  2 192.168.23.3 20 msec 40 msec 40 msec
  3 192.168.35.5 76 msec 44 msec 44 msec

Now R1's traffic takes a longer path compared to the initial choice. The reason is the absence of the transit capability, which forces R1 to send packets along the path the VL takes: R1-R2-R3. Although it is not optimal, such behaviour might be expected; the connectivity is still there. And there is no sign of a loop. Yet. Let's add some spice:

R2(config)#int f1/0
R2(config-if)#ip os cost 100
R3(config)#int f1/0
R3(config-if)#ip os cost 100

OK, so we made an already suboptimal path even worse; so what?

R1#traceroute 5.5.5.5 so lo 0 n
Type escape sequence to abort.
Tracing the route to 5.5.5.5
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.12.2 20 msec 16 msec 16 msec
  2 192.168.12.1 24 msec 16 msec 16 msec
  3 192.168.12.2 36 msec 32 msec 44 msec
  4 192.168.12.1 28 msec 36 msec 40 msec
  5 192.168.12.2 44 msec 48 msec 64 msec
  6 192.168.12.1 60 msec 60 msec 60 msec
  7 192.168.12.2 80 msec 80 msec 80 msec
  8 192.168.12.1 84 msec 76 msec 76 msec
<output greedily omitted>

Hail to the loop, ladies and gentlemen! Let’s take a few sips of coffee and reflect on what has been done so far:

  • VL between R1-R3;
  • Transit capability disabled;
  • R2-R3 cost increased.

The latter point could be rephrased though: R2 now chooses R1 as a next-hop for 5.5.5.5/32:

R2#sho ip ro 5.5.5.5 255.255.255.255 longer-prefixes 
      5.0.0.0/32 is subnetted, 1 subnets
O IA     5.5.5.5 [110/4] via 192.168.12.1, 00:06:21, FastEthernet0/1

Such a choice is rather expected. R1, on the other hand, might seem to have made a weird pick at first glance:

R1#sho ip ro 5.5.5.5 255.255.255.255 longer-prefixes 
      5.0.0.0/32 is subnetted, 1 subnets
O        5.5.5.5 [110/103] via 192.168.12.2, 00:08:07, FastEthernet0/1

However, it's a natural choice according to the rules when the transit capability is disabled:

  • 5.5.5.5/32 is reachable via VL in A0
  • traffic follows VL path

And, of course, that traffic is redirected back by R2, which has no clue about the VL or the transit capability; it simply follows the LSA3.

So far it seems that the villain here is the missing transit capability rather than the VL itself. The explanation is rather simple: this very problem existed in OSPFv1 and was resolved by the transit capability introduced in OSPFv2. One might say that such behaviour resembles microloops, and I would agree; however, microloops are a transient state, while an OSPFv1 VL could introduce a permanent loop.
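Accordingly, restoring the default behaviour removes the loop in our lab (a sketch that simply undoes the command issued earlier):

R1(config)#router ospf 1
R1(config-router)#capability transit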

OSPFv2 RFC includes the following about differences from OSPFv1:

When summarizing information into a virtual link's transit area, version 2 of the OSPF specification prohibits the collapsing of multiple backbone IP networks/subnets into a single summary link.

The problem gallantly described in this statement was actually a permutation of the behaviour in our topology. If OSPF backbone routes were summarized on the ABRs, the summary would not be used by the virtual-link router. Forwarding would be broken by the differing views of the topology:

  1. Virtual-link router followed native backbone route via virtual-link, exactly the same path as virtual-link;
  2. Everyone else in the area except ABRs followed summarized route.

In our case, if R3 could summarize 5.5.5.5/32 into 5.5.5.0/24, for instance, R2 would choose the summarized path via R4, causing a loop. The solution? Enforce a consistent view of the topology by prohibiting modification of area 0 routes, a.k.a. summarization. Note that there is no need to enforce the same for summaries from other areas – those summaries are consistent in area 0 and beyond anyway, because only the first ABR is allowed to summarize an area's prefixes. Moreover, while summarization is prohibited, there is no such restriction on filtering with the transit capability: filtering does not modify the prefix, so the transit capability simply has fewer LSA1-LSA3 pairs to choose from.
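For illustration, inter-area summarization on an ABR is normally configured with the area range command; the sketch below shows what summarizing backbone prefixes would look like, and it is exactly this modification of area 0 routes that the specification prohibits for virtual-link transit areas:

R3(config)#router ospf 1
R3(config-router)#area 0 range 5.5.5.0 255.255.255.0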

Lesson learned: some defaults are there for a reason.

P.S. There is yet another way to shoot yourself in the foot, although it's a fairly well-known gotcha.


Introduction

Design

Wired connection

Switch configuration

AiMesh configuration

Additional parameters

Conclusion

Introduction

ASUS AiMesh is the ultimate home WiFi solution that lets you enjoy a stable, seamless and secure wireless connection anywhere in your home. With powerful features, exceptional ease of use and no trade-offs between WiFi range and maximum speed, you can finally experience WiFi the way it should be.

That is exactly how ASUS presents its AiMesh technology, which is supported by most of this vendor's new dual-band wireless router models. AiMesh builds a mesh wireless network: client devices connect to the node with the strongest signal, and as a client moves within the coverage area, roaming takes place, i.e. the client reconnects to another AiMesh node.

Design

The system consists of a wireless AiMesh router, which acts as the central point of the wireless network, and one or more AiMesh nodes. By default, a dedicated channel in the 5 GHz band is used for the connections between AiMesh nodes.

Direct connections between the network nodes are also possible. At the time this article was written, up to two levels of wireless hierarchy were supported.

A pleasant feature of such a mesh network is self-healing: if any node fails, all equipment reconnects to the remaining functional nodes by default.

In addition to connecting AiMesh nodes to each other wirelessly, one can use wired connections as well.

It is exactly this wired connection of AiMesh nodes to the wireless router that we are going to discuss today.

Wired connection

Wireless connection of AiMesh nodes to the central wireless router is fine in every respect except performance. Yes, the 5 GHz modules of several router models are perfectly able to step over the 1 Gbps mark, but only under certain conditions. The mutual position of the nodes and the lack of line of sight between them can significantly reduce the performance of the wireless backhaul channels. This degradation of the node-router links can be avoided by using a wired connection. In that case, following the vendor's recommendation, one should connect the WAN port of the AiMesh node to a LAN port of the AiMesh router, as shown in the picture above.

But what should one do if such a cable connection is impossible? A rather simple solution suggests itself: place a switch between the router and the AiMesh node, i.e. connect both devices to a new or existing L2 network and put them in the same VLAN. In practice, however, it is not that simple.

For mutual discovery, the wireless devices in an AiMesh network use LLDP. This protocol is intended for exchanging messages about the capabilities supported by each side. LLDP messages are exchanged only with a directly connected neighbor, which means the router and the AiMesh node will not see each other; instead, each will exchange LLDP messages with the switch between them.

switch#sho lld ne
Capability codes:
    (R) Router, (B) Bridge, (T) Telephone, (C) DOCSIS Cable Device
    (W) WLAN Access Point, (P) Repeater, (S) Station, (O) Other
Device ID           Local Intf     Hold-time  Capability      Port ID
Cat3750             Gi1/0/2        120        B,R             Gi1/0/14
0015.176a.f39b      Gi1/0/4        3601                       0015.176a.f39b
RT-AX56U            Gi1/0/5        20         B               04d9.f5b4.68a0
Total entries displayed: 3

Fortunately, there is a solution for this situation as well. To carry LLDP messages through the switch one can use QinQ technology, which allows transparently tunneling messages of several service protocols through an L2 network.

Switch configuration

We had a Cisco Catalyst C3560CX-8XPD switch that supports QinQ and ASUS RT-AX88U, RT-AX56U, and RT-AX58U wireless routers at our disposal. We started by creating a dedicated virtual network for AiMesh data transmission. It is just an ordinary VLAN without any additional settings.

switch(config)#vla 17
switch(config-vlan)#name AiMesh
switch(config-vlan)#^Z
switch#

Then we configured the interfaces to which the AiMesh network members connect. The interface configuration does not depend on whether a wireless router or an AiMesh node is connected to it. Clarifications to the commands are shown inline below.

interface GigabitEthernet1/0/5
 switchport access vlan 17
 !VLAN in which AiMesh data is transmitted
 switchport mode dot1q-tunnel
 !Interface operation mode: 802.1Q tunnel
 l2protocol-tunnel lldp
 !List of protocols to tunnel
 no lldp transmit
 no lldp receive
 !Stop the switch itself from sending/receiving LLDP on this port

In addition to tunneling LLDP messages, our switch can also tunnel messages of several other protocols; however, this is not required for proper AiMesh operation.

switch(config-if)#l2protocol-tunnel ?
  cdp                 Cisco Discovery Protocol
  drop-threshold      Set drop threshold for protocol packets
  lldp                Link Layer Discovery Protocol
  point-to-point      point-to-point L2 Protocol
  shutdown-threshold  Set shutdown threshold for protocol packets
  stp                 Spanning Tree Protocol
  vtp                 Vlan Trunking Protocol
  <cr>

The MAC address table corresponding to our virtual network contains MAC addresses of both AiMesh equipment and client devices.

switch#sho mac address-table int gi1/0/5
          Mac Address Table
-------------------------------------------
Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
  17    04d9.f5b4.68a0    DYNAMIC     Gi1/0/5
  17    d868.c310.f0f1    DYNAMIC     Gi1/0/5
Total Mac Addresses for this criterion: 2
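
Another check worth doing is whether LLDP frames are actually being encapsulated and decapsulated on the tunnel port; on this platform the protocol-tunneling counters can be viewed with the following command (output omitted here):

switch#show l2protocol-tunnel interface gi1/0/5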

On the side of the wireless devices that form the AiMesh network, a short configuration is also required; we describe it next.

AiMesh configuration

For each AiMesh node, one should select the way it connects to the AiMesh wireless router (the Connection Priority option).

The connection type currently in use is also displayed as an icon in the list of connected AiMesh nodes.

That's all. The configuration of the AiMesh wireless network nodes is complete.

Additional parameters

A significant part of the settings related to WiFi module operation is pushed by the AiMesh wireless router to the subordinate nodes when they join. However, there are several options that can be configured individually for each device; one of them is, for example, the operation of the LEDs on the front panel of the devices forming the AiMesh network.

Connecting to AiMesh nodes over HTTP/HTTPS is impossible, so the only way to configure them individually is SSH (which the vendor does not recommend for general users). The IP address of an AiMesh node can be found on its AiMesh node card.
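
Assuming, purely for illustration, that the node card shows 192.168.72.182 as the node address (this address is hypothetical), the session is opened in the usual way:

$ ssh admin@192.168.72.182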

To switch off the LEDs on the front panel, the led_disable parameter is used; it is set with the nvram utility, which we have already described.

admin@RT-AX58U-3F40:/# nvram
======== NVRAM CMDS ========
[set]                   : set name with value
[setflag]               : set bit value
[unset]                 : remove nvram entry
[get]                   : get nvram value with name
[getflag]               : get bit value
[show:dump:getall]      : show all nvrams
[loadfile]              : populate nvram value from files
[savefile]              : save all nvram value to file
[kset]                  : set name with value in kernel nvram
[kunset]                : remove nvram entry from kernel nvram
[kget]                  : get nvram value with name
[commit]                : save nvram [optional] to restart wlan when following restart
[restart]               : restart wlan
[save]                  : save all nvram value to file
[restore]               : restore all nvram value from file
[erase]                 : erase nvram partition
[fb_save]               : save the romfile for feedback
============================
admin@RT-AX58U-3F40:/# nvram show | grep led_disable
size: 67930 bytes (63142 left)
led_disable=0
admin@RT-AX58U-3F40:/# nvram set led_disable=1
admin@RT-AX58U-3F40:/# nvram get led_disable
1
admin@RT-AX58U-3F40:/# nvram commit

After all the necessary changes are made, the AiMesh node should be rebooted, either manually or remotely with the reboot command.
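
For example, the reboot can be triggered right from the same SSH session used for the nvram changes above:

admin@RT-AX58U-3F40:/# reboot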

Conclusion

Certainly, we realize that the Cisco switching equipment described in this article will hardly ever be installed in most users' homes. However, if a managed switch from another vendor is used in the network, there is a chance that it supports QinQ, which allows combining several AiMesh devices within one virtual network. Anyhow, for most users a switch is not required at all: the LAN ports of the AiMesh router (some models have eight of them) are quite enough to build a small AiMesh network.

In addition, we would like to note that the vendor is working on fixing this issue as well. It will probably be solved either by using another protocol for AiMesh neighbor discovery or by updating the existing one.


Spoke to spoke multicast in DMVPN

There are quite a few Configuration Guides and articles on the Internet that provide a detailed explanation of DMVPN setup and operation. However, the best note about multicast over DMVPN I could find did not shed any light on the details of the process.

“In DMVPN, it is recommended to configure a Rendezvous Point (RP) at or behind the hub. If there is an IP multicast source behind a spoke, the ip pim spt-threshold infinity command must be configured on spokes to avoid multicast traffic going through spoke-to-spoke tunnels.”

Let’s try to find out what this restriction is about. Here is our lab topology.

Just an ordinary DMVPN Phase 2 lab in GNS3. Loopbacks on every router are used to emulate adjacent client LANs; also, it makes sense to place the RP on the Hub. As for addressing: Hub = R1, Spoke1 = R2, Spoke2 = R3, Internet = R4.

Let’s enable PIM in the overlay on every DMVPN router and configure RP.

Hub:

interface Loopback0
 ip address 1.1.1.1 255.255.255.255
 ip pim sparse-mode
interface Tunnel0
 ip address 192.168.0.1 255.255.255.0
 no ip redirects
 no ip split-horizon eigrp 1
 ip pim sparse-mode
 ip nhrp map multicast dynamic
 ip nhrp network-id 1
 tunnel source FastEthernet0/0
 tunnel mode gre multipoint
 tunnel vrf A
ip pim rp-address 1.1.1.1

Spoke1:

interface Loopback0
 ip address 2.2.2.2 255.255.255.255
 ip pim sparse-mode
interface Tunnel0
 ip address 192.168.0.2 255.255.255.0
 no ip redirects
 ip pim sparse-mode
 ip nhrp network-id 1
 ip nhrp nhs 192.168.0.1 nbma 192.168.14.1 multicast
 tunnel source FastEthernet0/1
 tunnel mode gre multipoint
 tunnel vrf A
ip pim rp-address 1.1.1.1

Spoke2:

interface Loopback0
 ip address 3.3.3.3 255.255.255.255
 ip pim sparse-mode
interface Tunnel0
 ip address 192.168.0.3 255.255.255.0
 no ip redirects
 ip pim sparse-mode
 ip nhrp network-id 1
 ip nhrp nhs 192.168.0.1 nbma 192.168.14.1 multicast
 tunnel source FastEthernet1/0
 tunnel mode gre multipoint
 tunnel vrf A
ip pim rp-address 1.1.1.1
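
Before subscribing to any groups, it is worth making sure that PIM adjacencies have actually come up across the mGRE tunnel and that the NHRP registrations are in place; a quick sanity check on the Hub (output omitted here) could be:

Hub#show ip pim neighbor
Hub#show ip nhrp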

Everything looks good so far; we are ready to subscribe to a multicast group on Spoke1.

Spoke1(config)#int lo 0
Spoke1(config-if)#ip igmp join-group 224.1.1.1
Spoke1#sho ip mroute
<output omitted>
(*, 224.1.1.1), 00:00:37/00:02:22, RP 1.1.1.1, flags: SJCL
  Incoming interface: Tunnel0, RPF nbr 192.168.0.1
  Outgoing interface list:
    Loopback0, Forward/Sparse, 00:00:37/00:02:22

Now it’s time to start streaming from Spoke2.

Spoke2#ping 224.1.1.1 source lo0 rep 1000
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 224.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 3.3.3.3
Reply to request 0 from 2.2.2.2, 156 ms....

Oops. It seems like the first packet goes through but all the rest are dropped somewhere along the way. Let's take a look at the traffic captures at this moment.

[Traffic captures taken on Spoke1 and on the Hub]

So the Hub does not send any multicast after the initial packet. What is going on with the hub?

Hub#sho ip mroute
<output omitted>
(*, 224.1.1.1), 00:03:31/00:02:55, RP 1.1.1.1, flags: SP
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list: Null
(3.3.3.3, 224.1.1.1), 00:03:08/00:02:46, flags: PJT
  Incoming interface: Tunnel0, RPF nbr 192.168.0.3
  Outgoing interface list: Null

No interfaces are present in the OIL of the (S,G) entry! OK, at least we have found the symptom of the problem. But why did the initial packet go through then? Let's rewind and look at the Hub right after the subscription.

Hub#sho ip mroute
<output omitted>
(*, 224.1.1.1), 00:00:13/00:03:16, RP 1.1.1.1, flags: S
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Tunnel0, Forward/Sparse, 00:00:13/00:03:16

(*,G) has Tunnel0 in the OIL as expected; the incoming interface is Null and the RPF neighbor is 0.0.0.0 because the Hub itself is the RP. Let's discuss what happens next.

  1. Spoke2 sends the initial multicast packet encapsulated in a unicast PIM Register.
  2. The Hub, acting as the RP, receives the Register and does two things: it forwards the multicast packet down the shared-tree OIL (Tunnel0) and sends a PIM Join towards the multicast source (also via Tunnel0).
  3. Since in PIM-SM the incoming interface (IIF) cannot also be present in the OIL (RPF check), Tunnel0 is removed from the OIL and Spoke1, the receiver, loses the multicast stream.

The problem stems from the NBMA nature of DMVPN: Spoke2 has no L2 connectivity to Spoke1 although Tunnel0 looks like a broadcast medium (if you have a flashback about Frame Relay right now, that is exactly where this setup comes from). The remediation is quite simple: tell the Hub to treat the Tunnel0 interface as multiple logical ones for the sake of multicast.
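
In configuration terms this is a single interface-level command on the Hub, which then shows up in the running config:

Hub(config)#interface Tunnel0
Hub(config-if)#ip pim nbma-mode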

Hub#sho run | i Tunnel0|nbma
interface Tunnel0
ip pim nbma-mode

Now the multicast RIB looks correct.

Hub#sho ip mroute
(*, 224.1.1.1), 00:03:51/00:03:27, RP 1.1.1.1, flags: S
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Tunnel0, 192.168.0.2, Forward/Sparse, 00:00:02/00:03:27
(3.3.3.3, 224.1.1.1), 00:03:29/00:02:25, flags: JT
   Incoming interface: Tunnel0, RPF nbr 192.168.0.3
   Outgoing interface list:
    Tunnel0, 192.168.0.2, Forward/Sparse, 00:00:02/00:03:27

Tunnel0 is now correctly treated as a multipoint interface, so the mRIB entries also include the addresses of the source (192.168.0.3) and the receiver (192.168.0.2). This also has an interesting side effect for multicast traffic sourced from behind the Hub towards the spokes. By default, DMVPN replicates multicast traffic towards every spoke (ip nhrp map multicast dynamic), and this is successfully leveraged by IGPs that send their hello packets as multicast. However, if DMVPN uses a geographically dispersed underlay (e.g. several remote regions), such behavior might not be desirable: multicast traffic destined for only a single region would effectively reach every spoke in every region, unnecessarily saturating the links. In this case, PIM NBMA mode allows differentiating the spokes and sending the multicast traffic only to the spokes (and thus regions) that have actually subscribed to the stream.

So, let’s check the multicast connectivity between spokes once again.

Spoke2#ping 224.1.1.1 so lo 0 rep 1000
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 224.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 3.3.3.3
Reply to request 0 from 2.2.2.2, 176 ms....

Well, something is still broken; time to investigate the usual suspect, the Hub.

Hub#sho ip mroute
<output omitted>
(*, 224.1.1.1), 00:52:32/00:02:58, RP 1.1.1.1, flags: S
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Tunnel0, 192.168.0.2, Forward/Sparse, 00:02:12/00:02:58
(3.3.3.3, 224.1.1.1), 00:01:30/00:01:31, flags: PT
  Incoming interface: Tunnel0, RPF nbr 192.168.0.3
  Outgoing interface list: Null

(S,G) is pruned (flag P) on Hub so OIL is empty. Obviously, it is not Spoke2 who pruned the group, it must be Spoke1.

Spoke1#sho ip mroute
<output omitted>
(*, 224.1.1.1), 00:52:44/stopped, RP 1.1.1.1, flags: SJCL
  Incoming interface: Tunnel0, RPF nbr 192.168.0.1
  Outgoing interface list:
    Loopback0, Forward/Sparse, 00:09:18/00:02:26
(3.3.3.3, 224.1.1.1), 00:01:39/00:01:20, flags: LJT
  Incoming interface: Tunnel0, RPF nbr 192.168.0.3
  Outgoing interface list:
    Loopback0, Forward/Sparse, 00:01:39/00:02:26

Looks fine… Unless you pay attention to RPF neighbor – it is Spoke2, not Hub. Let’s reconstruct the crime scene.

  1. Spoke2 sends the first multicast packet in a PIM Register to the Hub;
  2. the Hub forwards the multicast packet to Spoke1 and initiates an SPT towards Spoke2;
  3. Spoke1 receives the first multicast packet, creates state, and sends a unicast reply;
  4. Spoke1 realizes that the RPF neighbor for the multicast source is Spoke2, so it sends an SPT-Join towards Spoke2. (That is only the logical view; in fact, the Join is sent to the Hub because of the DMVPN multicast mapping. The Hub, however, drops it because Spoke2, not the Hub, is listed as the upstream RPF neighbor in this Join.);
  5. the incoming interface (Tunnel0) is the same for the RPT and the SPT, so a Prune for the shared-tree path is sent to the Hub and is processed correctly.

As a result, the (S,G) entry on the Hub is pruned, while Spoke1's SPT-Join never reaches Spoke2, so the stream is broken. The source of evil here is SPT switchover: since the spokes have no multicast mapping to each other, the only feasible path for their multicast traffic is along the tunnels towards the hub. Finally, we get to the command the Configuration Guide mentions – ip pim spt-threshold infinity. Does it work in the end?
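
Per the Configuration Guide quote at the beginning of the article, the command belongs on the spokes. A minimal sketch of applying it (it is a global command; strictly speaking, SPT switchover is performed by the receiver-side spokes, but applying it on every spoke keeps the behavior uniform):

Spoke1(config)#ip pim spt-threshold infinity
Spoke2(config)#ip pim spt-threshold infinity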

Spoke2#ping 224.1.1.1 so lo 0 rep 1000
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 224.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 3.3.3.3
Reply to request 0 from 2.2.2.2, 112 ms
Reply to request 1 from 2.2.2.2, 84 ms
Reply to request 2 from 2.2.2.2, 76 ms
Reply to request 3 from 2.2.2.2, 80 ms
Reply to request 4 from 2.2.2.2, 52 ms
Reply to request 5 from 2.2.2.2, 48 ms

Fortunately, it finally behaves as expected. As a result, multicast traffic can traverse the DMVPN in any direction (hub-to-spoke, spoke-to-spoke, spoke-to-hub) and reaches only the receivers subscribed to the stream. It should be noted, though, that direct spoke-to-spoke tunnels for multicast traffic might not be the best design choice even if such a setup were possible. Spokes are usually limited in bandwidth and availability; head-end replication of multicast at the source spoke towards every receiving spoke would put a high load on that spoke's uplink, posing scalability challenges and potentially having a detrimental effect on other applications. In such a case, the lower delay is usually not worth the price in bandwidth.

In this article we discussed a generic DMVPN Phase 2 setup with multicast on top of it. The major caveats of this solution are PIM NBMA mode and SPT switchover, which are also the only differences from an ordinary DMVPN Phase 2 deployment.