Not all stacks are good for you, or StackWise stacks
Some models of Cisco switches can be stacked with special stacking cables. Normally, these are devices occupying one or two rack units, often with a fixed configuration. Modular chassis series also have mechanisms for grouping several physical devices, but in this article we leave those mechanisms aside and concentrate only on physical stacking.
At the time of writing, we were aware of four stacking technologies: FlexStack, StackWise, StackWise Plus and StackWise-480. Cisco Catalyst 2960-S switches are stacked using FlexStack. The throughput of a FlexStack stack is 20 Gbps in full duplex, i.e. 10 Gbps in each direction. StackWise is supported by Catalyst 3750 switches. The throughput of a StackWise stack is 32 Gbps in full duplex, i.e. 16 Gbps in each direction. Cisco Catalyst 3750-E and 3750-X series switches support StackWise Plus, which differs from StackWise by a higher throughput of 64 Gbps (32 Gbps in each direction) and a number of other features. It’s also worth noting that the Catalyst 3750-X series supports the StackPower technology, which keeps the devices in the stack running if the power supply of one of the switches fails. Cisco Catalyst 3850 switches support StackWise-480, whose throughput is 480 Gbps.
Stacking switches gives the administrator a number of advantages: unified management and switching subsystems, higher stability and high availability. For example, one can manage the whole stack as if it were one device through a single telnet/SSH session. Frames and packets are transmitted between the switches via the stack ports, thus freeing up ordinary 10/100 Mbps and 1/10 Gbps interfaces. Within the stack, EtherChannel can be used even when the physical links are connected to different switches (something similar, vPC or virtual PortChannel, exists for Cisco Nexus L3 switches, but they belong to a completely different class).
One of the differences between the StackWise and StackWise Plus technologies is support for local switching. With local switching, when the sender and the receiver are connected to ports of the same switch, the frame is forwarded directly, bypassing the stack. When local switching is not supported, i.e. when StackWise is used, all data are sent through the stack, and a frame is removed from the stack by the sender’s switch. Thus, even if the sender and the receiver are connected to neighboring ports of one switch, the data travel around the stack ring, return to the same switch and only then are sent to the required port. Naturally, such “wandering” of the frames causes additional delay. The StackWise Plus technology is free of this disadvantage, because unnecessary data are not sent to the stack (Local Switching) and data are removed from the stack by the receiver’s switch (Destination Stripping). The delay caused by the stack is small; however, when every microsecond may be crucial, even this tiny improvement may be noticeable. In this article we decided to find out what quantitative influence a StackWise stack has on transmitted traffic.
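To make the difference concrete, here is a minimal Python sketch of how much of the stack ring a single unicast frame occupies under each technology, following only the behavior described in the paragraph above. The function name ring_hops_used, the 1-based switch numbering and the assumption that frames travel in one direction around the ring are ours for illustration; this is not a model of Cisco’s actual implementation.

```python
def ring_hops_used(technology, sender, receiver, stack_size):
    """Return how many ring hops one unicast frame occupies (illustrative model).

    Assumptions (ours, not Cisco's): switches are numbered 1..stack_size
    and a frame travels around the ring in a single direction.
    """
    if technology not in ("StackWise", "StackWise Plus"):
        raise ValueError("unknown technology: %r" % technology)
    if not (1 <= sender <= stack_size and 1 <= receiver <= stack_size):
        raise ValueError("switch numbers must be within 1..stack_size")

    if technology == "StackWise":
        # No local switching, and the sender's switch removes the frame,
        # so every frame makes a full loop around the ring.
        return stack_size

    # StackWise Plus: local switching keeps same-switch traffic off the ring.
    if sender == receiver:
        return 0
    # Destination stripping: the frame is removed at the receiver's switch.
    return (receiver - sender) % stack_size


for tech in ("StackWise", "StackWise Plus"):
    print(tech, ring_hops_used(tech, 1, 1, 2), ring_hops_used(tech, 1, 2, 2))
```

In this model a two-switch StackWise stack always spends two hops per frame, while StackWise Plus spends none for same-switch traffic and one for traffic between the two switches.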
We had two Cisco Catalyst 3750G switches at our disposal: C3750G-12S-S and C3750G-24TS-S1U, both supporting StackWise. We ran our tests together with our friends from the Anticisco project with the help of IXIA XM2. In the tests, we used five different connection scenarios that are listed below.
1. Without a stack, the sender and the receiver were connected to the first and the twenty-fourth interfaces
2. Without a stack, the sender and the receiver were connected to the first and the second interfaces
3. In a stack, the sender and the receiver were connected to the first and the twenty-fourth interfaces of one switch
4. In a stack, the sender and the receiver were connected to the first interface of different switches
5. In a stack, the sender and the receiver were connected to the first and the second interfaces of one switch
For each scenario we measured the minimum, average and maximum delays when transmitting 100-, 400-, 1400- and 1518-byte frames. Frames of each size were transmitted at interface loads of 10, 55, 77.5, 88.75 and 100%. The plots below show only the average delays.
Analyzing each of the plots, we can say that the switching delay depends directly on the current interface load and on the size of the transmitted frame. Quite a predictable result!
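The size of this test matrix is easy to check with a short Python sketch. The scenario numbers, frame sizes and load levels are taken from the text above; the names SCENARIOS, FRAME_SIZES_BYTES, LOADS_PERCENT and build_test_matrix are hypothetical and have nothing to do with the IXIA tooling we actually used.

```python
from itertools import product

# Values from the article; the identifiers themselves are illustrative.
SCENARIOS = [1, 2, 3, 4, 5]
FRAME_SIZES_BYTES = [100, 400, 1400, 1518]
LOADS_PERCENT = [10, 55, 77.5, 88.75, 100]


def build_test_matrix():
    """Return every (scenario, frame size, load) combination as a tuple."""
    return list(product(SCENARIOS, FRAME_SIZES_BYTES, LOADS_PERCENT))


matrix = build_test_matrix()
# 5 scenarios * 4 frame sizes * 5 load levels = 100 test runs,
# each producing a minimum, an average and a maximum delay.
print(len(matrix))
```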
Then, for the sake of better visualization, we decided to exclude delays obtained at 100% interface load and to depict delays for all scenarios on one plot. We made such plots for each frame size we used in our tests.
As we see, the connection scenarios rank in the following order, from the smallest delay to the largest: 1, 2, 4, 3, 5. Here, we’d like to share some ideas regarding the obtained test results.
Judging by the delay figures, switching between interfaces from different interface groups is faster; this phenomenon is not related to the stack, but rather to peculiarities of the internal architecture of Cisco Catalyst switches. The same dependency is observed both on stacked switches and on standalone ones.
Adding a switch to a stack increases the delay when data are transmitted between the same ports. This phenomenon is caused by the need to pass data through the stack ring in the StackWise technology. The additional delay caused by the stack can be roughly estimated at about 1.5 microseconds. This estimate is drawn from the comparison of the following pairs of scenarios: 1 and 3, 2 and 5.
The additional delay for traffic between interfaces of different switches is approximately half of the additional delay seen when switching between the interfaces of one stacked device. This is easily explained: when the users are connected to different switches, the frame only has to pass half of the stack ring before it reaches the receiver’s switch. However, there’s a nuance here: after the frame is delivered to the receiver’s switch, it does not stop moving along the stack ring. The rest of the trip takes about another 0.75 microseconds, but by then the frame is already being processed by the switching fabric of the device the receiver is connected to, so its further travel no longer affects the overall delay. The frame is finally removed from the stack by the sender’s switch.
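The arithmetic behind these estimates can be written down as a small Python sketch. The 0.75 microsecond per-hop figure is simply our rough 1.5 microsecond full-ring estimate divided by the two switches in our stack; the function name added_stack_delay_us, the single ring direction and the extension to stacks of more than two switches are assumptions we did not measure.

```python
# Rough estimate: 1.5 us for a full loop of a two-switch ring -> 0.75 us per hop.
PER_HOP_DELAY_US = 0.75


def added_stack_delay_us(sender, receiver, stack_size=2):
    """Estimate the extra StackWise delay seen by the receiver, in microseconds.

    Assumptions (ours): switches are numbered 1..stack_size, the frame
    travels one way around the ring, and every hop costs the same.
    Only the two-switch case was actually measured.
    """
    if not (1 <= sender <= stack_size and 1 <= receiver <= stack_size):
        raise ValueError("switch numbers must be within 1..stack_size")
    hops = (receiver - sender) % stack_size
    if hops == 0:
        # Same switch: StackWise still sends the frame around the whole ring.
        hops = stack_size
    return hops * PER_HOP_DELAY_US


print(added_stack_delay_us(1, 1))  # scenarios 3 and 5: same switch, full ring
print(added_stack_delay_us(1, 2))  # scenario 4: different switches, half ring
```

Under these assumptions the model reproduces the two figures from the text: about 1.5 microseconds when both users sit on one stacked switch and about 0.75 microseconds when they sit on different switches.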
This concludes our measurements and discussion of the delays caused by a Cisco StackWise stack. Now let’s sum up what we’ve got. Those of you who’d like to analyze the obtained data themselves can download an archive with the measurement results from our site.
Stacking a Cisco Catalyst switch into a StackWise stack increases the switching delay for users connected to this switch. This behavior is caused by the stack operation logic: the frame is transmitted through it no matter where the sender and the receiver are located relative to each other.
The StackWise Plus technology is compatible with StackWise, i.e. Cisco Catalyst 3750-X and 3750-E switches can be stacked together with ordinary switches of the 3750 series; in this case, however, the stack itself will operate in StackWise mode. The one thing to keep in mind is that the 3750-X and 3750-E series support local switching, i.e. these switches will not send frames into the stack if the sender and the receiver are connected to one device, which should decrease the switching delay.
To decrease the switching delay when using 2960-S and 3750 models, we advise connecting the sender and the receiver to interfaces from different interface groups of a switch that is not in a stack. When this is not possible, use interfaces of one interface group, but still on a standalone switch. Here we only consider the issue of decreasing the switching delay and pay no attention to other aspects of the operation of switches and stacks (like fault tolerance).
If you can’t do without a stack, try connecting the devices in such a way that frames from the sender to the receiver take the shortest path through the stack. When RTT matters, the sender and the receiver should be connected to different switches in the stack. Reducing the number of switches in the stack also decreases the delay, both when the sender and the receiver are connected to one device and when they are connected to different ones.
Finally, we’d like to draw the reader’s attention to the dependence of the switching delay on the load of the network interfaces. Don’t connect delay-sensitive users to highly loaded stacks. When calculating load, pay attention not only to the load of the interfaces that carry the user traffic but to the overall stack load as well, because in the StackWise technology all frames go through the stack. You can determine the stack version with the show switch stack-ring speed command; the current interface and stack-ring load can be obtained with the show controllers utilization command.
test-3750#show switch stack-ring speed
Stack Ring Speed : 32G
Stack Ring Configuration: Full
Stack Ring Protocol : StackWise
test-3750#show controllers utilization
Port Receive Utilization Transmit Utilization
Gi1/0/1 0 0
Gi1/0/2 0 0
Gi1/0/3 0 0
Gi1/0/4 0 0
Gi1/0/5 0 0
Gi1/0/6 0 0
Gi1/0/7 0 0
Gi1/0/8 0 0
Gi1/0/9 0 0
Gi1/0/10 0 0
Gi1/0/11 0 0
Gi1/0/12 0 0
Gi1/0/13 0 0
Gi1/0/14 0 0
Gi1/0/15 0 0
Gi1/0/16 0 0
Gi1/0/17 0 0
Gi1/0/18 0 0
Gi1/0/19 1 0
Gi1/0/20 0 0
Gi1/0/21 0 0
Gi1/0/22 0 0
Gi1/0/23 0 0
Gi1/0/24 0 0
Gi2/0/1 0 0
Gi2/0/2 0 0
Gi2/0/3 0 0
Gi2/0/4 0 0
Gi2/0/5 0 0
Gi2/0/6 0 0
Gi2/0/7 0 0
Gi2/0/8 0 0
Gi2/0/9 0 0
Gi2/0/10 0 0
Gi2/0/11 0 0
Gi2/0/12 0 0
Gi2/0/13 0 0
Gi2/0/14 0 0
Gi2/0/15 0 0
Gi2/0/16 0 0
Gi2/0/17 0 0
Gi2/0/18 0 0
Gi2/0/19 0 0
Gi2/0/20 0 0
Gi2/0/21 0 0
Gi2/0/22 0 0
Gi2/0/23 0 0
Gi2/0/24 0 0
Total Ports : 48
Switch Receive Bandwidth Percentage Utilization : 0
Switch Transmit Bandwidth Percentage Utilization : 0
Stack Ring Percentage Utilization : 0
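If you want to watch these counters regularly, output like the above can be processed with a few lines of Python. This is a minimal sketch that assumes the text has already been captured into a string; the names parse_utilization and busy_ports and the threshold parameter are hypothetical, and the regular expressions are written only against the output shown in this article, so other platforms or IOS versions may need adjustments.

```python
import re

# A port row looks like "Gi1/0/19 1 0": name, receive %, transmit %.
PORT_LINE = re.compile(r"^(?P<port>[A-Za-z]+\d+/\d+/\d+)\s+(?P<rx>\d+)\s+(?P<tx>\d+)$")
RING_LINE = re.compile(r"^Stack Ring Percentage Utilization\s*:\s*(?P<pct>\d+)$")


def parse_utilization(text):
    """Return ({port: (rx_percent, tx_percent)}, stack_ring_percent or None)."""
    ports = {}
    ring = None
    for raw in text.splitlines():
        line = raw.strip()
        match = PORT_LINE.match(line)
        if match:
            ports[match["port"]] = (int(match["rx"]), int(match["tx"]))
            continue
        match = RING_LINE.match(line)
        if match:
            ring = int(match["pct"])
    return ports, ring


def busy_ports(ports, threshold=1):
    """List ports whose receive or transmit utilization reaches the threshold."""
    return sorted(p for p, (rx, tx) in ports.items() if rx >= threshold or tx >= threshold)


# A shortened copy of the output shown in the article.
SAMPLE = """\
Port Receive Utilization Transmit Utilization
Gi1/0/18 0 0
Gi1/0/19 1 0
Gi2/0/1 0 0
Total Ports : 48
Stack Ring Percentage Utilization : 0
"""

ports, ring = parse_utilization(SAMPLE)
print(busy_ports(ports), ring)
```

The stack ring figure is kept separate from the per-port figures on purpose: in a StackWise stack a delay-sensitive port can be idle while the ring itself is busy.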
Good luck in building and using stacks!