Hope you ran into these links before… one is Tolly and the other is Folly… So I started looking for Truly… and found this. IS IT TRULY?? You tell me after reading my stolly… (sorry, I tried to rhyme)
HP-sponsored Tolly report saying UCS bandwidth sucks compared to HP BladeSystem:
http://www.tolly.com/Docdetail.aspx?Docnumber=210109
Brad Hedlund, Data Center Solutions Architect at Cisco Systems, calling it Folly:
Tolly’s paper actually exposes bandwidth bottlenecks with the FEX “pinning” feature and its diminishing effect on UCS scalability.
Here is my analysis on it:
This surprised me once I understood this “pinning” feature of the UCS Fabric Extender (I/O Module)! It has fixed configurations depending on how many uplinks are used to connect the FEX to the 6100 FI.
From UCS Manager GUI config guide:
Pinning Server Traffic to Server Ports
All server traffic travels through the I/O module to server ports on the fabric interconnect. The number of links for which the chassis is configured determines how this traffic is pinned.
The pinning determines which server traffic goes to which server port on the fabric interconnect. This pinning is fixed. You cannot modify it. As a result, you must consider the server location when you determine the appropriate allocation of bandwidth for a chassis.
You must review the allocation of ports to links before you allocate servers to slots. The cabled ports are not necessarily port 1 and port 2 on the I/O module. If you change the number of links between the fabric interconnect and the I/O module, you must reacknowledge the chassis to have the traffic rerouted.
For example: if you use 2 uplinks today and tomorrow you decide to add 2 more, then to optimize the servers' bandwidth use you have to physically shuffle servers so that two busy servers don’t pair up on the same uplink. In a virtualized environment this means at least moving VMs around to the right physical server pair for proper bandwidth optimization!! Otherwise they may have to fight for bandwidth.
So you can have two servers per uplink, leading to 2:1 oversubscription if all four links are used, but the traffic flows are fixed.
For example: traffic from servers 1 and 5 shares uplink 1. You can’t change that, but you do get to choose placement: if servers A and B both have high bandwidth requirements, you don’t want them together in slots 1 & 5, 2 & 6, 3 & 7 or 4 & 8, to avoid bandwidth starvation. So you may want to pair server A with server C instead, because server C needs little bandwidth.
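To make the slot pairing concrete, here is a minimal Python sketch of the pinning described above. The 4-link pairing (1 & 5, 2 & 6, 3 & 7, 4 & 8) is the one from this post; the function names are made up for illustration, and the assumption that the 1-link and 2-link cases follow the same modulo pattern is mine, so check the config guide before relying on it.

```python
# Sketch of fixed FEX pinning for an 8-slot chassis.
# Assumption: a slot is pinned to uplink ((slot - 1) % links) + 1.
# That reproduces the 4-link pairing in this post; the 1- and 2-link
# cases are assumed to follow the same pattern.

SLOTS_PER_CHASSIS = 8


def pinned_uplink(slot: int, links: int) -> int:
    """Return the FEX uplink (1-based) that a blade slot is pinned to."""
    if links not in (1, 2, 4):
        raise ValueError("this sketch only covers 1, 2 or 4 links")
    if not 1 <= slot <= SLOTS_PER_CHASSIS:
        raise ValueError("slot must be between 1 and 8")
    return (slot - 1) % links + 1


def slots_per_uplink(links: int) -> dict[int, list[int]]:
    """Group the chassis slots by the uplink they share."""
    groups: dict[int, list[int]] = {}
    for slot in range(1, SLOTS_PER_CHASSIS + 1):
        groups.setdefault(pinned_uplink(slot, links), []).append(slot)
    return groups


if __name__ == "__main__":
    for links in (2, 4):
        print(links, "links:", slots_per_uplink(links))
```

Comparing the 2-link and 4-link output shows why adding uplinks changes which servers compete with each other, and why busy servers may need to move slots afterwards.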
This paper exposes this fundamental design problem and highlights it as limited bandwidth aggregation capability. In their enthusiasm for doing that, they forgot that Cisco uses two FEX modules and both can be active at the same time. So the effective aggregate uplink bandwidth will be 9.1 x 2 = 18.2 Gbps per two servers if all uplinks are used.
So from a server pair's point of view, if each server has 2×10 Gig CNA ports, then 40 Gig of downlink traffic has to share 20 Gig of uplink bandwidth. That means 2:1 oversubscription….
If scaled to 320 servers as Cisco claims for UCS, the oversubscription will be 8:1. In other words, if a customer has apps running on these blades that need high bandwidth, the scalability story runs short quickly..
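The arithmetic behind these ratios can be sketched in a few lines of Python. The 2:1 case uses the numbers from this post (two servers, two 10 Gig ports each, one 10 Gig uplink on each of the two FEX modules). For the 8:1 case I am assuming that 320 servers means 40 chassis of 8 blades, each chassis with a single 10 Gig uplink per FEX; that breakdown is my assumption, and the function name is made up for illustration.

```python
# Oversubscription = total server-facing bandwidth / total uplink bandwidth.
# All links are assumed to be 10 Gbps nominal.


def oversubscription(servers: int, ports_per_server: int, uplinks: int,
                     link_gbps: float = 10.0) -> float:
    """Ratio of downlink (server side) to uplink (FEX to FI) bandwidth."""
    downlink = servers * ports_per_server * link_gbps
    uplink = uplinks * link_gbps
    return downlink / uplink


# Server pair with all four links per FEX in use:
# 2 servers x 2 ports x 10G = 40G over 2 uplinks (one per FEX) = 20G.
pair_ratio = oversubscription(servers=2, ports_per_server=2, uplinks=2)

# Assumed 320-server layout: each chassis has 8 blades and one uplink
# per FEX, so 8 x 2 x 10G = 160G over 2 uplinks = 20G.
scaled_ratio = oversubscription(servers=8, ports_per_server=2, uplinks=2)

if __name__ == "__main__":
    print(f"server pair, 4 links per FEX: {pair_ratio:g}:1")
    print(f"full chassis, 1 link per FEX: {scaled_ratio:g}:1")
```

Swapping in the measured 9.1 Gbps for `link_gbps` does not change the ratios, since it scales both sides equally, but it does show the 18.2 Gbps ceiling per server pair.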
I have posted a comment on Brad’s blog in response to his “Folly” reply to Tolly. I wonder if he is going to publish it, and whether he is up to my challenge to explain the bandwidth and scalability story truly….
So this is my TRULY for that TOLLY and FOLLY….
What is yours?








