At the IEEE LEOS meeting, I had the chance to hear an excellent presentation by Donn Lee from Facebook on their network infrastructure and pain points. I first met Donn when I was speaking at Stanford on our own Data Center networking project. He had a lot of great questions and feedback based on his experience at Facebook (not to mention Google, Cisco, etc.).
One interesting thing that came out of the presentation is how slowly switch capacity has increased relative to the size and bandwidth requirements of data centers over the last decade or so. Today, the biggest switch that one can buy is approximately a 128-port 10 Gigabit Ethernet switch. However, data centers with hundreds of thousands of ports are not unheard of today, and individual distributed applications can run on tens of thousands of machines.
A significant challenge is interconnecting all of these machines. Donn mentioned that his ideal switch, from an operational perspective at Facebook, would have 1050 ports: 1000 10GbE ports facing downward to end hosts (either 10k hosts at 1 GbE each or 1k hosts at 10 GbE each) and 50 100GbE ports facing up to another switch to allow communication with other logical clusters. This suggests a requirement of nonblocking bandwidth at the granularity of 1-10k hosts and an oversubscription ratio of 2 (10 Tb/s down versus 5 Tb/s up) in talking to other clusters. It also suggests a switch with 15 Terabits/sec of aggregate capacity. That is roughly a factor of 12 more bandwidth than the 1.28 Tb/s available from a 128-port 10GbE commercial switch, and there is no standard for 100 Gigabit Ethernet yet. As a result, Facebook has to build out complex meshes, presumably with some performance-limiting hashing to map flows to paths.
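As a quick sanity check on the arithmetic, here is a minimal Python sketch. The function and variable names are my own, chosen for illustration, and I am assuming that capacity is simply port count times line rate, counted the same way for both switches.

```python
GBPS_PER_TBPS = 1000


def capacity_tbps(ports, gbps_per_port):
    """Aggregate capacity of a group of identical ports, in Tb/s."""
    return ports * gbps_per_port / GBPS_PER_TBPS


# Port counts and speeds as described in the talk.
downlink_tbps = capacity_tbps(1000, 10)   # 1000 x 10GbE toward hosts
uplink_tbps = capacity_tbps(50, 100)      # 50 x 100GbE toward other clusters

ideal_total_tbps = downlink_tbps + uplink_tbps
oversubscription = downlink_tbps / uplink_tbps

# Largest switch one can buy today: roughly 128 ports of 10GbE.
largest_today_tbps = capacity_tbps(128, 10)
gap = ideal_total_tbps / largest_today_tbps

print(f"ideal switch:     {ideal_total_tbps:.2f} Tb/s")
print(f"oversubscription: {oversubscription:.1f}")
print(f"largest today:    {largest_today_tbps:.2f} Tb/s")
print(f"capacity gap:     {gap:.1f}x")
```

Computed this way, the gap between the ideal switch and the 128-port figure comes out at a little under 12x. A different way of counting commercial switch capacity would shift that number somewhat.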
Given that doubling single-switch capacity requires roughly a factor of 4 more logic, and given the continued buildout of ever denser data centers, the communication fabric for these data centers is likely to form a topic of increasing interest.
My group remains quite interested in this space. In fact, our upcoming Merchant Silicon paper at Hot Interconnects this year considers the design of a multi-stage 34 Tbps switch.
UPDATE: The slides from Donn’s talk are now available here.