The amount of interest in data centers and data center networking continues to grow. For the past decade plus, the savviest Internet companies have been focusing on infrastructure. Essentially, planetary-scale services such as search, social networking, and e-commerce require a tremendous amount of computation and storage. When operating at the scale of tens of thousands of computers and petabytes of storage, small gains in efficiency can result in millions of dollars of annual savings. At the other extreme, efficient access to tremendous amounts of computation can enable companies to deliver more valuable content. For example, Amazon is famous for tailoring web page contents to individual customers based on both their history and potentially the history of similar users. Doing so while maintaining interactive response times (typically responding in less than 300 ms) requires fast, parallel access to data potentially spread across hundreds or even thousands of computers. In an earlier post, I described the Facebook architecture and its reliance on clustering for delivering social networking content.
Over the last few years, academia has become increasingly interested in data centers and cloud computing. One reason is the opportunity for impact: the entire computing industry is clearly undergoing another paradigm shift, and five years from now the way we build out computing and storage infrastructure will be radically different. Another allure of the data center is that it is possible to do “clean slate” research and deployment. One frustration of the networking research community has been the inability to deploy novel architectures and protocols because of the need to remain backward compatible and friendly to legacy systems. Check out this paper for an excellent discussion. In the data center, it is at least possible to deploy entirely new architectures without the need to be compatible with every protocol developed over the years.
Of course, there are difficulties with performing data center research as well. One is having access to the necessary infrastructure to perform research at scale. With companies deploying data centers at the scale of tens of thousands of computers, it is difficult for most universities and even research labs to gain access to comparable infrastructure. In our own experience, we have found that it is possible to consider problems of scale even with a relatively modest number of machines. Research infrastructures such as Emulab and OpenCirrus are open compute platforms that provide a significant amount of computing infrastructure to the research community.
Another challenge is the lack of software infrastructure for performing data center research, particularly in networking. Eucalyptus provides an EC2-compatible environment for cloud computing. However, there is a relative void of available software for networking research. Having to rebuild every aspect of the protocol stack before investigating fundamental algorithms and protocols is a significant barrier.
To partially address this shortcoming, we are releasing an alpha version of our PortLand protocol. This work was published at SIGCOMM 2009 and targets delivering a unified Layer 2 environment for easier management and support for basic functionality such as virtual machine migration. I discussed our work on PortLand in an earlier post here and some of the issues of Layer 2 versus Layer 3 deployment here.
The page for downloading PortLand is now up. It reflects the hard work of two graduate students in my group, Sambit Das and Malveeka Tewari, who took our research code and ported it to HP ProCurve switches running OpenFlow. The same codebase runs on NetFPGA switches as well. We hope the community can confirm that the same code runs on a variety of other OpenFlow-enabled switches. Our goal is for PortLand to be a piece of the puzzle for a software environment for performing research in data center networking. We encourage you to try it out and give us feedback. In the meantime, Sambit and Malveeka are hard at work adding Hedera functionality for flow scheduling for our next code release.