Archive for the 'Uncategorized' Category

Congratulations to Haifeng Yu

The great thing about working with great students is following their successes throughout their careers.

I just learned that Haifeng Yu was promoted to Associate Professor with tenure at the National University of Singapore (NUS). Haifeng was my first graduating PhD student and is now the first to receive tenure.  Haifeng has always been drawn to deep, challenging problems, and his work is always insightful.  He did his graduate work on consistency models for replicated and distributed systems.  Since then, he has worked on network security, system availability, and, most recently, coding for wireless systems.  His SIGCOMM 2010 paper on flexible coding schemes will be (in my biased opinion) of great interest. In fact, it won the best paper award at the conference.

Congratulations Haifeng!

PortLand Code Release

The amount of interest in data centers and data center networking continues to grow.  For the past decade plus, the most savvy Internet companies have been focusing on infrastructure.  Essentially, planetary-scale services such as search, social networking, and e-commerce require a tremendous amount of computation and storage.  When operating at the scale of tens of thousands of computers and petabytes of storage, small gains in efficiency can result in millions of dollars of annual savings.  At the other extreme, efficient access to tremendous amounts of computation can enable companies to deliver more valuable content.  For example, Amazon is famous for tailoring web page contents to individual customers based on both their history and potentially the history of similar users.  Doing so while maintaining interactive response times (typically responding in less than 300 ms) requires fast, parallel access to data potentially spread across hundreds or even thousands of computers.  In an earlier post, I described the Facebook architecture and its reliance on clustering for delivering social networking content.
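
To make the parallel-access point concrete, here is a minimal sketch of the scatter-gather pattern such services rely on.  The fetch_fragment() function and the list of shard servers are hypothetical stand-ins for a real RPC layer; the only point is that a request fans out to many machines and returns whatever arrives within the latency budget.

    # Minimal sketch of a scatter-gather fan-out with a latency budget.
    # fetch_fragment() and SERVERS are hypothetical placeholders; a real
    # service would use its own RPC layer and replica selection logic.
    import concurrent.futures
    import time

    SERVERS = ["shard-%d" % i for i in range(100)]    # hypothetical shard servers
    DEADLINE_S = 0.3                                  # roughly a 300 ms interactive budget

    def fetch_fragment(server, query):
        # Stand-in for an RPC that returns one shard's partial result.
        time.sleep(0.05)                              # pretend network + disk latency
        return {"server": server, "items": []}

    def scatter_gather(query):
        results = []
        with concurrent.futures.ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
            futures = [pool.submit(fetch_fragment, s, query) for s in SERVERS]
            try:
                for f in concurrent.futures.as_completed(futures, timeout=DEADLINE_S):
                    results.append(f.result())
            except concurrent.futures.TimeoutError:
                pass    # deadline hit: return a partial answer rather than blow the budget
        return results

    if __name__ == "__main__":
        print("%d shards answered within the deadline" % len(scatter_gather("books like X")))

Real systems are far more sophisticated (replica selection, hedged requests, and so on), but the basic shape of the fan-out is the same.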

Over the last few years, academia has become increasingly interested in data centers and cloud computing. One reason is the opportunity for impact: the entire computing industry is clearly undergoing another paradigm shift, and five years from now the way we build out computing and storage infrastructure will be radically different.  Another allure of the data center is that it is possible to do “clean slate” research and deployment.  One frustration of the networking research community has been the inability to deploy novel architectures and protocols because of the need to remain backward compatible and friendly to legacy systems.  Check out this paper for an excellent discussion. In the data center, it is at least possible to deploy entirely new architectures without the need to be compatible with every protocol developed over the years.

Of course, there are difficulties with performing data center research as well.  One is having access to the necessary infrastructure to perform research at scale.  With companies deploying data centers at the scale of tens of thousands of computers, it is difficult for most universities and even research labs to match that scale.  In our own experience, we have found that it is possible to consider problems of scale even with a relatively modest number of machines.  Research infrastructures such as Emulab and OpenCirrus are open compute platforms that provide a significant amount of computing infrastructure to the research community.

Another challenge is the lack of software infrastructure for performing data center research, particularly in networking.  Eucalyptus provides an EC2-compatible environment for cloud computing.  However, there is a relative void of available software for networking research.  Having to rebuild every aspect of the protocol stack before investigating fundamental algorithms and protocols is a significant barrier.

To partially address this shortcoming, we are releasing an alpha version of our PortLand protocol.  This work was published in SIGCOMM 2009 and targets delivering a unified Layer 2 environment for easier management and support for basic functionality such as virtual machine migration.  I discussed our work on PortLand in an earlier post here and some of the issues of Layer 2 versus Layer 3 deployment here.
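
For readers who have not seen the paper, the core idea is that PortLand assigns each end host a hierarchical Pseudo MAC (PMAC) encoding its location in the topology, while edge switches rewrite between actual and pseudo MACs.  The sketch below illustrates the 48-bit pod/position/port/vmid layout described in the SIGCOMM 2009 paper; it is an illustration only, and the released code is the authoritative reference.

    # Rough sketch of PortLand-style PMAC encoding (pod.position.port.vmid),
    # following the 48-bit layout in the SIGCOMM 2009 paper:
    # 16-bit pod, 8-bit position, 8-bit port, 16-bit vmid.

    def encode_pmac(pod, position, port, vmid):
        # Pack a host's location into a 48-bit pseudo MAC address.
        assert 0 <= pod < 2**16 and 0 <= position < 2**8
        assert 0 <= port < 2**8 and 0 <= vmid < 2**16
        value = (pod << 32) | (position << 24) | (port << 16) | vmid
        octets = [(value >> shift) & 0xFF for shift in range(40, -8, -8)]
        return ":".join("%02x" % o for o in octets)

    def decode_pmac(pmac):
        # Recover (pod, position, port, vmid) from a PMAC string.
        value = int(pmac.replace(":", ""), 16)
        return ((value >> 32) & 0xFFFF, (value >> 24) & 0xFF,
                (value >> 16) & 0xFF, value & 0xFFFF)

    # Example: VM 7 behind port 3 of the edge switch in position 2 of pod 5.
    print(encode_pmac(pod=5, position=2, port=3, vmid=7))   # 00:05:02:03:00:07
    print(decode_pmac("00:05:02:03:00:07"))                 # (5, 2, 3, 7)

Because PMACs are positional, forwarding tables stay tiny and longest-prefix-style matching on the address itself is enough to route to the right pod and switch.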

The page for downloading PortLand is now up.  It reflects the hard work of two graduate students in my group, Sambit Das and Malveeka Tewari, who took our research code and ported it to HP ProCurve switches running OpenFlow.  The same codebase runs on NetFPGA switches as well.  We hope the community can confirm that the same code runs on a variety of other OpenFlow-enabled switches.  Our goal is for PortLand to be one piece of the puzzle of a software environment for performing research in data center networking.  We encourage you to try it out and give us feedback.  In the meantime, Sambit and Malveeka are hard at work adding Hedera flow scheduling functionality for our next code release.

The Achilles’ Heel of Performance Isolation in the Cloud

I just read an interesting article “Has Amazon EC2 Become Oversubscribed?” The article describes one company’s relatively large-scale usage of EC2 over a three-year period.  Apparently, over the first 18 months, EC2 small instances provided sufficient performance for their largely I/O-bound network servers.  Recently, however, they have increasingly run into problems with “noisy neighbors”.  They speculate that they happen to be co-located with other virtual machines that consume so much CPU that the small instances cannot get their fair share of the CPU.  They recently moved to large instances to avoid the noisy neighbor problem.

More recently, however, they have been seeing unacceptable local network performance even on large instances, with ping times ranging from hundreds of milliseconds to seconds in some cases.  This increase in ping time is likely due to overload on the end hosts, with the hypervisor unable to keep up with even the network load imposed by pings.  (The issue is highly unlikely to be in the network, because switches deployed in data centers do not have that kind of buffering.)
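
If you want to quantify this kind of noisy-neighbor effect yourself, something as simple as sampling connection latency to a nearby instance and looking at the tail percentiles is revealing.  The sketch below is one hypothetical way to do it; it uses plain TCP connects rather than ICMP (which would require raw sockets), and the neighbor address and port are placeholders.

    # Quick-and-dirty latency probe: sample TCP connect times to a nearby
    # instance and report tail percentiles. NEIGHBOR and PORT are placeholders.
    import socket
    import time

    NEIGHBOR = "10.0.0.42"   # hypothetical co-located instance
    PORT = 22                # any listening port will do
    SAMPLES = 200

    def probe_once(host, port, timeout=2.0):
        start = time.time()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            return None                          # drop failed probes
        return (time.time() - start) * 1000.0    # milliseconds

    rtts = sorted(r for r in (probe_once(NEIGHBOR, PORT) for _ in range(SAMPLES)) if r is not None)
    for pct in (50, 90, 99):
        if rtts:
            print("p%d: %.1f ms" % (pct, rtts[min(len(rtts) - 1, pct * len(rtts) // 100)]))

A healthy intra-data-center path should show single-digit-millisecond medians; tails in the hundreds of milliseconds point at overloaded end hosts rather than the switches.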

The conclusion from the article, along with the associated comments, is that Amazon has not sufficiently provisioned EC2 and that is the cause of the overload.

While this is purely speculation on my part, I believe that underprovisioning of the cloud compute infrastructure is unlikely to be the sole cause of the problem.  Amazon has very explicit descriptions of the amount of computing power associated with each type of computing instance.  And it is fairly straightforward to configure the hypervisor to allocate a fixed share of the CPU to individual virtual machine instances.  I am assuming that Amazon has set the CPU schedulers to reserve the appropriate portion of each machine’s physical CPU (and memory) for each VM.
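
As a rough illustration of what I mean by fixed shares: Xen’s credit scheduler (EC2 runs on Xen) lets the administrator cap each domain at a percentage of a physical core.  The instance names and numbers below are placeholders, not Amazon’s actual provisioning policy; the point is only that pinning CPU-bound guests to their advertised share is mechanically straightforward.

    # Illustrative sketch only: how a Xen-style credit-scheduler cap could pin
    # each instance to its advertised share of the physical CPUs. Instance
    # sizes and fractions are made-up placeholders.
    PHYSICAL_CORES = 8
    CAP_PER_CORE = 100            # the credit scheduler expresses caps in % of one core

    # Hypothetical instances packed on one host: name -> fraction of the machine purchased.
    instances = {"small-1": 0.125, "small-2": 0.125, "large-1": 0.5}

    for name, fraction in instances.items():
        cap = int(fraction * PHYSICAL_CORES * CAP_PER_CORE)
        # On Xen this corresponds roughly to setting a per-domain cap, e.g.
        #   xm sched-credit -d <domain> -c <cap>
        print("%s: cap at %d%% of one core (%.2f cores)" % (name, cap, cap / 100.0))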

For CPU-bound VMs, the hypervisor scheduler is quite efficient at allocating resources according to administrator-specified levels.  However, the Achilles’ heel of scheduling in VM environments is I/O.  More specifically, the hypervisor typically has no way to account for the work performed on behalf of individual VMs either in the hypervisor itself or (likely the bigger culprit) in the driver domains responsible for things like network I/O or disk I/O.  Hence, if a particular instance performs significant network communication (either externally or to other EC2 hosts), the corresponding system calls first go into the local kernel.  The kernel likely has a virtual device driver for either the disk or the NIC.  However, for protection, the virtual device driver cannot have access to the actual physical device.  Hence, the kernel driver must transfer control to the hypervisor, which in turn transfers control to the real device driver, likely running in a separate driver domain.

The work done in the driver domain on behalf of a particular VM is difficult to account for.  In fact, this work is typically not “billed” back to the originating domain.  So, a virtual machine can effectively mount a denial of service attack (whether malicious or not) on other co-located VMs simply by performing significant I/O.  With colleagues at HP Labs, we wrote a couple of papers investigating this exact issue a few years back.
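
Here is a toy model of that accounting gap, with entirely made-up numbers: each VM is billed only for the CPU time consumed in its own context, while the cycles the driver domain burns moving its packets and disk blocks go unattributed.

    # Toy model of the accounting gap: CPU time spent in the driver domain on a
    # VM's behalf is not billed back to that VM, so an I/O-heavy guest consumes
    # more of the machine than its nominal share suggests. All numbers are invented.

    VMS = {
        # name: (cpu_seconds_in_guest, io_requests_issued)
        "cpu-bound": (10.0,     0),
        "io-heavy":  ( 2.0, 80000),
    }
    DRIVER_COST_PER_IO = 0.0001   # assumed seconds of driver-domain CPU per request

    for name, (cpu, ios) in VMS.items():
        hidden = ios * DRIVER_COST_PER_IO            # work done elsewhere on this VM's behalf
        print("%-10s billed %5.1fs  actual %5.1fs  (%.0f%% unbilled)"
              % (name, cpu, cpu + hidden, 100.0 * hidden / (cpu + hidden)))

In this contrived example the I/O-heavy guest is billed for only a fifth of the machine time it actually causes to be consumed, which is exactly the kind of distortion a noisy neighbor can exploit.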

As mentioned above, however, without access to the actual workloads on EC2, it is impossible to know whether the hypervisor scheduler is really the culprit.  It will be interesting to see whether Amazon has a response.


