Craig Labovitz gave a very interesting presentation at the most recent NANOG meeting on the latest measurements from Arbor’s ATLAS Internet observatory. ATLAS collects real-time traffic measurements from 110+ ISPs, with visibility into more than 14 Tbps of Internet traffic. One of the things that makes working in and around Internet research so interesting (and gratifying) is that the set of problems is constantly changing, because the way we use the Internet and the requirements of the applications we run on it are constantly evolving. The rate of evolution has thus far been so rapid that we constantly seem to be hitting new tipping points in the set of “burning” problems that we need to address.
Craig, currently Chief Scientist at Arbor Networks, has long been at the forefront of identifying important architectural challenges in the Internet. His modus operandi has been to conduct measurement studies at a scale far beyond what might have been considered feasible at any particular point in time. His paper on Delayed Internet Routing Convergence from SIGCOMM 2000 is a classic, among the first to demonstrate the problems with wide-area Internet routing: a two-year study that injected simulated failure and repair events from a “dummy” ISP through the many peering relationships that MERIT enjoyed with Tier-1 ISPs. The paper showed that Internet routing, previously thought to be robust to failure, would often take minutes to converge after a failure event as a result of shortcomings in BGP and in the way that ISPs typically configured their border routers. This paper spawned a whole cottage industry of research into improved inter-domain routing protocols.
This presentation had three high-level findings on Internet traffic:
- Consolidation of Content Contributors: 50% of Internet traffic now originates from just 150 Autonomous Systems (down from thousands just two years ago). More and more content is being aggregated through big players and content distribution networks. As a group, CDNs account for approximately 10% of Internet traffic.
- Consolidation of Applications: The browser is increasingly running applications, with HTTP and Flash the predominant protocols for application delivery. One of the most interesting findings from the presentation is that P2P traffic as a category is declining fairly rapidly. As a result of efforts by ISPs and others to rate-limit P2P traffic, in a strict “classifiable” sense (by port number), P2P traffic accounts for less than 1% of Internet traffic in 2009. However, the actual number is likely closer to 18% when accounting for various obfuscation techniques. Still, this is down significantly from estimates just a few years ago that 40-50% of Internet traffic consisted of P2P downloads. Today, with a number of sites providing both paid and advertiser-supported audio and video content, the fraction of users turning to P2P for their content is declining rapidly. Instead, streaming of audio and video over Flash/HTTP is one of the fastest growing application segments on the Internet.
- Evolution of Internet Core: Increasingly, content is being delivered directly from providers to consumers without going through traditional ISPs. Anecdotally, content providers such as Google, Microsoft, Yahoo!, etc. are peering directly with thousands of Autonomous Systems, so that web content from these companies bypasses any intermediary tier-X ISPs on its way from source to destination.
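The port-based classification caveat in the findings above can be made concrete with a small sketch. The flow records, byte counts, and port list below are all hypothetical (not ATLAS data); the point is only that matching on well-known port numbers misses any P2P traffic tunneled over ports such as 443:

```python
# Illustrative default ports for a few classic P2P protocols
# (BitTorrent, eDonkey, Gnutella) -- a made-up subset for this sketch.
P2P_PORTS = {6881, 6882, 6883, 4662, 6346}

# Hypothetical flow records: destination port and byte count.
flows = [
    {"dst_port": 6881, "bytes": 500},   # BitTorrent on its default port
    {"dst_port": 443,  "bytes": 700},   # P2P obfuscated over 443: invisible to port matching
    {"dst_port": 80,   "bytes": 3000},  # plain HTTP
]

def p2p_share_by_port(flows):
    """Fraction of bytes classified as P2P using only well-known ports."""
    total = sum(f["bytes"] for f in flows)
    p2p = sum(f["bytes"] for f in flows if f["dst_port"] in P2P_PORTS)
    return p2p / total

print(p2p_share_by_port(flows))
```

In this toy trace, port matching credits only the 500 bytes on the default BitTorrent port, while the obfuscated flow is counted as ordinary HTTPS, which is the gap between the sub-1% "classifiable" figure and the roughly 18% estimate.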
When ranking ASes by the total amount of data either originated or transited, Google ranked third and Comcast sixth in 2009, meaning that, for the first time, a non-ISP ranked in the top 10. Google accounts for 6% of Internet traffic, driven largely by YouTube videos.
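The ranking described above, ordering ASes by bytes they either originate or transit, can be sketched roughly as follows. The flow records and AS numbers below are illustrative stand-ins, not ATLAS data:

```python
from collections import defaultdict

# Hypothetical flow records: (origin AS, list of transit ASes, bytes).
flows = [
    ("AS15169", ["AS3356"], 900),
    ("AS7922",  [],         400),
    ("AS15169", ["AS1299"], 300),
]

def rank_ases(flows):
    """Rank ASes by total bytes they either originate or transit."""
    totals = defaultdict(int)
    for origin, transits, nbytes in flows:
        totals[origin] += nbytes       # credit the originating AS
        for asn in transits:           # ...and every AS on the transit path
            totals[asn] += nbytes
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(rank_ases(flows))
```

Under this metric a pure content origin can outrank a transit provider simply by sourcing enough traffic, which is how a non-ISP can land in the top 10.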
Measurements are valuable in providing insight into what is happening in the network but also suggest interesting future directions. I outline a few of the potential implications below:
- Internet routing: with content providers taking on an ever-larger presence in the Internet topology, one important question is the resiliency of the Internet routing infrastructure. In the past, domains that wished to remain resilient to individual link and router failures would “multi-home” by connecting to two or more ISPs. Content providers such as Google would similarly receive transit from multiple ISPs, typically at multiple points in the network. However, with an increasing fraction of Internet content and “critical” services provided by an ever-smaller number of Internet sites, and with these content providers directly peering with end customers rather than going through ISPs, there is the potential for reduced fault tolerance for the network as a whole. While it is now possible for clients to receive better quality of service with direct connections to content providers, a single failure, or perhaps a small number of correlated failures, can potentially have much more impact on the resiliency of network services.
- CDN architecture: The above trend could be even more worrisome if the cloud computing vision becomes reality and content providers begin to run on a small number of infrastructure providers. Companies such as Google and Amazon are already operating their own content distribution networks to some extent, and clearly they and others will be significant players in future cloud hosting services. It will be interesting to consider the architectural challenges of a combined CDN and cloud hosting infrastructure.
- Video is king: with an increasing fraction of Internet traffic devoted to video, there is significant opportunity in improved video and audio codecs, caching, and perhaps the adaptation of peer-to-peer protocols for fixed infrastructure settings.
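A back-of-the-envelope calculation illustrates the multi-homing point in the routing implication above. Assuming independent link failures with a made-up per-link failure probability, a site reachable over two upstream links is unreachable only when both fail; correlated failures, the concern raised above, break the independence assumption and erase much of this benefit:

```python
def p_unreachable(num_links, p_link_failure):
    """Probability that all upstream links are down, assuming
    independent failures (the optimistic case)."""
    return p_link_failure ** num_links

p = 0.01                          # assumed per-link failure probability
single = p_unreachable(1, p)      # one direct peering link
multi  = p_unreachable(2, p)      # dual-homed through two ISPs
print(single, multi)
```

With these assumed numbers, dual-homing cuts the unreachability probability by two orders of magnitude, but only while the failures stay uncorrelated; a shared upstream, facility, or provider collapses the two links back toward the single-link case.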