Using the Sum of the Parts to Power Innovation
Jeffrey Dean (top) and Sanjay Ghemawat (bottom), the recipients of the 2012 ACM-Infosys Foundation Award in the Computing Sciences
Google had a challenge. Granted, it's one most companies would like to have: So many people were using its search engine that Google was struggling to keep up with the skyrocketing demand. When its engineers found they couldn't deploy machines fast enough to handle the unprecedented amounts of data driven by the popular service, Google pursued software solutions to what initially appeared to be hardware problems.
Enter Jeffrey Dean and Sanjay Ghemawat. These Google developers recently won the ACM-Infosys Foundation Award in the Computing Sciences for designing a significant part of Google's revolutionary software infrastructure. The work of these two Google Fellows resulted in the foundations of Google's Web search and indexing platforms as well as numerous Google applications. Indeed, their accomplishments have helped unleash the potential of both big data and Cloud computing.
In fact, Google's predicament was the catalyst that put Dean and Ghemawat on the road to receiving the recent Infosys accolade. Faced with the enormous popularity of Google's search engine, and the potential strain it could put on the company's system, Dean and Ghemawat began thinking about the issue in terms of scalability.
Just as an assembly line divides the enormous, complicated task of putting together a large piece of machinery into hundreds of smaller tasks, Dean and Ghemawat split work that would otherwise demand one huge computer into small pieces and spread those pieces across thousands of machines. Doing so allowed them to scale out across numerous computers. Better still, they could hide from programmers the complexities of managing enormous clusters of computers.
Big Data used to be, well, way too big. Just as it took workers in the days before the cotton gin many hours to separate seeds from cotton and spinners just as long to spin the cotton into thread, so, too, was Big Data an expensive and fairly elusive proposition for most corporations. Not many small- to mid-sized firms could afford to go out and buy a computer system that had the power to capture and process the raw data they needed for, say, market research.
Cloud computing is similar to this concept of decentralization. But it also turns other concepts on their heads. Because computing power is provided as a utility to consumers free of hardware and implementation details, they can access enormous amounts of power that they wouldn't have been able to afford even a few years ago. So programmers with rudimentary experience can set up systems like the big boys.
Infosys clearly recognized the genius of these two developers when we decided to award them our most recent Foundation prize. Today, the Google File System is a highly scalable system allowing huge files to be distributed efficiently across thousands of servers. Dean and Ghemawat also developed a programming tool called MapReduce, enabling developers to process large data sets with many machines working in tandem. Infoscions might recognize MapReduce's open source implementation as Hadoop, which we've embraced as a company. MapReduce is easily programmed and takes advantage of cheap server power.
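For readers curious about what "easily programmed" means in practice, here is a minimal, single-machine sketch of the MapReduce idea in Python, using the classic word-count example. It is an illustration only, not Google's or Hadoop's actual API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for this example, and a real framework would run the map and reduce steps on many machines and handle failures along the way.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    mapped = []
    # In a real cluster, each chunk could be mapped on a different machine.
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    grouped = shuffle_phase(mapped)
    # Each key could likewise be reduced on a different machine.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["big data big cluster", "cheap cluster big win"]
    print(word_count(chunks))
    # prints {'big': 3, 'data': 1, 'cluster': 2, 'cheap': 1, 'win': 1}
```

The programmer writes only the map and reduce steps. Splitting the input, moving data between machines, and recovering from failures are the parts a framework is meant to hide.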
MapReduce is driving new Cloud-based solutions. So the next time you're wondering why you can access a world of data from your tiny laptop, thank these winners of the Infosys Foundation's most recent award. They tamed Big Data by making it practical, inexpensive, and scalable.