Infosys delivers concept-to-market software engineering services across the engineering value chain. Our blog will discuss the latest trends in software product engineering, outsourcing, technologies, and address business challenges.


MapReduce in the Cloud

In a previous post (MapReduce), I discussed the MapReduce framework, which can be used to process large data sets efficiently. Suppose you need to process web-scale data and therefore need a cluster of machines with a MapReduce library installed on them. You don’t have to invest in new hardware or set up a cluster yourself. Amazon offers Hadoop’s MapReduce implementation on its Elastic Compute Cloud (EC2) as the Amazon Elastic MapReduce web service (Elastic MapReduce Developer Guide). Like all other Amazon Web Services, this service is available on a pay-as-you-go basis. In this case, Amazon’s Simple Storage Service (S3) is used as the data store instead of HDFS. Though HDFS is the default file system for Hadoop, Hadoop also provides interfaces to various other file systems, including Amazon’s S3. To get started, you load your data and your Mapper and Reducer executables into S3 and ask Amazon Elastic MapReduce to start a new job flow. Elastic MapReduce then starts a new EC2 cluster, which runs MapReduce on the data set you uploaded and stores the resulting data in S3.
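To make the "Mapper and Reducer executables" more concrete, here is a minimal sketch of what such a pair could look like in Python. It assumes a streaming-style contract in which each executable reads lines on standard input and writes tab-separated key/value lines on standard output, with the framework sorting by key between the two phases. The word-count task, the function names, and the `map`/`reduce` command-line argument are all illustrative assumptions, not something prescribed by Amazon Elastic MapReduce.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive records that share a key.

    Assumes the input is already sorted by key, as the framework's
    sort phase between map and reduce would deliver it.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention: "wordcount.py map" or "wordcount.py reduce".
# Nothing is read from stdin unless a role argument is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

If the script is saved under the hypothetical name `wordcount.py`, the whole flow can be simulated on one machine with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, which is a convenient way to check the logic before uploading anything to S3.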

An open-source implementation of MapReduce for the Amazon cloud OS, named “Cloud MapReduce”, is also available. You can explore Cloud MapReduce at: cloudmapreduce.

As of now, Microsoft Azure does not provide a service equivalent to Amazon Elastic MapReduce for parallel processing. However, Microsoft is working on a research project named DryadLINQ, a programming environment for writing large-scale data-parallel applications on PC clusters. Using the DryadLINQ framework, developers can write MapReduce-style applications on .NET. DryadLINQ is expected to become available on Azure in the future.

For Google’s App Engine (Google’s cloud computing offering), I came across an implementation of MapReduce known as HTTPMR. However, to use it, your computing environment must meet certain assumptions, mentioned here.

With Amazon leading the way, you can expect many cloud service providers to offer MapReduce implementations as a service. This will help researchers, academics, and small and medium enterprises process vast amounts of data efficiently and cost-effectively.
