Infosys delivers concept-to-market software engineering services across the engineering value chain. Our blog will discuss the latest trends in software product engineering, outsourcing, technologies, and address business challenges.


Handling Data in Enterprise Mashups

Mashups never go out of fashion, and they draw attention from every stakeholder, whether the creator of the mashup or its user. Google Maps took their popularity to the next level. A typical mashup is a web application that combines data or functionality from two or more external sources to create a new service. The term implies easy, fast integration, frequently using open APIs and data sources. An example is the use of cartographic data to add location information to real estate data, thereby creating a new and distinct web API that neither source originally provided. Mashups have also gained a foothold in enterprise business, where the term coined is “Enterprise Mashups”. Here, in addition to the data, the business process also comes into the picture. If the enterprise is SOA-enabled, the BPM engine can be used directly for process orchestration. An enterprise mashup consists of:

 

  •  Web services
  •  RSS feeds
  •  Platform services in a cloud
  •  Data
  •  Client application
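
As a rough illustration of the real estate example above, the sketch below joins records from two sources into one combined view. The `Listing` type, the sample listings, and the geocode table are all hypothetical stand-ins; a real mashup would fetch them from a web service, an RSS feed, or a database.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Listing:
    """One real estate record (hypothetical shape, for illustration only)."""
    listing_id: str
    address: str
    price: int


# Hypothetical data standing in for two independent external sources.
LISTINGS = [
    Listing("L-1", "12 Main St", 250000),
    Listing("L-2", "7 Lake Rd", 410000),
]
GEOCODES: Dict[str, Tuple[float, float]] = {
    "12 Main St": (12.97, 77.59),
}


def mash_up(listings: List[Listing],
            geocodes: Dict[str, Tuple[float, float]]) -> List[dict]:
    """Attach coordinates to each listing; missing coordinates become None."""
    combined = []
    for listing in listings:
        coords: Optional[Tuple[float, float]] = geocodes.get(listing.address)
        combined.append({
            "id": listing.listing_id,
            "address": listing.address,
            "price": listing.price,
            "lat": coords[0] if coords else None,
            "lon": coords[1] if coords else None,
        })
    return combined


combined_view = mash_up(LISTINGS, GEOCODES)
```

The combined records are what the client application would render, for instance as pins on a map; listings without a known location are kept rather than dropped.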

     

 

In this blog, I will quickly touch upon the data part of mashups. Data in enterprise mashups can take the form of:

 

  • XML data residing in RSS feeds or in web services
  • Database (DB) data
  • Unstructured data
  • JSON

     

In mashups, data is processed dynamically at request time, so the time taken to process the data adds to the overall execution time of the mashup application. To tackle this problem, distributed computing can be applied to the different kinds of data listed above.

 

For XML and JSON data, parallel parsers can be used to build the mashup. The parsing could be multithreaded in software, or it could exploit the multicore architecture of Intel chips at the hardware level http://www.intel.com/cd/software/products/asmo-na/eng/406212.htm. On the other hand, we can use Hadoop’s HDFS and MapReduce for unstructured data.
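
To make the multithreaded option concrete, here is a minimal sketch that parses an RSS (XML) payload and a JSON payload concurrently using Python's standard library. The sample payloads, the `entries`/`title` JSON layout, and the function names are assumptions made for illustration, not part of any particular mashup product.

```python
import json
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical payloads, as they might arrive from a feed and a web service.
RSS_SAMPLE = (
    "<rss><channel>"
    "<item><title>Order shipped</title></item>"
    "<item><title>Invoice paid</title></item>"
    "</channel></rss>"
)
JSON_SAMPLE = '{"entries": [{"title": "New account"}]}'


def parse_rss(xml_text):
    """Return the title of every <item> in an RSS document."""
    root = ET.fromstring(xml_text)
    return [item.findtext("title", default="") for item in root.iter("item")]


def parse_json_titles(json_text):
    """Return the titles from a JSON document with an assumed 'entries' list."""
    return [entry["title"] for entry in json.loads(json_text)["entries"]]


PARSERS = {"xml": parse_rss, "json": parse_json_titles}


def parse_all(sources, max_workers=4):
    """Parse (kind, text) pairs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_source = pool.map(lambda src: PARSERS[src[0]](src[1]), sources)
        return [title for titles in per_source for title in titles]


titles = parse_all([("xml", RSS_SAMPLE), ("json", JSON_SAMPLE)])
```

In CPython, threads mainly help when the time is dominated by waiting on the network. For CPU-bound parsing that should spread across cores, `ProcessPoolExecutor` from the same module is the usual substitute.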
Hadoop is a Java-based framework that supports distributed computing and scales very well for data-intensive applications. Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations. MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes http://wiki.apache.org/hadoop/. One good example of an enterprise mashup is “CRM-gadget” http://www.programmableweb.com/tag/enterprise, which searches for new accounts or validates accounts on Oracle On Demand over Google Local Search. This mashup could tap the potential of Hadoop HDFS and MapReduce to reduce the time taken to search the accounts.
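
The MapReduce programming model itself can be sketched in a few lines. The example below runs entirely in one process and does not use the Hadoop API; it only shows the map, shuffle, and reduce steps, applied to counting how often each account is mentioned in unstructured text. The account names and note lines are hypothetical.

```python
from collections import defaultdict

# Hypothetical account names and free-text notes, for illustration only.
KNOWN_ACCOUNTS = ("Acme", "Globex")
NOTES = [
    "Call with Acme about renewal",
    "Globex requested a quote",
    "acme support ticket closed",
]


def account_mentions(line):
    """Map step: emit (account, 1) for each known account named in the line."""
    lowered = line.lower()
    for account in KNOWN_ACCOUNTS:
        if account.lower() in lowered:
            yield account, 1


def map_phase(records, map_fn):
    """Apply the map function to every record and stream out key/value pairs."""
    for record in records:
        yield from map_fn(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Reduce step: collapse each key's list of values into one result."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


mention_counts = reduce_phase(
    shuffle(map_phase(NOTES, account_mentions)),
    lambda key, values: sum(values),
)
```

On a real cluster, the map calls would run on the nodes that hold the relevant HDFS blocks and the shuffle would move data between nodes; the structure of the map and reduce functions stays the same.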

 

 To conclude, we need to build proofs of concept (POCs) and observe how data can be split dynamically across parallel/distributed nodes to achieve near-linear speed-up. This will in turn reduce the total execution time of an enterprise mashup application.
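
A POC of this kind needs two small pieces: a way to split the data across nodes and a way to judge how close the result is to linear speed-up. The sketch below shows one possible form of each; the function names and the timings in the usage lines are made-up placeholders, not measurements.

```python
def split_evenly(items, nodes):
    """Split items into `nodes` contiguous chunks whose sizes differ by at most one."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    base, extra = divmod(len(items), nodes)
    chunks, start = [], 0
    for index in range(nodes):
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def speedup(serial_seconds, parallel_seconds):
    """Speed-up is the serial run time divided by the parallel run time."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, nodes):
    """Efficiency of 1.0 means perfectly linear speed-up on `nodes` nodes."""
    return speedup(serial_seconds, parallel_seconds) / nodes


chunks = split_evenly(list(range(10)), 4)

# Placeholder timings: 120 s on one node, 32 s on four nodes.
observed_speedup = speedup(120.0, 32.0)
observed_efficiency = efficiency(120.0, 32.0, 4)
```

An efficiency close to 1.0 indicates that the split is balanced and that coordination overhead is small; a much lower value suggests the chunks are uneven or that a serial step dominates.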

 

 
