Handling Data in Enterprise Mashups
An enterprise mashup is typically assembled from the following parts:
- Web services
- RSS Feeds
- Platform services in a cloud
- Data
- Client Application
In this post, I will briefly touch on the data part of mashups. Data in enterprise mashups can take the following forms:
- XML data residing in RSS feeds or in web services
- DB data
- Unstructured data
- JSON
For XML and JSON data, parallel parsers can be used to build the mashup. The parsing can be multithreaded in software, or it can exploit the multicore architecture of Intel chips at the hardware level (http://www.intel.com/cd/software/products/asmo-na/eng/406212.htm). For unstructured data, on the other hand, we can use Hadoop's HDFS and MapReduce.
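To make the parallel-parsing idea concrete, here is a minimal sketch in Python, assuming one RSS (XML) source and one JSON source that have already been fetched. The sample payloads, the field names `title` and `link`, and the helper names `parse_rss`, `parse_json` and `build_mashup` are all made up for illustration and do not come from any particular mashup product.

```python
import json
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical payloads standing in for an RSS feed and a JSON web service.
RSS_SAMPLE = """<rss version="2.0"><channel>
  <item><title>Acme Corp</title><link>http://example.com/acme</link></item>
  <item><title>Globex</title><link>http://example.com/globex</link></item>
</channel></rss>"""

JSON_SAMPLE = '[{"title": "Initech", "link": "http://example.com/initech"}]'


def parse_rss(text):
    """Turn every <item> of an RSS document into a plain dict."""
    root = ET.fromstring(text)
    return [
        {"title": item.findtext("title"),
         "link": item.findtext("link"),
         "source": "rss"}
        for item in root.iter("item")
    ]


def parse_json(text):
    """Turn a JSON array of objects into the same dict shape."""
    return [
        {"title": entry["title"], "link": entry["link"], "source": "json"}
        for entry in json.loads(text)
    ]


def build_mashup(sources):
    """Parse each (parser, text) pair on its own thread and merge the results.

    Results are merged in the order the sources were given, not in
    completion order, so the output is deterministic.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(parser, text) for parser, text in sources]
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged


mashup = build_mashup([(parse_rss, RSS_SAMPLE), (parse_json, JSON_SAMPLE)])
for row in mashup:
    print(row["source"], row["title"], row["link"])
```

In CPython, threads mainly help when the fetching of each source is done inside the worker too, because the interpreter lock limits pure parsing to one core at a time. For CPU-bound parsing on a multicore machine, swapping in a process pool is the usual next step.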
Hadoop is a Java-based framework for distributed computing that scales very well for data-intensive applications. The Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them across compute nodes throughout a cluster to enable reliable, extremely rapid computations. MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes (http://wiki.apache.org/hadoop/). One good example of an enterprise mashup is "CRM-gadget" (http://www.programmableweb.com/tag/enterprise), which searches for new accounts or validates accounts in Oracle On Demand using Google local search. This mashup could tap the potential of Hadoop's HDFS and MapReduce to reduce the time needed to search the accounts.
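The MapReduce model itself is easy to sketch without a cluster. The Python snippet below is an in-process simulation of the map, shuffle and reduce steps, not Hadoop's API: it counts how often known account names are mentioned in lines of unstructured text. The account names, the sample lines and the function names are invented for illustration. On a real Hadoop cluster the framework would run the map and reduce functions on many nodes and handle the shuffle itself.

```python
from collections import defaultdict


def map_phase(line, accounts):
    """Emit an (account, 1) pair for each known account named in the line."""
    lowered = line.lower()
    return [(name, 1) for name in accounts if name.lower() in lowered]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(lines, accounts):
    """Run map over every line, shuffle, then reduce each key."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line, accounts))
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


# Hypothetical unstructured input and account list.
notes = [
    "Acme Corp opened a new office downtown",
    "Meeting scheduled with Globex and Acme Corp",
    "No customer mentioned in this note",
]
print(run_job(notes, ["Acme Corp", "Globex"]))
```

Because each call to `map_phase` looks at a single line and each call to `reduce_phase` looks at a single key, both steps can be spread across nodes without the workers needing to talk to each other. That independence is what lets the model scale.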
To conclude, we need to build proofs of concept (POCs) and observe how dynamically splitting the data across parallel or distributed nodes can achieve near-linear speed-up. This will in turn reduce the total execution time of an enterprise mashup application.
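A first POC along these lines could be as small as the following Python sketch. It splits a list of records into near-equal chunks, processes each chunk in a separate worker process, and combines the partial results. The function names and the `needle` parameter are assumptions made for this example, and a local process pool is only a stand-in for real distributed nodes.

```python
from concurrent.futures import ProcessPoolExecutor


def split_evenly(records, parts):
    """Split records into `parts` contiguous chunks whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(records), parts)
    chunks, start = [], 0
    for index in range(parts):
        size = base + (1 if index < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks


def count_matches(chunk, needle="acme"):
    """Count the records in one chunk that mention the needle (case-insensitive)."""
    return sum(1 for record in chunk if needle in record.lower())


def parallel_count(records, workers):
    """Process each chunk in its own worker process and add up the partial counts."""
    chunks = split_evenly(records, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_matches, chunks))
```

To check for speed-up, time `parallel_count(records, 1)` against `parallel_count(records, n)` with `time.perf_counter()` on a large input. Linear speed-up would mean the second run takes roughly 1/n of the first. Small inputs will not show it, because starting processes and shipping chunks has a fixed cost. On platforms that spawn rather than fork worker processes, call `parallel_count` from under an `if __name__ == "__main__":` guard.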

