Infosys’ blog on industry solutions, trends, business process transformation and global implementation in Oracle.


Data Federation – A potent substitute for the Data Warehouse?

In my experience, we often build a data warehouse to integrate multiple sources of data and gain effective business intelligence. This consumes both time and resources, and it can disrupt an organization’s IT roadmap if not handled with the utmost maturity.

In this rapidly changing world of technology, new paradigms have evolved that could simplify the aggregation of multiple sources. One that I find very exciting is Data Federation, also known as Information-as-a-Service, Data Virtualization or EII (Enterprise Information Integration).

At the heart of this technology is a “virtual database”, or federated database, a concept defined by McLeod and Heimbigner back in 1985. Simply put, a virtual database stores data definitions rather than the data itself, along with information about where the data is located. When a single call is made to a virtual database, the technology issues multiple calls to the underlying databases and is responsible for meaningfully aggregating the returned result sets.
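
To make the idea concrete, here is a minimal sketch in Python of one logical call fanning out to several sources. It is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the source systems, and the VirtualDatabase class, the make_source helper, the customers table and the sample rows are all hypothetical names invented for this example, not any vendor's API.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn


class VirtualDatabase:
    """Hypothetical virtual database: it knows where the data lives, not the data."""

    def __init__(self, sources):
        # Mapping of source name -> connection (the "location" of the data).
        self.sources = sources

    def query(self, sql, params=()):
        """Issue one call per underlying source and merge the result sets."""
        merged = []
        for name, conn in self.sources.items():
            for row in conn.execute(sql, params):
                # Tag each row with the source it came from.
                merged.append((name,) + tuple(row))
        return merged


# Two made-up source systems with the same table layout.
emea = make_source([(1, "Asha", "EMEA"), (2, "Lars", "EMEA")])
apac = make_source([(7, "Mei", "APAC")])

vdb = VirtualDatabase({"emea": emea, "apac": apac})
result = vdb.query("SELECT id, name FROM customers ORDER BY id")
print(result)
```

The caller writes a single query and never touches the individual connections. A real federation layer would also push down filters, handle failures and reconcile differing schemas, which this sketch deliberately leaves out.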

The primary benefit of this approach is that data need not be moved from the source systems for analysis. It also saves the cost of building and maintaining a permanent warehouse. And since data is not being moved, it enables quick, real-time data delivery.

The biggest challenge such a system must handle to deliver what it promises is the heterogeneity of the component DBMSs, which gives rise to naming, schema, domain and model conflicts. These are typically handled by designing multiple stacked schemas that accurately translate the data model visible to the user into the actual data models of the component DBMSs.
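
A naming conflict of this kind can be sketched as a simple mapping layer. The Python example below is a rough illustration only: the source names (crm, billing), their table and column names, and the translate and federated_select helpers are all assumptions made up for the example. It covers just one layer of translation, from a user-facing schema to each component schema, and ignores domain and model conflicts.

```python
import sqlite3

# Hypothetical mapping from the user-visible schema to each component schema.
MAPPINGS = {
    "crm": {
        "table": "cust",
        "columns": {"customer_id": "cust_no", "customer_name": "nm"},
    },
    "billing": {
        "table": "accounts",
        "columns": {"customer_id": "acct_id", "customer_name": "full_name"},
    },
}


def translate(source, columns):
    """Rewrite a select on the user-visible schema into the source's own names."""
    mapping = MAPPINGS[source]
    select_list = ", ".join(
        f'{mapping["columns"][col]} AS {col}' for col in columns
    )
    return f'SELECT {select_list} FROM {mapping["table"]}'


# Two made-up component databases that name the same data differently.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE cust (cust_no INTEGER, nm TEXT)")
crm.execute("INSERT INTO cust VALUES (1, 'Asha')")

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE accounts (acct_id INTEGER, full_name TEXT)")
billing.execute("INSERT INTO accounts VALUES (2, 'Lars')")

connections = {"crm": crm, "billing": billing}


def federated_select(columns):
    """Run the translated query on every source and combine the rows."""
    rows = []
    for source, conn in connections.items():
        rows.extend(conn.execute(translate(source, columns)).fetchall())
    return rows


unified = federated_select(["customer_id", "customer_name"])
print(translate("crm", ["customer_id", "customer_name"]))
print(unified)
```

The user only ever sees customer_id and customer_name. Each additional source adds one more mapping entry, which is also where the maintenance cost of heterogeneity shows up.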

This technology should find ready acceptance in divisions of an organization (such as fraud detection units) that rely heavily on real-time intelligence from disparate systems to drive business. Data architects may also find this approach very efficient for maintaining master dimensions, which are typically time-consuming to maintain at an enterprise level.

Vendors such as Oracle, SAS and Informatica already offer comprehensive solutions in the market. Now it is up to the consultant and architect community to go out there and propose solutions that are truly ‘out of the box’!

Last but not least, what is the experts’ take on this?
Talking to some data warehousing connoisseurs, I found that people are divided on whether replacing a data warehouse with a federated architecture is appropriate. Weighing the pros and cons, I conclude that, at this point of industry maturity, the approach definitely merits use as an augmentation to a traditional data warehouse, but we need to wait and watch how it evolves before it becomes mainstream.

Comments

Very nice and informative article, Dwaipayan.

Federated architecture, as I understand it, would be a perfect fit for situations where the BI reports are not large in number and the data analysis is not done frequently.

Does it involve an ETL layer like a typical data warehouse, or are we querying the data in the source systems directly?

Federated Database (FDB) is a great concept, but it has its own special, if not limited, sphere of application and usability. FDB is more advantageous and meaningful, from both a usage and a building perspective, when the underlying data from multiple systems is of the same flavor and the requirement is to access the data in real time. For example, if an organization captures customer data (the same flavor of data) in multiple systems globally and needs to access this data in real time, it makes sense to construct an FDB. However, staying with the customer data example, if the underlying databases support different query languages, or their data models differ, or the cardinality of the data differs, the challenges of constructing an efficient FDB start piling up with every single difference.

Moreover, the moment the requirement shifts from a single flavor of data to disparate flavors, FDB starts losing its sheen. Global organizations, most of which have grown inorganically, have incongruent systems even for capturing and storing the same data, let alone dissimilar data. Only a conventional data warehouse has the capability to integrate these systems and present one picture of the data.

Heterogeneities in databases pose a potent problem for FDB. A single user query on an FDB has to be translated into multiple queries (possibly in different query languages) against multiple constituent databases, and the result sets then have to be translated back into one presentable and consumable form. Building a better FDB requires a top-down approach, i.e., building the constituent systems and databases with the intent to integrate them via FDB.

FDB has a special use and is a better fit only if the underlying systems and databases don’t pose many problems in associating with each other. FDB cannot be a potent substitute for a conventional data warehouse, but it can be considered as an alternative option.
