Does the raging ‘information explosion’ baffle you? Unravel the Enterprise Information Management (EIM) treasury for an assured return on information with a competitive advantage.

BI Data Marts are like Icebergs in Ocean

You may be surprised to see something like this on the blog. Many of us grew up thinking that Data Marts are the cure for all the ills plaguing big BI / DW installations. I am not ruling out that thinking completely, but I want to bring more focus to how Data Marts come into existence and how they are kept going.

We all know that data is the lifeline of BI / DW. In huge BI / DW installations, rummaging through data gets difficult, hence the concept of data marts was born. Traditionally, Data Marts provide proper availability and accessibility to the concerned users. Sometimes departmental users also run analytics with the help of sandboxes. We need to keep this in mind while devising data marts, because they have every potential to turn into data management challenges if they are not monitored properly. The IT department has to manage them properly and not let them get out of control in the hands of business users.

EDWs with ODSs are the bedrock of organisations that lean heavily on BI. Usually EDWs provide all the data and information needed by the business users, when they need it. As the EDWs mature, the users' needs also grow, leading to more effort on data re-organisation and on how the information is serviced. Sometimes this leads to demands for data marts from the business users.

My suggestion is to use a Virtual data mart (DM) to start with, rather than providing a physically separate data mart. This should be a combination of views, with only a few permanent structures for important data or aggregates. These permanent structures should be permitted only to improve performance. This leads to less duplication of data.
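To make the idea concrete, here is a minimal sketch of a Virtual DM using Python's built-in sqlite3 module with an in-memory database. The table and view names (edw_sales, dm_retail_sales, dm_retail_region_totals) and the sample rows are hypothetical, made up purely for illustration; a real EDW would sit on a very different platform.

```python
import sqlite3


def build_virtual_mart(conn):
    """Create a toy EDW table and a mostly-virtual retail data mart on top of it."""
    conn.executescript("""
        -- Hypothetical EDW fact table with a few sample rows.
        CREATE TABLE edw_sales (
            sale_id INTEGER PRIMARY KEY,
            region  TEXT NOT NULL,
            dept    TEXT NOT NULL,
            amount  REAL NOT NULL
        );
        INSERT INTO edw_sales (region, dept, amount) VALUES
            ('North', 'Retail', 100.0),
            ('North', 'Retail', 50.0),
            ('South', 'Retail', 70.0),
            ('South', 'Wholesale', 30.0);

        -- Virtual part of the mart: a view, so no rows are duplicated.
        CREATE VIEW dm_retail_sales AS
            SELECT sale_id, region, amount
            FROM edw_sales
            WHERE dept = 'Retail';

        -- The one permanent structure, kept only to speed up a common aggregate.
        CREATE TABLE dm_retail_region_totals AS
            SELECT region, SUM(amount) AS total_amount
            FROM dm_retail_sales
            GROUP BY region;
    """)


conn = sqlite3.connect(":memory:")
build_virtual_mart(conn)

# Detail rows come straight from the EDW through the view.
detail_rows = conn.execute("SELECT COUNT(*) FROM dm_retail_sales").fetchone()[0]

# Aggregates come from the small permanent table.
region_totals = dict(
    conn.execute("SELECT region, total_amount FROM dm_retail_region_totals")
)
```

The view always reflects the current EDW rows, while the permanent aggregate table is a snapshot that has to be refreshed. That refresh cost is one reason to keep such permanent structures to a minimum.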

After talking about a Virtual DM, let us examine two other types of DMs: Dependent and Independent. A Dependent DM is one whose data is sourced from within the EDW. An Independent DM contains data that is usually not from the EDW.

Let us examine the issues with an Independent DM:
1. This is usually quick and dirty work to start with, just to satisfy business user needs.
2. The future demands on the DM will be to add more data elements and other aggregate levels to improve performance.
3. Then the DM's data volume goes up quickly, leading to data management challenges.
4. This also leads to challenges like data inconsistencies.
5. Ultimately, IT will spend more time reconciling the data and explaining the issues to the business users.
6. If the business users are not happy with it, it will lead to the death of the DM. Or else, if the DM grows too large, it will lead to multiple versions of the truth.
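The reconciliation effort mentioned in the list above can be pictured with a small Python sketch. It compares per-region totals from the EDW with the same totals from an Independent DM and reports where they disagree. The function name reconcile, the tolerance parameter and the sample figures are all assumptions made up for illustration, not taken from any particular tool.

```python
def reconcile(edw_totals, dm_totals, tolerance=0.01):
    """Return the keys whose totals differ between the EDW and a data mart.

    Each mismatch maps to a (edw_value, dm_value) pair; None means the key
    is missing on that side.
    """
    mismatches = {}
    for key in sorted(set(edw_totals) | set(dm_totals)):
        edw_value = edw_totals.get(key)
        dm_value = dm_totals.get(key)
        if (
            edw_value is None
            or dm_value is None
            or abs(edw_value - dm_value) > tolerance
        ):
            mismatches[key] = (edw_value, dm_value)
    return mismatches


# Hypothetical totals: the mart has drifted from the EDW.
edw = {"North": 150.0, "South": 70.0}
mart = {"North": 150.0, "South": 65.0, "East": 20.0}
issues = reconcile(edw, mart)
```

Every entry in issues is a question someone in IT has to answer for the business users, which is where the time goes once an Independent DM has been loaded outside the EDW.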

Now, let us focus on the issues with a Dependent DM:
1. This may offer a better solution compared to the Independent DM, as it involves increasing the footprint of the EDW to accommodate more data elements, and then servicing the Dependent DM from it.
2. More data elements in the EDW increase the complexity of the queries.
3. More care has to be taken in defining the Dependent DMs, otherwise an illogical division of data (into multiple areas) leads to more complex queries and data inconsistencies.
4. IT needs to have more business knowledge, for improved data organisation, master data management, and how the data is shared across multiple subject areas.
5. Some of the data items / structures have to remain independent or omnipresent to maintain consistency and avoid duplication.
6. IT and Business users have to work hand in hand to define the views etc., very intelligently.
7. Careful thought has to go into creating new DMs, as they have a tendency to hog the processing window and occupy precious hard disk space.
8. Properly managed EDW and Dependent DMs will result in reduced time to market for new applications.

After understanding all these issues, we know very well that Independent DMs are difficult to avoid at times. Hence the best way forward is proper planning. We should always aim to develop a good DM infrastructure that brings maximum benefit with minimum data management challenges. There is one more thing to take care of when using DMs: standardising the technologies around them. Where possible, we should aim to re-use the technology stack, otherwise the multiplicity of tools will add yet another layer of complexity.

Comments

It's true that an EDW could be the best solution, but implementing BI with a top-down approach (EDW first) has a pragmatic limitation: the problem with the EDW approach is time-to-market.
from KN:
Hi Amit, thanks for your comments. Usually, the EDW is planned as a whole, but the execution happens on a step-by-step basis. The golden rule says that EDWs are programs with a very long gestation and high expenditure, hence they have to be planned in a completely top-down manner to ensure participation from the top management and the business users. But to ensure that the users get what they want on a continuous basis, the EDW is executed step by step (without missing any of the logical and dependent pieces).

Ever thought why we need an EDW at all? What if we can retrieve information directly from the source system?
