BI Data Marts Are Like Icebergs in the Ocean
You may be surprised to see something like this on the blog. Most of us grew up thinking that Data Marts are the cure for all the ills plaguing big BI / DW installations. I am not ruling out that thinking completely, but I want to bring some more focus to the existence and continuation of Data Marts.
We all know that data is the lifeline of BI / DW. In huge BI / DW installations, rummaging through data gets difficult, and hence the concept of data marts was born. Traditionally, Data Marts give the concerned users proper availability of, and access to, the data they need. Sometimes departmental users also run analytics with the help of sandboxes. We need to keep this in mind while devising data marts, because they have every potential to turn into data management challenges if they are not monitored properly. The IT department has to manage them properly and not let them get out of control in the hands of business users.
EDWs with ODSs are the bedrock of organisations that lean heavily on BI. Usually EDWs provide all the data and information needed by the business users, when they need it. As the EDWs mature, the users' needs also grow, leading to more effort on re-organising the data and on how the information is served. Sometimes this leads to demands for data marts from the business users.
My suggestion is to use a Virtual data mart (DM) to start with, rather than providing a physically separate data mart. This should be a combination of views plus only a few permanent structures for important data or aggregates. These permanent structures should be permitted only to improve performance. This approach leads to less duplication of data.
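To make the idea of a Virtual DM more concrete, here is a minimal sketch in Python using the standard sqlite3 module. It assumes a single EDW table and builds the data mart purely from views, so no rows are copied. The table name edw_sales, the view names, the column names and the 'marketing' department are all hypothetical, chosen only for illustration; a real EDW would use its own schema and database engine.

```python
import sqlite3


def build_virtual_mart(conn: sqlite3.Connection) -> None:
    """Create a hypothetical EDW table and a view-only data mart on top of it."""
    conn.executescript(
        """
        -- Hypothetical EDW fact table (illustration only).
        CREATE TABLE edw_sales (
            sale_id      INTEGER PRIMARY KEY,
            region       TEXT    NOT NULL,
            department   TEXT    NOT NULL,
            amount_cents INTEGER NOT NULL
        );

        -- Virtual DM, detail level: a filtered view, no data is duplicated.
        CREATE VIEW dm_marketing_sales AS
            SELECT sale_id, region, amount_cents
            FROM edw_sales
            WHERE department = 'marketing';

        -- Virtual DM, aggregate level: still a view, computed at query time.
        CREATE VIEW dm_marketing_sales_by_region AS
            SELECT region,
                   SUM(amount_cents) AS total_cents,
                   COUNT(*)          AS sale_count
            FROM dm_marketing_sales
            GROUP BY region;
        """
    )


def load_sales(conn: sqlite3.Connection, rows: list) -> None:
    """Insert (region, department, amount_cents) rows into the EDW table."""
    conn.executemany(
        "INSERT INTO edw_sales (region, department, amount_cents) VALUES (?, ?, ?)",
        rows,
    )


def regional_totals(conn: sqlite3.Connection) -> dict:
    """Read the aggregate view and return {region: total_cents}."""
    cursor = conn.execute(
        "SELECT region, total_cents FROM dm_marketing_sales_by_region"
    )
    return {region: total for region, total in cursor}
```

Because the mart is only a set of views, any new row in the EDW table is visible to the DM users immediately, and there is nothing to reconcile. If the aggregate view later becomes too slow, that one view is the candidate for a permanent structure, which keeps the duplication limited to what performance actually requires.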
After talking about the Virtual DM, let us examine two other types of DMs: Dependent and Independent. A Dependent DM is one whose data is sourced from within the EDW. An Independent DM contains data that is usually not from the EDW.
Let us examine the issues with the Independent DM:
1. This usually starts as a quick and dirty piece of work, just to satisfy business user needs.
2. Future demands on the DM will be to add more data elements and more aggregate levels to improve performance.
3. The DM's data volume then goes up quickly, leading to data management challenges.
4. This also leads to challenges like data inconsistencies.
5. Ultimately, IT will spend more time reconciling the data, explaining the issues to the business users, etc.
6. If the business users are not happy with it, this will lead to the death of the DM. Or else, if the DM grows too large, it will lead to multiple versions of the truth.
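The reconciliation effort mentioned in the list above can be sketched as a simple comparison of aggregate totals between the EDW and an Independent DM. This is only an illustration under stated assumptions: the Discrepancy class, the reconcile function and the idea of comparing per-key integer totals are hypothetical, not a prescribed method, and a real reconciliation would usually cover many more measures and dimensions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Discrepancy:
    """One key whose totals disagree (None means the key is missing on that side)."""
    key: str
    edw_value: Optional[int]
    dm_value: Optional[int]


def reconcile(
    edw_totals: Dict[str, int],
    dm_totals: Dict[str, int],
    tolerance: int = 0,
) -> List[Discrepancy]:
    """Compare per-key totals from the EDW and a data mart.

    A key is reported when it exists on only one side, or when the
    absolute difference between the two totals exceeds the tolerance.
    """
    issues: List[Discrepancy] = []
    for key in sorted(set(edw_totals) | set(dm_totals)):
        edw_value = edw_totals.get(key)
        dm_value = dm_totals.get(key)
        if edw_value is None or dm_value is None:
            issues.append(Discrepancy(key, edw_value, dm_value))
        elif abs(edw_value - dm_value) > tolerance:
            issues.append(Discrepancy(key, edw_value, dm_value))
    return issues
```

Every entry this check returns is something IT has to investigate and explain to the business users, which is the cost that grows as an Independent DM drifts away from the EDW.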
Now, let us focus on the issues with the Dependent DM:
1. This may offer a better solution than the Independent DM, as it involves increasing the footprint of the EDW to accommodate more data elements, which then serve the Dependent DM.
2. More data elements in the EDW increase the complexity of the queries.
3. More care has to be taken in defining the Dependent DMs; otherwise, an illogical division of data (into multiple areas) leads to more complex queries and data inconsistencies.
4. IT needs to have more business knowledge to improve data organisation, master data management, and the way data is shared across multiple subject areas.
5. Some of the data items / structures have to remain independent or omnipresent to maintain consistency and avoid duplication.
6. IT and business users have to work hand in hand to define the views and related structures very carefully.
7. A great deal of thought has to go into creating new DMs, as they have a tendency to hog the processing window and occupy precious hard disk space.
8. Properly managed EDW and Dependent DMs will result in reduced time to market for new applications.
Having understood all these issues, we know very well that Independent DMs are difficult to avoid at times. Hence the best way forward is proper planning. We should always aim to develop a good DM infrastructure that brings maximum benefit with minimum data management challenges. There is one more point of care in using DMs: standardising the technologies around them. Where possible, we should aim to re-use the technology stack; otherwise, a multiplicity of tools will add yet another layer of complexity.


