Does the raging ‘information explosion’ baffle you? Unravel the Enterprise Information Management (EIM) treasury for an assured return on information with a competitive advantage.


December 17, 2008

BI Data Marts are like Icebergs in Ocean

You may be surprised to see something like this on the blog. Most of us grew up thinking that data marts are the cure for all the ills plaguing big BI / DW installations. I am not ruling out that thinking completely, but I want to bring more focus to how data marts come into existence and how they are kept going.

We all know that data is the lifeline of BI / DW. In huge BI / DW installations, rummaging through data gets difficult, and so the concept of data marts was born. Traditionally, data marts make data properly available and accessible to the users concerned. Sometimes departmental users also run analytics with the help of sandboxes. We need to take care while devising data marts, because they have every potential to turn into data management challenges if they are not monitored properly. The IT department has to manage them properly and not let them get out of control in the hands of business users.

EDWs with ODSs are the bedrock of organisations that lean heavily on BI. Usually EDWs provide all the data and information needed by the business users, when they need it. As the EDWs mature, the users' needs also grow, leading to more effort on data re-organisation and on how the information is serviced. Sometimes this leads to demands for data marts from the business users.

My suggestion is to use a Virtual data mart (DM) to start with, rather than providing a physically separate data mart. This should be a combination of views, with only a few permanent structures for important data or aggregates. These permanent structures should be permitted only to improve performance. This approach leads to less duplication of data.
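To make the idea concrete, here is a minimal sketch of a virtual DM using Python's built-in sqlite3 module. The table, view and column names (edw_sales, dm_grocery_sales, dm_grocery_daily_totals) are made up for illustration, and a real EDW would of course sit on a different database. The view copies no data; the single aggregate table is the only permanent structure, kept purely for performance.

```python
import sqlite3

# Hypothetical EDW table and DM names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edw_sales (
        sale_date TEXT, department TEXT, product TEXT, amount REAL
    );
    INSERT INTO edw_sales VALUES
        ('2008-12-01', 'grocery', 'milk', 2.50),
        ('2008-12-01', 'grocery', 'bread', 1.50),
        ('2008-12-02', 'grocery', 'milk', 2.50),
        ('2008-12-02', 'electronics', 'radio', 40.00);

    -- Virtual data mart: a view, so no data is copied out of the EDW.
    CREATE VIEW dm_grocery_sales AS
        SELECT sale_date, product, amount
        FROM edw_sales
        WHERE department = 'grocery';

    -- The one permanent structure: an aggregate kept only for performance.
    CREATE TABLE dm_grocery_daily_totals AS
        SELECT sale_date, SUM(amount) AS total_amount
        FROM dm_grocery_sales
        GROUP BY sale_date;
""")

# Departmental users query the view as if it were a separate mart.
view_rows = conn.execute("SELECT COUNT(*) FROM dm_grocery_sales").fetchone()[0]

# Frequently used totals come from the small persisted aggregate.
daily_totals = dict(conn.execute(
    "SELECT sale_date, total_amount FROM dm_grocery_daily_totals"))
```

The trade-off is that the persisted aggregate has to be refreshed whenever the EDW is loaded, which is one more reason to keep such structures to a minimum.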

After talking about the Virtual DM, let us examine 2 other types of DMs: Dependent and Independent DMs. A Dependent DM is one whose data is sourced from within the EDW. An Independent DM contains data that usually does not come from the EDW.

Let us examine the issues with an Independent DM:
1. It usually starts as quick and dirty work, just to satisfy business user needs.
2. Future demands on the DM will be to add more data elements and other aggregate levels to improve performance.
3. The DM's data volume then goes up quickly, leading to data management challenges.
4. This also leads to challenges like data inconsistencies.
5. Ultimately, IT will spend more time reconciling the data, explaining the issues to the business users, etc.
6. If the business users are not happy with it, this will lead to the death of the DM. Or else, if the DM is too large, it will lead to multiple versions of the truth.

Now, let us focus on the issues with Dependent DM:
1. This may offer a better solution compared to the Independent DM, as this involves increasing the footprint of the EDW to accommodate more data elements, and then service the Dependent DM. 
2. More data elements in the EDW increase the complexity of the queries.
3. More care has to be taken in defining the Dependent DMs; otherwise, an illogical division of data (into multiple areas) leads to more complex queries and data inconsistencies.
4. IT needs to have more business knowledge, for improved data organisation, master data management, and how the data is shared across multiple subject areas.
5. Some of the data items / structures have to remain independent or omnipresent to maintain consistency and avoid duplication.
6. IT and Business users have to work hand in hand to define the views etc., very intelligently.
7. A great deal of thought has to go into creating new DMs, as they tend to hog the process window and occupy precious hard disk space.
8. Properly managed EDW and Dependent DMs will result in reduced time to market for new applications.

After understanding all these issues, we know very well that Independent DMs are difficult to avoid at times. Hence the best way forward is proper planning. We should always aim to develop a good DM infrastructure that brings maximum benefit with minimum data management challenges. There is one more thing to take care of in using DMs: standardising the technologies around them. Where possible, we should aim to re-use the technology stack; otherwise, we end up with yet another complexity in the form of a multiplicity of tools.

December 15, 2008

Online Merchandising - Where "User Attention" is taking over "Shelf Space"

Traditionally, merchandising has always been prominent in brick & mortar retail stores, where the retailer used a good combination of ‘in store’ displays and product arrangement for the purpose of promotion. In addition, the retailer ensured that the premium shelf space, the space at the consumer's eye level, was used to push either products that were high on inventory or products that would be of interest to the consumer.

With the advent of e-commerce all the benefits of a store needed to be replicated in the online world. This included the experience of having the sales representatives helping each customer, the experience of having the in store displays and the other promotions of products.
Search technologies have played a key role in this by providing a superior user experience, thereby increasing the “stickiness” of, and traffic to, the sites. The primary aim of e-commerce sites has been to convert Browsers into Buyers and subsequently to increase the order size of the buyers. Order size can be increased through effective promotions and item associations that result in additional items being added to the user's shopping basket. A few ways to deliver this superior user experience are:

Guided Access
This improves the visibility or presence of an item by guiding the user towards it. This is essential, as in the online world every space on the web page is premium shelf space, and grabbing the user's attention holds the key. It also helps in guiding users through their “attributes of concern”: a price-conscious user could be guided through a set of price ranges, while a brand-conscious user could be guided via a set of navigators for different brands. Guided Access can also be achieved in other ways, such as mediated access, where users define exactly what they want by answering a set of questions and get only the results that are most relevant to them.
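As a rough illustration of such navigators, the sketch below counts how many items fall under each brand and each price range, and then narrows the result set when the user picks one. The catalogue, field names and price bands are all invented for the example; a real search platform would compute these counts inside the search engine itself.

```python
from collections import Counter

# Hypothetical catalogue; field names and price bands are assumptions.
CATALOGUE = [
    {"name": "Basic kettle", "brand": "Acme", "price": 15.0},
    {"name": "Steel kettle", "brand": "Acme", "price": 45.0},
    {"name": "Designer kettle", "brand": "Luxo", "price": 120.0},
    {"name": "Travel kettle", "brand": "Luxo", "price": 30.0},
]

# Each band is (label, inclusive lower bound, exclusive upper bound).
PRICE_BANDS = [
    ("under 25", 0, 25),
    ("25 to 50", 25, 50),
    ("50 and over", 50, float("inf")),
]


def price_band(price):
    """Return the label of the price band that contains the given price."""
    for label, low, high in PRICE_BANDS:
        if low <= price < high:
            return label
    raise ValueError(f"price out of range: {price}")


def navigators(items):
    """Count items per brand and per price band: the options shown to the user."""
    return {
        "brand": Counter(item["brand"] for item in items),
        "price": Counter(price_band(item["price"]) for item in items),
    }


def refine(items, brand=None, band=None):
    """Keep only the items that match the navigators the user has picked."""
    return [
        item for item in items
        if (brand is None or item["brand"] == brand)
        and (band is None or price_band(item["price"]) == band)
    ]


# A price-conscious user picks a band; a brand-conscious user picks a brand.
mid_priced = refine(CATALOGUE, band="25 to 50")
mid_priced_luxo = refine(CATALOGUE, brand="Luxo", band="25 to 50")
```

Recomputing navigators() on the refined list gives the next set of choices, so the user is never led to an empty result page.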

Dynamic Content Association
Search technology today is being looked at more as a “Contextual Information Integrator”, and this enables contextual and dynamic content associations. It can be leveraged to dynamically build item associations that are relevant to the product life cycle and the merchandising strategy.

Promotions
The power of “relevancy analysis” can be leveraged to identify the best cross sell and up sell opportunities. The promotions can also be carried out aiming at pushing out the excess inventory.

Seamless User Experience across channels
Search could be leveraged to enable virtual consolidation of content residing in multiple systems (integrating this information contextually), empowering the user to access the information in a unified manner and thereby providing a seamless user experience across channels.

User experience plays a huge role in the e-commerce world, and we see that search technology in its new avatar can make the difference between a good user experience and one that is great.

December 10, 2008

How PoCs are helpful for well-accepted solutions

I am sure all of you have come across the terms Proof of Concept and Prototyping in the BI context. You would have come across them quite early in a BI / DW project, when the customer is testing the technology waters and gauging how their business users will react. We all know that this kind of exercise engages business users early on.

All of us have seen that a usual problem in BI / DW development is the disconnect between what the business users need and what IT delivers. It is difficult to find fault with either team, because business users find it difficult to express or articulate their requirements, especially the look and feel they need, the definition of their metrics, and the kinds of analyses they would like to have handy.

Our usual or traditional methodologies involve the business users only at the time of requirements gathering and at the UAT (user acceptance testing) phase. We must also have experienced business users' unhappiness with these approaches. The business users will have a variety of complaints, ranging from look & feel, arrangement of KPIs and metrics, and inappropriate drill-downs to unfamiliar interfaces, difficult-to-use navigation, unusable delivery methods etc. This is because the IT folks bring in a whole new terminology at the UAT phase: dimensions, slowly changing dimensions, fact tables, aggregates, drill-throughs, dashboards, scorecards etc.

The moment the business users are unhappy, you can be assured that the BI / DW solution will be lightly used. This results in more complaints against the IT folks and frequent change requests. All of this results in cost overruns, unsatisfied business users and no proper ROI on the IT investment, while data silos still get created for the business users to work with. The final result is that there is no data consistency. Business users may shrug it off with comments about a waste of money.

Well, I believe that there is a way out of these scenarios. It is easy to say that the business users should be involved all through the development, but there are issues with this too. How often have you heard the IT folks complaining that the business users don't participate in these meetings? So we need to excite them enough to participate in the BI project all through. Well, all-through is something very difficult to achieve. Sometimes, insistence on their continuous attendance may result in their juniors attending the meetings. These people may not have the complete context of the development, and may not contribute properly. Besides, additional people and their time will also impact the BI budget. I don't think businesses can afford this in these days of economic downturn.

Well, now I can say that a PoC / Prototype is the way to put our best foot forward. You may think that I could have started with the advantages of a PoC straight away, but giving some background always builds up the case. You can all use this argument for a PoC / Prototype, if required.

A PoC offers the best means of exciting the end users' interest. It showcases all the functions and features for the end users to pick from. This should excite them to participate whole-heartedly, at least in the PoC validation. It lets them have a dekko at the shape of things to come. They will also become familiar with some of the new terms they are going to hear, and with how they are used. Hence, care has to be taken to show all types of features and functions in the PoC itself, even if it is done on a small scale.

Now, let us try to understand whom we should involve in the PoC validation. Showcasing to a small group of high-end users is good, but it may not be enough, as not all end users are high-end users. So a sample of heavy users from across the different levels of the organisation has to be involved for better acceptance. This brings out any opposition to the new technology right at the beginning, rather than at the end, where it would result in sub-optimal usage later. On the upside, a successful PoC can generate more champions for the technology. Even the downside of a PoC is useful: we at least learn that there is some opposition to the technology and that some people are not very comfortable with the features offered. And if the PoC is not accepted, we have not burnt a heavy amount of time and money on a technology that would not have been well accepted anyway.

Now let us look at the ways and means of getting a PoC signed off. Although a PoC doesn't reflect a final solution, in the case of BI it more or less represents a vertical slice of the whole solution. By vertical slice, we mean that it includes each and every layer and has a representation of all the features and components. Once the PoC is developed with this understanding, there is every possibility that it represents the full solution.

Sometimes PoCs get accepted quickly, other times they don't. A well-accepted PoC shows that the proper tool was chosen right from the beginning, and that it matched the expectations of the business users. But if it doesn't get accepted, we should think of it as a problem nipped in the bud.

PoC also brings the business and IT folks together and all their ideas get exchanged.  So, there will not be many ambiguities later. 

Some of us expect that PoCs should be leveraged to cut the overall development time. Most of the time, we cannot re-use the PoC components. The reduction in the overall timelines comes mainly from the reduction in requirements gathering time and in the effort of helping business users visualise the solution. A PoC also significantly reduces the rework effort.

Let me try giving some suggestions on the PoC: 
1. Tend to show the BI / Reporting tool, in full features and colours, rather than actual use for a particular user.
2. Keep handy the names of all the folks involved in the PoC, so that there is an immediate buy in from the business users.
3. For easier participation of the business users, we should encourage use of business terms.
4. For proper acceptance, we should have a ready and working PoC even before we get the business folks involved.
5. Show some current data for better involvement of the users.  Showing old data may not interest them, or make them look for correct information from it.
6. Set their expectations from the beginning that the PoC is not going to answer all of their queries. Also explain that increasing the scope of the PoC is going to delay things, and that the users should instead focus on the features and functions rather than coverage.
7. When the users begin to evaluate the PoC deeply, try to separate their suggestions into must do and wish to do, so that the wish-to-do items don't take up more time and don't figure as priority items in the later phases.

December 9, 2008

Agile BI in Retail Industry

In this article I am covering the BI requirements of the Retail Supermarket Industry. These days, it is very common for this industry to offer both Online Sales and Brick & Mortar Sales outlets. This is a high transaction volume, low margin industry.

This industry has to put up with loads of data, especially checkout basket data, constantly revised products and their SKUs, stock availability data, commercial data to calculate margins, delivery van slot availability, product substitution data, customer acceptance data, customer services data, new customer master data etc.

At the same time, timely information should be available for the various departments like: Finance, Operations, Supply Chain, Marketing, Customer Services, Analytics, Commercial, and Business Development.

Information is also required at different tiers of the organisation, such as Operational, Department and Corporate, mainly for running, managing and monitoring the business. We can add one more important tier to the information pyramid, called Analytics. This layer works across the different departments and functions, and brings an in-depth understanding of the data.

Operational reports support operational activities, and can mostly be served by the various OLTP reports. This leaves reporting at the Department, Corporate and Analytics levels for the BI arena.

I think the BI arena can be classified under 3 categories:
1. Business Development Area: Campaign management & Promotions, customer segmentation, market place etc. fall into this category
2. Customer Engagement Area: Complaints, Returns, Refunds etc. fall into this category
3. Business Reporting: Budgeting, Actuals Analysis, Transaction & Operations monitoring etc. fall into this category

These 3 areas can also be seen as listed in order of relative importance. The first 2 are more or less customer facing, whereas the third one is internal facing.

Let us take a close look at the kind of data elements that we have to deal with in this Retail BI arena.  We can categorise these data types into Master Data, Transaction Data, Derived Data, and Analysis areas.

Most of the Master data is around Customer, Product, Merchandising, Vendor, Employee, Campaigns and Promotions.

Most of the Transactional data is around Sales Baskets, product pricing, discount coupons, returns, refunds, complaints, fulfilment, and inventory / stock.

Some of the Derived data is around customer loyalty, customer segmentation etc.

I think we can safely say that the following fall under Analysis Areas: Segmentation, market basket analysis, returns analysis, failed deliveries, inventory analysis, fraud analysis, customer service, delivery options, basket & spend analysis, marketing, profitability analysis etc. The first 2 BI categories are also covered by this list.

Let us also look at the Business Reporting area. Most retailers rely on weekly / monthly periodic reports in this area for their business monitoring. It is observed that at most Retail companies much of the effort goes into preparing these reports, often manually.

After understanding the requirement categories and the data elements involved, let us take a look at the users' convenience in BI. We can list the needs as follows:

1. Should provide standard reports in a scheduled manner
2. Should provide on-demand reports
3. Should provide self-servicing facility for specific user groups
4. Should also provide analysis areas / sandbox areas for specific analysis

Let us also look at some useful architectural guidelines that can help our cause. They are as follows:
1. Provide an Integrated information platform that makes actionable information available, and decision making easier for the concerned information consumers
2. Reporting Platform that supports standard reports, ad hoc and analytical reports
3. Platform that can be extended to accommodate Dashboards and Balanced Scorecards, for senior management, if required
4. Provide customer segmentation and any other analytical data required by campaign management, web analytics and personalization applications
5. Provide a sandbox / playpen area for occasional analytics
6. Maintain historical data at the most granular level, so that it can be used for performing any kind of analytics

Meanwhile, let us look at some of the very important applications like Product Induction Applications and Commercial applications, which create new product offerings, help categorise them, and provide cost price information. Retail supermarket chains keep adding / amending new products almost on a daily basis, and also experience cost price fluctuations (albeit in a narrow range) from their vendors / suppliers.

Let us take a look at the last 2 BI categories (before we go on to category 1 for more analysis), namely Customer Servicing and Business Reporting. Actually, I feel that both Customer Servicing and Business Reporting fall into the same category of reporting. I am not comparing them on importance here, only on their data management and data arrangement aspects. Both these BI categories need to provide information to their respective users on the following items (not an exhaustive list):

Finance figures on: Sales, Margin, Gross margin, contribution, Payroll, Orders, basket size, promotional sales on various dimensions like Target, Actuals, comparison with previous few periods etc.

Customer figures on: Number of new customers added, complaints, refunds, returns, substitution acceptances, product availability, active customers on various dimensions like Target, Actuals, comparison with previous few periods etc.

Fulfilment productivity figures on vans, pickers, pick rates, missing items, items per order on various dimensions like Target, Actuals, comparison with previous few periods etc.

There will be more interesting items such as: System availability, Advertising revenues, stores utilisations, Basket size analysis, average items per basket, product availability, refund summaries, returns summary, substitution acceptance, website statistics, discount coupons summary, Fraud details etc.

Usually, the information on these items is provided to the business users across hierarchies on a periodic basis. We also need to understand that the business users' information needs keep changing quite frequently, and sometimes they need more information than what is provided to them through standard reports. Sometimes they also demand information on a more frequent basis. This kind of information availability is easier to achieve through proper data management and arrangement than by creating complex reports. Creating an aggregated and dimensional layer will be useful for the business users' information needs. Using a ROLAP environment is going to be more useful than a MOLAP environment, as MOLAP involves too-frequent cube refreshes. Cube refreshes take up more of the process window, and also necessitate historical data storage and processing.

Now, let us take a look at the first BI category, "Business Development Area". This covers items like Campaign management & Promotions, customer segmentation, market basket analysis, etc. I recommend creating some bridge tables against the already existing master data dimensions to indicate these analytical divisions. These bridge tables also avoid frequent updates to the original master data dimensions. They could also help in understanding the history of changes made to the master data dimensions. Analysis areas need very thorough and expert hands to dissect, analyse and understand data. Usually, the methods used in these types of analyses keep changing over a short period of time. Creating data marts etc. to perform these analyses is usually not recommended, as they will use up a lot of memory and hard disk space. Using Sandbox / playpen areas to perform these analyses is recommended. These sandboxes / playpens can have minimal persistent data. These kinds of analyses usually require the most granular data. Hence, historical data is also kept at the most granular level.
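A bridge table of this kind can be sketched as follows, again with Python's built-in sqlite3 module. All table and column names here (dim_customer, dim_segment, bridge_customer_segment, valid_from, valid_to) are assumptions for illustration. The customer dimension is never updated when a customer moves to another segment; a new bridge row is added instead, which also preserves the history.

```python
import sqlite3

# Hypothetical dimension and bridge tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_segment (segment_id INTEGER PRIMARY KEY, label TEXT);

    -- Bridge table: one row per customer, segment and validity period.
    CREATE TABLE bridge_customer_segment (
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        segment_id  INTEGER REFERENCES dim_segment(segment_id),
        valid_from  TEXT NOT NULL,
        valid_to    TEXT          -- NULL means the assignment is current
    );

    INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO dim_segment VALUES (10, 'occasional'), (20, 'loyal');
    INSERT INTO bridge_customer_segment VALUES
        (1, 10, '2008-01-01', '2008-09-30'),
        (1, 20, '2008-10-01', NULL),
        (2, 10, '2008-03-01', NULL);
""")


def segment_on(customer_id, as_of):
    """Return the customer's segment label on the given ISO date, or None."""
    row = conn.execute(
        """
        SELECT s.label
        FROM bridge_customer_segment AS b
        JOIN dim_segment AS s ON s.segment_id = b.segment_id
        WHERE b.customer_id = ?
          AND b.valid_from <= ?
          AND (b.valid_to IS NULL OR b.valid_to >= ?)
        """,
        (customer_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None
```

Because the dates are stored as ISO strings, plain string comparison orders them correctly, and a re-segmentation exercise only touches the bridge table.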

Considering that loads of data have to be handled on a regular basis, it is always recommended to use BI Appliances. These appliances use brute force to process large volumes of data and provide useful information quickly.

December 2, 2008

ILM in Healthcare and LifeScience - Part 2

Posted by Rajiv Sabharwal, Chief Solutions Architect, HCLS

So here we are again... First of all, let me thank those who provided me with very valuable feedback on my previous post. It was highly appreciated, as it was my first attempt at blogging. It is difficult to teach an old dog new tricks...

Now to business. As promised in my last post, today I will talk a bit about the ILM imperatives in the provider space. Providers are the entities and/or personnel that provide care-giving services, such as hospitals, labs, physicians, nurses, hospices etc. Healthcare providers in the US (and, for that matter, pretty much everywhere) are a varied and highly disjointed lot... no surprises there. There are many disparate systems that contain a patient's medical information, and obviously they do not talk to each other. In fact, till very recently there wasn't even a common standard for data exchange. For that matter, even now, though there is an attempt at a few standards, none of them is universally accepted/adopted.

The most popular standard for exchanging data between two disparate healthcare systems is known as HL7 (Health Level 7), which is currently in its v3.0 adoption (XML-based, versus previous versions that were pipe-delimited flat files and, trust me, a pain to read). There are a few other standards that take one or another mutation of HL7 and build upon it, such as IHE or CCD (Continuity of Care Document). In addition to the data exchange, the data content itself may be codified drastically differently in different systems. For example, there are disease and procedure coding schemes such as ICD (International Classification of Diseases), CPT (Current Procedural Terminology) etc.

So now one is talking about having a patient's medical data in different systems (internal or external to a given provider), each of which could codify the data in a different scheme and each of which could decide to use or not use a common platform to exchange the data. Got the context... No wonder a patient who has had a CT scan one day goes through another one the next day, because the referred physician does not have access to the report from the previous day's scan. Talk about wastage of resources (and obviously increased healthcare costs), not to mention the excess radiation going through the patient's body, which could easily have been avoided.

So the response in the industry has been the clamor for an interoperability platform (a Health Exchange, usually called HIE for Health Information Exchange) that is primarily a glorified ILM system with a few additional bells and whistles. Core to such a platform is a clinical repository (though a federated approach is equally popular) that contains the most significant aspects of each patient's clinical data, extracted in batch from the source transaction systems. Obviously one cannot hope to put every person's complete medical history in one single repository (after all, we are talking about 300 million+ people in the US, many of them quite sick; let's not even start talking about the billion+ populations of India or China). So what does one do? One option is to create a hierarchical repository structure, a la the domain services provided by the erstwhile Network Solutions.

In a hierarchical structure (the kind the US' NHIN, National Health Information Network, is attempting), you create distinct repositories for, let's say, each community, which in turn get connected at a regional level, which in turn leads to a state-level network, which finally feeds into a national network. Complex, Aye! (My attempt at Canadian vernacular). This is a task easier said than done. One has to categorize data from the perspective of what can be and what should be stored at what level, and work out how to link a patient across multiple systems, each of which could (and usually does) have a distinct patient id. Add to it the regulatory mandates regarding a patient's health information (PHI) and you have a royal mess on your hands. The situation leads to some very clear mandatory requirements if one wants to attempt healthcare data correlation and management:

  1. MPI - Master Patient Index -> A probabilistic algorithm that can accept demographic information of the patient as input, match it against similar information in other networked systems, and generate a common patient id that spans across all systems. The MPI algorithms are usually associated with services to define probabilistic thresholds (all records above the upper threshold are definite matches, all records below the lower threshold are definitely not matches, and all records in between the thresholds require additional intervention to decide whether they are a match or not).
  2. RLS - Record Locator Service -> An identification logic that given the MPI as input, identifies all the source systems that contain any medical data for the patient and passes the information to the EAI component (or a low-foot-print proxy server, sitting on the source-system side), to retrieve the required data
  3. Consent Management -> An encapsulated component that allows the patient to maintain permission for who can see his/her medical data, what data is available to be viewed and under what condition.
  4. EAI / Proxy Server Component -> Collection of components that, given the patient id in the source system by the RLS component, retrieve the required data from the source system, convert it from its source terminology (coding scheme) to the common terminology used in the clinical data repository, convert the data into an HL7 (or whatever common standard is being used) message, and then store-and-forward the message according to guaranteed delivery principles.
  5. Presentation portal -> Distinct portals for patient, physician, and administrators that provide services such as longitudinal record (complete medical history of patient) presentation to physician under care-giving situation, Clinical Decision Support (CDSS) considerations, secured communication between physician and patient, Refill reminders, remote consultation etc.
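The two-threshold idea behind the MPI in item 1 can be sketched in a few lines of Python. This is only a toy: the field names, the integer weights and the two thresholds are assumptions chosen for the example, and a production MPI would derive its weights statistically and tolerate spelling variations rather than rely on exact field agreement.

```python
# Hypothetical field weights and thresholds, for illustration only.
WEIGHTS = {"last_name": 40, "birth_date": 40, "postcode": 20}
UPPER_THRESHOLD = 70   # scores above this are treated as a definite match
LOWER_THRESHOLD = 30   # scores below this are treated as definitely not a match


def match_score(record_a, record_b):
    """Add up the weights of the demographic fields on which both records agree."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if record_a.get(field) and record_a.get(field) == record_b.get(field)
    )


def classify(record_a, record_b):
    """Place a pair of records into one of the three bands described above."""
    score = match_score(record_a, record_b)
    if score > UPPER_THRESHOLD:
        return "match"
    if score < LOWER_THRESHOLD:
        return "no match"
    return "review"   # in between: needs additional (often manual) intervention


hospital = {"last_name": "Rao", "birth_date": "1970-01-01", "postcode": "560001"}
lab = {"last_name": "Rao", "birth_date": "1970-01-01", "postcode": "560034"}
clinic = {"last_name": "Rao", "birth_date": "1982-05-17", "postcode": "110001"}
pharmacy = {"last_name": "Iyer", "birth_date": "1965-11-02", "postcode": "560001"}
```

Here classify(hospital, lab) falls in the definite-match band, classify(hospital, clinic) lands between the thresholds and goes for review, and classify(hospital, pharmacy) is rejected. Where the thresholds sit decides how much work ends up in the review queue.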

I end with a generic architecture for a federated HIE. Any questions or comments are welcome. See ya till we talk again.

http://infosysblogs.com/eim/HIE_Federated3.html
