If CRM has been a struggle or a passion for you, then the Infosys CRM blog is the place to be. Come join us as we discuss the latest trends, innovations and happenings that have a bearing on CRM.


October 25, 2011

Data profiling is all about CCC-AID? (Part 2)

In this part, we apply the data principles discussed in the previous part to a sample data set using the Talend™ Open Profiler tool.

The tool is simple to install. Once it is installed, the Eclipse workbench look and feel, together with the built-in cheat sheets, lets anyone quickly learn the tricks of the trade of profiling. The screenshot below shows the Talend™ Open Profiler.

Jairaj02.jpg

As shown above, there are pre-defined profiling rules that can be executed against potential data sources, text files (comma-separated delimited files) and/or existing MDM applications. Once a data profiling rule has been run, the graphical presentation of the result is of tremendous value. The CCC-AID criteria are depicted visually, which makes it easy to play the findings back to the business community. As shown below, the empty field values are highlighted, indicating a deficiency in the data acquisition process.

Jairaj03.jpg

I am eager for the blogging community to share any insights. This blog is by no means an endorsement of any tool set; it aims to address the pain points of an MDM consultant.

I hope you liked this blog. Please feel free to drop me a note if any of the above material is useful or if you need any support from us. I am currently working on my next blog, which is about Data Modeling in MDM. Till then, adios.

October 20, 2011

Data profiling is all about CCC-AID? (Part 1)

By Jairaj Asok Kumar

Catch phrase or some gimmick! Well, the easiest way to pen your thoughts on Master Data Management concepts is to look around, see what problem you are facing and evaluate whether that problem is repeatable in nature. In every data management project I have been involved in, the key problem that persists is bad data quality. Most clients believe that the data in their existing systems is correct, right and true. So how do you define correct, right and true? This is where the term CCC-AID comes in. Read on.

CCC-AID is a quick acronym for the six criteria that need to be applied to data to assess its quality: Complete, Conformance, Consistent, Accurate, Integral and Duplicate. Each of these refers to a block of work that is performed during a data profiling exercise.
Jairaj01a.jpg
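
To make the criteria a little more concrete, here is a minimal Python sketch of three of the six checks (Complete, Conformance and Duplicate) run against a small in-memory CSV. It is only an illustration and not how any particular profiling tool works: the sample rows, the column names (`customer_id`, `name`, `email`), the `profile` helper and the deliberately loose e-mail pattern are all assumptions made up for this example. The Consistent, Accurate and Integral checks are left out because they depend on business rules and reference data specific to each client.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample data; the columns and values are invented for illustration.
SAMPLE_CSV = """customer_id,name,email
1,Alice,alice@example.com
2,Bob,
3,Carol,carol-at-example.com
1,Alice,alice@example.com
"""

# Assumed conformance rule: a very loose "something@something.tld" shape.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile(rows, key_column, pattern_column, pattern):
    """Return simple Complete / Conformance / Duplicate counts for a list of dict rows."""
    columns = list(rows[0].keys()) if rows else []

    # Complete: how many rows have an empty value in each column.
    empty_counts = {
        col: sum(1 for row in rows if not (row.get(col) or "").strip())
        for col in columns
    }

    # Conformance: non-empty values that do not match the expected pattern.
    nonconforming = sum(
        1
        for row in rows
        if (row.get(pattern_column) or "").strip()
        and not pattern.match((row.get(pattern_column) or "").strip())
    )

    # Duplicate: key values that occur more than once.
    key_counts = Counter(row.get(key_column) for row in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)

    return {
        "rows": len(rows),
        "empty_counts": empty_counts,
        "nonconforming": nonconforming,
        "duplicate_keys": duplicate_keys,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile(rows, "customer_id", "email", EMAIL_PATTERN)
print(report)
```

On real source data, the same counts would be computed per column and then played back to the business as percentages. A profiling tool adds the rule library and the charts on top of this kind of counting.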

When undertaking a data profiling exercise on the source systems, we can use either manual or automated tools. Typically, in a large business transformation program or MDM enablement program, the tool set for data profiling is procured as part of the MDM tool set selection. For example, if a client is predominantly on the Oracle stack, the data profiling tool could be Oracle Data Quality - Profiling server. If the client uses an IBM stack, then the InfoSphere Information Server Information Analyzer tool set is ideal. However, when data profiling has to be done before an enterprise-grade MDM tool has been identified or procured, one has to opt for a manual process. This is where an idea caught my fancy: why not take the open source route? I have always been inspired by open source tools, and one that bears mentioning is the open source Talend™ Open Profiler tool.

So, in the next part, we will apply the above data principles to a sample data set using the Talend™ Open Profiler tool.
