Data profiling is all about CCC-AID? (Part 1)
By Jairaj Asok Kumar
Catch phrase or some gimmick! Well, the easiest way to pen your thoughts on Master Data Management concepts is to look around, see what problems you are facing, and evaluate whether those problems are repeatable in nature. In every data management project I have been involved in, the key problem that persists is bad data quality. Most clients assume that the data in their existing systems is correct, right and true. So how do you define correct, right and true? This is where the term CCC-AID comes in. Read on.
CCC-AID is a quick acronym for the six criteria that need to be applied to data to assess its quality: Complete, Conformance, Consistent, Accurate, Integral and Duplicate. Each of these refers to a block of checks that is performed during a data profiling exercise.
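To make the six criteria a little more concrete before we get to a real tool, here is a minimal Python sketch of what one check per criterion might look like. Everything in it is an assumption made for illustration: the sample customer and order records, the field names, the email pattern, the state-to-country rule and the "trusted" name list are all hypothetical, and the way each criterion is interpreted here is only one plausible reading, not a formal definition.

```python
import re
from collections import defaultdict

# Hypothetical sample data: every field name and value here is made up.
customers = [
    {"id": "1", "name": "Asha Rao", "email": "asha@example.com", "country": "IN", "state": "KA"},
    {"id": "2", "name": "", "email": "bob@example", "country": "US", "state": "KA"},
    {"id": "3", "name": "Asha Rao", "email": "asha@example.com", "country": "IN", "state": "KA"},
]
orders = [
    {"order_id": "A1", "customer_id": "1"},
    {"order_id": "A2", "customer_id": "9"},
]

# Assumed business rules and reference data (illustrative only).
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
STATES_BY_COUNTRY = {"IN": {"KA", "MH"}, "US": {"CA", "NY"}}
TRUSTED_NAMES = {"1": "Asha Rao", "2": "Bob Lee", "3": "Asha Rao"}


def completeness(rows, field):
    """Complete: share of rows where the field is populated."""
    filled = sum(1 for row in rows if row.get(field, "").strip())
    return filled / len(rows)


def conformance(rows, field, pattern):
    """Conformance: share of rows whose value matches the expected format."""
    matched = sum(1 for row in rows if pattern.fullmatch(row.get(field, "")))
    return matched / len(rows)


def inconsistent_ids(rows):
    """Consistent: ids of rows whose state does not belong to their country."""
    return [row["id"] for row in rows
            if row["state"] not in STATES_BY_COUNTRY.get(row["country"], set())]


def inaccurate_ids(rows, trusted):
    """Accurate: ids of rows whose name differs from the trusted reference."""
    return [row["id"] for row in rows if trusted.get(row["id"]) != row["name"]]


def orphan_orders(order_rows, customer_rows):
    """Integral: orders that point to a customer id that does not exist."""
    known_ids = {customer["id"] for customer in customer_rows}
    return [order["order_id"] for order in order_rows
            if order["customer_id"] not in known_ids]


def duplicate_groups(rows):
    """Duplicate: groups of ids sharing the same normalised name and email."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["name"].strip().lower(), row["email"].strip().lower())
        groups[key].append(row["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


report = {
    "complete (name)": completeness(customers, "name"),
    "conformance (email)": conformance(customers, "email", EMAIL_PATTERN),
    "inconsistent ids": inconsistent_ids(customers),
    "inaccurate ids": inaccurate_ids(customers, TRUSTED_NAMES),
    "orphan orders": orphan_orders(orders, customers),
    "duplicate groups": duplicate_groups(customers),
}
for check, result in report.items():
    print(f"{check}: {result}")
```

A profiling tool does essentially this at scale, across every column and table, and with far richer rules. The point of the sketch is only that each letter of the acronym turns into a question you can ask of the data and answer with a number or a list of offending records.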
When undertaking a data profiling exercise on the source systems, we can use either manual or automated tools. Typically, in a large business transformation program or MDM enablement program, the toolset for data profiling is procured as part of the MDM toolset selection. For example, if a client is predominantly on the Oracle stack, then the data profiling tool could be Oracle Data Quality - Profiling server. If the client uses an IBM stack, then the InfoSphere Information Server - Information Analyzer toolset is ideal. However, when data profiling has to be done before an enterprise-grade MDM tool has been identified or procured, one has to opt for a manual process. This is when an idea caught my fancy: why not take the open source route? I have always been inspired by open source tools, and one that bears mentioning is the open source Talend™ Open Profiler tool.
So, in the next part, we will try applying the above data quality principles to a sample set of data using the Talend™ Open Profiler tool.



