Challenges in a Data Quality Implementation
A real-world, data-quality-based MDM implementation presents some unique challenges that must be addressed for the implementation to succeed.
From my experience, the following challenges merit consideration during the design stage of the project itself, so that there are no unpleasant surprises in the later stages of the project.
1. A Data Quality implementation by itself does not guarantee that the quality of the data will improve. One of the biggest misconceptions about a data quality implementation is that the final data set will be a highly improved version simply because it has been subjected to an entire suite of Data Quality solutions: Data Profiling, Standardization, Matching and Enrichment.
While these solutions do enhance the data, the cardinal rule about data still prevails: "Junk in means junk out". Transactional systems are plagued by incomplete, missing or bad data. For example, mandatory fields may hold junk values (all zeroes, special characters, etc.) that have no correlation with the actual data and were entered merely so that the record in question could be saved. A Data Quality implementation cannot solve such problems of inherently 'bad' data; they merit a different solution.
For this reason, any data quality implementation must have a data remediation project running in parallel. This ensures that the benefits of a data-quality-based MDM implementation can be fully realized.
The challenge here is that any data remediation project will almost certainly be led by a different team (predominantly a business-led team). Building synergies between the MDM implementation team and the data remediation team helps to nip the problem in the bud. One of the golden rules is to prevent data issues upstream in the process, so that minimal remediation is needed later in the data life cycle.
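To make the "junk values in mandatory fields" problem concrete, here is a minimal sketch of how such records could be flagged and handed to the remediation team. The patterns, the field names and the function names (`is_junk`, `records_for_remediation`) are my own assumptions for illustration; real rules would have to be agreed with the business for each source system.

```python
import re

# Hypothetical patterns for placeholder ("junk") values; tune per source system.
_JUNK_PATTERNS = [
    re.compile(r"0+"),       # all zeroes, e.g. "000000000"
    re.compile(r"[\W_]+"),   # only special characters, e.g. "###" or "---"
]

def is_junk(value):
    """Return True if a mandatory field looks like a placeholder, not real data."""
    if value is None:
        return True
    text = str(value).strip()
    if not text:
        return True
    return any(p.fullmatch(text) for p in _JUNK_PATTERNS)

def records_for_remediation(records, mandatory_fields):
    """Yield (record, bad_fields) for records that need manual remediation."""
    for record in records:
        bad = [f for f in mandatory_fields if is_junk(record.get(f))]
        if bad:
            yield record, bad
```

A check like this only identifies the bad records. Deciding what the correct value should be still falls to the remediation team.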
2. Parameters used to flag duplicates need to reflect the ground reality. In theory, parameters such as names, SSN, TIN and address fields are the important fields on the basis of which duplicates can be identified. These parameters may not be enough. Records flagged as duplicates on the basis of these parameters may not be duplicates in the eyes of the business. The business may insist on keeping two seemingly duplicate records as separate records because it makes business sense (e.g. the account is large enough to merit more than one relationship manager, each of whom may wish to keep a record for this account in the transaction system to track opportunities). In such a case, it is the responsibility of the MDM practitioner to educate the business that such a distinction can be maintained in a CRM application, but that doing so in an MDM application goes against the cardinal principles of master data management.
All such scenarios must be identified in the design stage itself as far as possible.
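One way to surface these scenarios early is to run a simple candidate-duplicate report during design and mark the groups that the business is likely to contest. The sketch below is only an illustration: the match key (TIN plus normalized name) and the `relationship_manager` field are hypothetical, and real matching rules would normally also weigh address fields and use fuzzy comparison.

```python
from collections import defaultdict

def normalize(text):
    """Lower-case and collapse whitespace so trivial variations compare equal."""
    return " ".join(str(text or "").lower().split())

def match_key(record):
    # Hypothetical match key: TIN plus normalized name.
    return (normalize(record.get("tin")), normalize(record.get("name")))

def candidate_duplicates(records):
    """Group records by match key and flag groups owned by more than one
    relationship manager, since those need a conversation with the business."""
    groups = defaultdict(list)
    for record in records:
        key = match_key(record)
        if not key[0]:
            continue  # a missing TIN is a remediation issue, not a match
        groups[key].append(record)

    report = []
    for members in groups.values():
        if len(members) < 2:
            continue
        owners = sorted({m["relationship_manager"] for m in members
                         if m.get("relationship_manager")})
        report.append({
            "ids": [m["id"] for m in members],
            "owners": owners,
            "needs_business_review": len(owners) > 1,
        })
    return report
```

Groups with `needs_business_review` set are the ones to walk through with the business before the matching rules are finalized.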
3. A Data Quality project cannot be implemented in isolation. Several data quality features, such as cleansing, merging and enrichment, have the potential to create chaos in downstream systems if those systems are not geared up to handle the changes.
4. It is difficult to articulate completely the results of a Data Quality implementation, particularly for data matching on large volumes of data. Moreover, data matching rules cannot be written flawlessly during the design stage without an actual "feel" for the data. This presents a risk to the success of the entire implementation. It is therefore prudent to factor in at least three full data loads so that the matching rules can be tweaked to best meet the business requirements.
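As a rough illustration of what "tweaking" a matching rule between loads can look like, the sketch below counts how many record pairs a name-similarity rule would match at different thresholds. It uses `difflib.SequenceMatcher` from the Python standard library purely as a stand-in for whatever matching engine the MDM tool provides; the function names and the thresholds are assumptions, not recommendations.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def matched_pairs(names, threshold):
    """Return pairs of names whose similarity meets the threshold.

    Compares every pair, which is fine for a sample but too slow for a
    full load; large volumes need some form of blocking first.
    """
    return [(a, b) for a, b in combinations(names, 2)
            if similarity(a, b) >= threshold]

def threshold_report(names, thresholds):
    """Map each candidate threshold to the number of pairs it would match."""
    return {t: len(matched_pairs(names, t)) for t in thresholds}
```

Running `threshold_report` on a sample from each full load, and reviewing the pairs that appear or disappear as the threshold moves, gives the business something tangible to react to before the rule is fixed.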
A successful data quality implementation will include customer-specific solutions to the above challenges. Without them, any data quality initiative is unlikely to be as successful as initially envisioned.



