Product Data Integration Challenge: Structured Vs Unstructured Data
The two key components of Master Data Management (MDM) are Product Data Management and Customer Data Management. Customer data consists mostly of structured fields such as name and address, whereas product data is largely unstructured: it includes CAD drawings, specification sheets, images and similar content. When creating product master data in an MDM system, you need to migrate product information from multiple disparate systems into the MDM system, and integrating unstructured product data during this migration poses many challenges.
The data that needs to be migrated into the master is usually maintained in separate, unconnected sources. Each department maintains and updates the data relevant to it but does not update data it does not own, which often leaves the systems holding inconsistent data.
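To make this drift concrete, here is a minimal sketch that compares per-department extracts of the same products and reports attributes whose values disagree. The two sources, their product IDs and field names are hypothetical; a real migration would read them from the legacy systems.

```python
from collections import defaultdict

# Hypothetical extracts from two departmental systems, keyed by product ID.
engineering = {
    "P-100": {"name": "Hex Bolt M8", "weight_g": "12", "material": "steel"},
    "P-200": {"name": "Washer 8mm", "weight_g": "2"},
}
sales = {
    "P-100": {"name": "Hex Bolt M8", "weight_g": "15", "list_price": "0.40"},
    "P-200": {"name": "Flat Washer 8mm", "list_price": "0.05"},
}

def find_conflicts(sources):
    """Return {product_id: {attribute: {source_name: value}}} for every
    attribute that two or more sources report with different values."""
    seen = defaultdict(lambda: defaultdict(dict))
    for source_name, records in sources.items():
        for product_id, attrs in records.items():
            for attr, value in attrs.items():
                seen[product_id][attr][source_name] = value
    conflicts = {}
    for product_id, attrs in seen.items():
        for attr, by_source in attrs.items():
            if len(set(by_source.values())) > 1:
                conflicts.setdefault(product_id, {})[attr] = by_source
    return conflicts

conflicts = find_conflicts({"engineering": engineering, "sales": sales})
for product_id, attrs in sorted(conflicts.items()):
    for attr, by_source in attrs.items():
        print(product_id, attr, by_source)
```

Attributes held by only one source (material, list_price) are not reported as conflicts; they are gaps the other department never filled in, which a migration also has to resolve.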
In recent years, companies have realized that they need a centralized, blended record that acts as a single source of truth for their customers and products, in order to improve profitability and enable cross-selling. Master Data Management (MDM) tools address this need. The problem, however, still lies in integrating data from different legacy systems into one common MDM data hub. Data from the disparate systems must be checked for quality, de-duplicated and blended into a clean, complete record that can serve as the single source of truth. Companies also need to integrate record volumes of data in the shortest possible time.
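The check, de-duplicate and blend steps can be sketched as follows. This is an illustration under stated assumptions, not how any particular MDM hub works: duplicates are found by a normalized part number (a hypothetical match key), the source names erp, plm and crm are invented, and the survivorship rule (keep the most recently updated non-empty value for each attribute) is just one common choice among many.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class SourceRecord:
    source: str
    part_number: str          # hypothetical match key
    updated: date
    attrs: dict = field(default_factory=dict)

def match_key(part_number: str) -> str:
    # Normalize the key so "AB-123" and "ab123 " collapse to the same record.
    return "".join(ch for ch in part_number.upper() if ch.isalnum())

def blend(records):
    """Group duplicates by normalized key and build one golden record per group.
    Assumed survivorship rule: for each attribute, keep the most recently
    updated non-empty value."""
    groups = {}
    for rec in records:
        groups.setdefault(match_key(rec.part_number), []).append(rec)
    golden = {}
    for key, group in groups.items():
        merged = {}
        # Process oldest first, so newer non-empty values overwrite older ones.
        for rec in sorted(group, key=lambda r: r.updated):
            for attr, value in rec.attrs.items():
                if value not in (None, ""):
                    merged[attr] = value
        golden[key] = merged
    return golden

records = [
    SourceRecord("erp", "AB-123", date(2023, 1, 5), {"desc": "Pump, 2HP", "weight_kg": "14"}),
    SourceRecord("plm", "ab123 ", date(2023, 6, 1), {"desc": "Centrifugal pump 2 HP", "weight_kg": ""}),
    SourceRecord("crm", "XY-9", date(2022, 3, 3), {"desc": "Valve"}),
]
print(blend(records))
```

Here the two spellings of AB-123 collapse into one golden record: the newer description from the PLM extract wins, while the blank weight it carries does not overwrite the ERP value.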
Traditional Data Integration (DI) tools, designed for high-volume and complex integration scenarios, and Data Quality (DQ) tools work for customer data but not for product data, for two reasons: product data is less predictable than customer data, and it contains a large share of unstructured content. Some product-specific data may even be written in its authors' own jargon, which others cannot understand. In addition, most product MDM tools use pattern-based recognition of inbound data, which again is useful only for structured data.
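The limitation of pattern-based recognition is easy to see in code. The rules below are hypothetical examples of the kind such a tool might apply: a fixed-format field like a US ZIP code fits a regular expression, but the same voltage written as "230 volt", or buried in an invented jargon-laden motor description, matches nothing.

```python
import re

# Hypothetical pattern rules of the kind a pattern-based tool might apply.
PATTERNS = {
    "postal_code": re.compile(r"^\d{5}(-\d{4})?$"),    # US ZIP or ZIP+4
    "voltage": re.compile(r"^(\d+(?:\.\d+)?)\s?V$"),    # e.g. "230V" (case-sensitive)
}

def recognize(value: str):
    """Return the name of the first pattern matching the whole value, or None."""
    for name, pattern in PATTERNS.items():
        if pattern.match(value.strip()):
            return name
    return None

print(recognize("94105"))                   # postal_code
print(recognize("230V"))                    # voltage
print(recognize("230 volt"))                # None: same fact, different form
print(recognize("Mtr 1/2hp 230 vac TEFC"))  # None: free-text jargon
```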
Unstructured data is better handled by ‘semantic’ recognition of inbound data, i.e., the tool should focus on meaning rather than patterns. The system should cope with variations in word order, punctuation and spelling, and should parse at the character level. It should also learn continuously, building up its intelligence as it migrates more and more data.
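The sketch below approximates these requirements with much simpler machinery than true semantic recognition. It lowercases text and strips punctuation, sorts tokens so word order does not matter, scores candidates with a character-level similarity ratio from Python's difflib to absorb spelling errors, and keeps a synonym table that grows as confirmed matches are fed back. The class name, the 0.85 threshold and the catalog entries are hypothetical; this is token-sort fuzzy matching, not the semantic engine of any product.

```python
import difflib
import re

class ProductMatcher:
    """Toy matcher tolerating punctuation, word-order and spelling variation,
    which learns new abbreviations as confirmed matches are fed back."""

    def __init__(self, synonyms=None, threshold=0.85):
        self.synonyms = dict(synonyms or {})  # lowercase token -> canonical words
        self.threshold = threshold            # hypothetical acceptance cut-off

    def normalize(self, text: str) -> str:
        # Lowercase, drop punctuation, expand known abbreviations,
        # then sort tokens so word order no longer matters.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        expanded = [w for t in tokens for w in self.synonyms.get(t, t).split()]
        return " ".join(sorted(expanded))

    def similarity(self, a: str, b: str) -> float:
        # Character-level ratio in [0, 1] absorbs small spelling differences.
        return difflib.SequenceMatcher(None, self.normalize(a), self.normalize(b)).ratio()

    def best_match(self, incoming: str, catalog):
        score, item = max((self.similarity(incoming, c), c) for c in catalog)
        return (item, score) if score >= self.threshold else (None, score)

    def learn(self, variant: str, canonical: str):
        # "Learning on the fly": remember a confirmed abbreviation for later records.
        self.synonyms[variant.lower()] = canonical.lower()

catalog = ["Stainless Steel Hex Bolt M8", "Flat Washer M8"]
matcher = ProductMatcher()
print(matcher.best_match("bolt, hex - Stainless Steel M8", catalog))  # order/punctuation differ
print(matcher.best_match("Stainles Steel Hex Bolt M8", catalog))      # misspelling
print(matcher.best_match("SS hex bolt M8", catalog))                  # unknown abbreviation
matcher.learn("ss", "stainless steel")
print(matcher.best_match("SS hex bolt M8", catalog))                  # after learning
```

The abbreviation SS scores below the threshold until it has been learned once; after that, every later record using it matches. That feedback loop is a crude stand-in for the continuous learning described above.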
One such system, recommended by Oracle for ensuring the quality of inbound data to its Product Information System, is Silvercreek’s DataLens™ system. It maps incoming product data into an internal product catalog, using natural language processing to categorize products and their attributes automatically. For companies with huge product catalogs, such as those in e-commerce, manufacturing, retail and CPG, automating product data integration lets users process thousands of product records accurately, with minimal human intervention.
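How DataLens performs its categorization is not described here, so the following only illustrates the general idea of automatic categorization with a standard text-classification technique, multinomial naive Bayes with add-one smoothing, swapped in for whatever the product actually uses. The categories and training descriptions are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayesCategorizer:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> number of examples
        self.vocab = set()

    def train(self, text, category):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def categorize(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log P(category) + sum of log P(token | category), smoothed
            score = math.log(self.doc_counts[category] / total_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best

clf = NaiveBayesCategorizer()
clf.train("stainless steel hex bolt m8", "Fasteners")
clf.train("zinc plated carriage bolt", "Fasteners")
clf.train("flat washer m8 steel", "Fasteners")
clf.train("1/2 hp electric motor 230v", "Motors")
clf.train("tefc motor 3 phase 460v", "Motors")
print(clf.categorize("hex bolt zinc m10"))       # Fasteners
print(clf.categorize("Mtr 1/2hp 230 vac TEFC"))  # Motors
```

Because `train` can be called at any time, each categorization a user confirms can be fed back as a new example, so the categorizer improves as more data is migrated.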