Infosys’ blog on industry solutions, trends, business process transformation and global implementation in Oracle.

Product Data Integration Challenge: Structured Vs Unstructured Data

The two key components of Master Data Management (MDM) are Product Data Management and Customer Data Management. Customer data consists mostly of structured fields such as name and address, whereas product data is highly unstructured: it includes CAD drawings, specification sheets, images and so on. To create a product master in an MDM system, you need to migrate product information from multiple disparate systems into it. Integrating unstructured product data during that migration poses a lot of challenges.

Usually the data that needs to be migrated into the master is maintained in different, unconnected sources. Each department maintains and updates the data relevant to it and fails to update the data it does not maintain. This often results in inconsistent data between systems.

In recent times, companies have realized the need for a centralized, blended record that acts as a single source of truth for their customers and products, in order to improve profitability and enable cross-selling. Master Data Management (MDM) tools address this need. But the problem still lies in integrating data from different legacy systems into one common MDM Data Hub. The quality of the data from the disparate systems has to be checked, duplicates eliminated, and the data blended into a clean record that is complete in all aspects and acts as the single source of truth. Companies also need to integrate record volumes of data in the shortest possible time.
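The blending step can be illustrated with a minimal sketch. It assumes every source record carries a shared `part_number` key, which real legacy systems often lack; the field names, the sample records and the "first non-empty value wins" rule are all hypothetical choices made for illustration, not how any particular MDM product works.

```python
from collections import defaultdict

def blend_records(records, key="part_number"):
    """Group records by a shared key and merge each group into one record.

    For every field, the first non-empty value encountered wins, so the
    order of `records` acts as a source-priority order.
    """
    groups = defaultdict(dict)
    for record in records:
        # Normalise the key so "ab-100 " and "AB-100" land in the same group.
        k = record[key].strip().upper()
        merged = groups[k]
        for field, value in record.items():
            if field == key:
                merged[key] = k
            elif value not in (None, "") and field not in merged:
                merged[field] = value
    return list(groups.values())

# Hypothetical extracts from two unconnected systems.
erp = [{"part_number": "AB-100", "description": "Hex bolt", "weight_kg": ""}]
plm = [{"part_number": "ab-100 ", "description": "", "weight_kg": "0.02",
        "drawing": "ab100.dwg"}]

golden = blend_records(erp + plm)
```

Here the two source records collapse into one record that takes its description from the first system and its weight and drawing from the second. Deciding which source should win for which field is a business rule, and in practice it is usually the hard part.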

The traditional tools available for Data Integration (DI), built for high-volume and complex integration scenarios, and for Data Quality (DQ) work for customer data but not for product data. There are two reasons: product data is not as predictable as customer data, and it is loaded with highly unstructured content. Some product-specific data is also expressed in a domain's own jargon, which people outside that domain cannot understand. In addition, most product MDM tools use ‘pattern’ based recognition of inbound data, which again is useful only for structured data.
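A small sketch shows why pattern-based recognition suits structured fields but not product descriptions. The ZIP code pattern is a typical structured-data check; the resistor pattern and the three descriptions are invented for illustration and are not taken from any real tool or catalog.

```python
import re

# A pattern works when the field has one predictable shape.
US_ZIP = re.compile(r"^\d{5}(-\d{4})?$")

# Hypothetical pattern for a resistor description:
# "<value> ohm <power>W <tolerance>%".
RESISTOR = re.compile(r"^(?P<ohms>\d+) ohm (?P<watts>[\d.]+)W (?P<tol>\d+)%$")

zip_ok = [bool(US_ZIP.match(z)) for z in ["94105", "94105-1234"]]

descriptions = [
    "10 ohm 0.25W 5%",           # matches the expected layout
    "RES, 10 OHM, 1/4W, 5%",     # same part, different punctuation and units
    "5% 1/4 watt resistor 10R",  # same part, different word order and jargon
]
matched = [bool(RESISTOR.match(d)) for d in descriptions]
```

Both ZIP codes pass, but only the first of the three descriptions matches, even though all three describe the same part. Writing a separate pattern for every supplier's layout does not scale.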

The way to handle unstructured data is ‘semantic’ recognition of inbound data, i.e. the tool should focus on the meaning, not the patterns. The system should cope with variations in word order, punctuation and spelling, down to character-level parsing. It should also learn continuously on the fly, developing its intelligence as it migrates more and more data.
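The following is a deliberately crude stand-in for that idea, not a description of how any commercial semantic engine works. It assumes a tiny hand-written vocabulary and jargon map (`VOCAB` and `SYNONYMS`, both hypothetical) where a real system would learn them, and it uses fuzzy string matching from Python's standard library to absorb spelling variations.

```python
import re
from difflib import get_close_matches

# Hypothetical vocabulary and jargon map; a real system would learn these.
VOCAB = ["resistor", "ohm", "watt", "tolerance"]
SYNONYMS = {"res": "resistor", "w": "watt", "ohms": "ohm"}

def normalize(description):
    """Reduce a free-text description to an order-independent set of tokens."""
    # Lowercasing and tokenising drops punctuation and splits "5W" into "5", "w".
    tokens = re.findall(r"[a-z]+|\d+(?:\.\d+)?", description.lower())
    result = set()
    for tok in tokens:
        tok = SYNONYMS.get(tok, tok)  # expand known jargon
        if tok.isalpha():
            # Snap near-misses such as "resister" onto a known word.
            close = get_close_matches(tok, VOCAB, n=1, cutoff=0.8)
            if close:
                tok = close[0]
        result.add(tok)
    return frozenset(result)

a = normalize("Resistor, 10 ohm, 5 W")
b = normalize("5W 10 OHMS resister")
same = a == b
```

The two descriptions differ in word order, punctuation, case, abbreviation and spelling, yet they normalize to the same token set. Comparing sets also throws away information, for example which number belongs to which unit, so this is only a sketch of the direction, not a usable matcher.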

One such system, recommended by Oracle for ensuring the data quality of inbound data into its Product Information System, is Silver Creek’s DataLens™ system. It can map incoming product data into an internal product catalog, using ‘natural language processing’ to automatically categorize products and their attributes. For companies with huge product catalogs, in sectors such as e-commerce, manufacturing, retail and CPG, automating product data integration enables users to process thousands of product records accurately and with minimal human intervention.


I do not fully agree with the article. The product/catalog attributes are structured.

As the author mentioned, there is unstructured data also.

There are a lot of PIM products available in the market. Even Oracle PIM is capable of acting as a single source of truth.

Silver Creek DataLens is actually used for the cleansing purpose, which ultimately is an integral part of the PIM solution.

Hi Manoj, this blog is only about the unstructured data part of Product MDM and the issues with migrating data into the PIM system. Regular PIM solutions can also address the cleansing part, but what Silver Creek uses is a more advanced semantic technology. That is why all major PIM vendors, including Oracle and IBM, have a tie-up with them. (Nowhere have I mentioned that Silver Creek will act as a 'Single Source of Truth'.)
