

Pragmatic Data Quality Approach for a Data Lake

Posted by Ketan Puri | October 30, 2016 4:27 AM

On 26 October 2016, we presented our thought paper at the PPDM conference (Calgary Data Management Symposium, Tradeshow & AGM), hosted at the Telus Spark science center in Calgary.

http://dl.ppdm.org/dl/1830

Abstract:

With the increase in the amount of data produced by sensors, devices, interactions and transactions, ensuring ongoing data quality is a significant task and concern for most E&P companies. As a result, most source systems have deferred the task of data clean-up and quality improvement to the point of usage. Within the Big Data world, the concept of a Data Lake, which allows ingesting all types of data from source systems without worrying about the type or quality of the data, further complicates data quality because the data structure and usage are left to the consumer. Without a consistent governance framework and a set of common rules for data quality, a Data Lake may quickly turn into a Data Swamp. This paper examines the important aspects of data quality within the Upstream Big Data context and proposes a balanced approach to data quality assurance across data ingestion and data usage, to improve data confidence and readiness for downstream analytical efforts.
 

The key messages we presented were:

1. Data quality is NOT about transforming or cleansing the data to fit a particular perspective. Instead, it is about putting the right perspective on the data.

2. Data by itself is neither good nor bad. It is just data, pure in its most granular form.

3. Quality is determined by the perspective through which we look at the data.

4. Take an architectural approach that abstracts the data from perspectives and standards, and build a semantic layer to view the same data from different points of view. We don't need to populate data into models (PPDM, PODs etc.); instead, we put models on top of the existing data, promoting the paradigm of "ME and WE", where each consumer has their own viewpoint on the same data. For example, the concept of the WELL can be viewed in terms of Completion, Production, Exploration etc. without duplicating the data in the data lake.

5. Deliver quick value to the business and build its trust in the data in the data lake.
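
To make points 3 and 4 a little more concrete, here is a minimal Python sketch of what "putting models on top of the existing data" could look like. It is our illustration only, not the implementation described in the paper: the field names, the sample wells, the `Perspective` class and the quality rules are all hypothetical. The raw records are stored once, and each perspective projects them on read and applies its own rules.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Raw records as ingested into the lake: stored once, never rewritten.
# All field names and values are made up for illustration.
RAW_WELLS: List[Dict[str, Any]] = [
    {"well_id": "W-001", "spud_date": "2015-03-02", "perf_top_m": 2100.0,
     "perf_base_m": 2150.0, "oil_rate_bpd": None},
    {"well_id": "W-002", "spud_date": None, "perf_top_m": None,
     "perf_base_m": None, "oil_rate_bpd": 420.0},
]

Rule = Callable[[Dict[str, Any]], bool]


@dataclass(frozen=True)
class Perspective:
    """A consumer's view: which raw fields it reads, under which names, plus its own rules."""
    name: str
    fields: Dict[str, str]   # view field name -> raw field name
    rules: Dict[str, Rule]   # rule name -> predicate over the projected record

    def view(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        # Project on read; the raw record is neither modified nor copied into a new store.
        return {alias: raw.get(source) for alias, source in self.fields.items()}

    def failed_rules(self, raw: Dict[str, Any]) -> List[str]:
        projected = self.view(raw)
        return [rule_name for rule_name, rule in self.rules.items() if not rule(projected)]


# Hypothetical Completion perspective: cares about the perforation interval.
COMPLETION = Perspective(
    name="Completion",
    fields={"well": "well_id", "top": "perf_top_m", "base": "perf_base_m"},
    rules={
        "interval_present": lambda r: r["top"] is not None and r["base"] is not None,
        "top_above_base": lambda r: r["top"] is None or r["base"] is None or r["top"] < r["base"],
    },
)

# Hypothetical Production perspective: cares about the reported oil rate.
PRODUCTION = Perspective(
    name="Production",
    fields={"well": "well_id", "oil_rate": "oil_rate_bpd"},
    rules={
        "rate_present": lambda r: r["oil_rate"] is not None,
        "rate_non_negative": lambda r: r["oil_rate"] is None or r["oil_rate"] >= 0,
    },
)


def quality_report(perspectives: List[Perspective],
                   wells: List[Dict[str, Any]]) -> Dict[str, Dict[str, List[str]]]:
    """For each perspective, list the rules each well fails when seen through that view."""
    return {p.name: {w["well_id"]: p.failed_rules(w) for w in wells} for p in perspectives}


if __name__ == "__main__":
    for perspective, results in quality_report([COMPLETION, PRODUCTION], RAW_WELLS).items():
        print(perspective, results)
```

In this sketch, well W-001 passes every Completion rule but fails `rate_present` under Production, while W-002 is the reverse. The same untouched records are "good" or "bad" only relative to the perspective applied, and adding an Exploration view would mean adding another `Perspective`, not another copy of the data.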


Please refer to the link below for details.

http://dl.ppdm.org/dl/1830
