Infosys experts share their views on how digital is significantly impacting enterprises and consumers by redefining experiences, simplifying processes and pushing collaborative innovation to new levels


Data Lineage

1.    Data Lineage - A quick Overview

"Data Lineage is defined as the life cycle of data which includes the data's origin, various phases, transformations, characteristics, and quality. It is the journey of the data from the time of its origin (data sources), followed by the path or system it travels through, what happens to the data when it moves over time (various transformations of the data), to its destination. Data Lineage helps us to analyze the overall purpose of our data".


                              Figure 1: Data Lineage

Privacy regulations such as GDPR require enterprises to track and organize the personal data they hold. Data classification, protection of PII (personally identifiable information), and data lineage all play a vital role in meeting an organization's privacy requirements.

1.1  Types of Data Lineage

·         Backward Data Lineage - Tracing the data back from its end use to its source.

·         Forward Data Lineage - Tracing the data from its source, through the intermediate data flows, to its destination.

·         End to End Data Lineage - A combination of both, covering the complete flow of data from its source, through the intermediate data flow points, to its endpoints.
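
The three types can be illustrated with a small directed graph in which an edge means "data flows from here to there". The following Python snippet is only a minimal sketch: the `LineageGraph` class and the dataset names (`crm_db`, `staging`, and so on) are hypothetical and do not come from any particular lineage product.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph: an edge src -> dst means data flows from src to dst."""

    def __init__(self):
        self.downstream = defaultdict(set)  # node -> nodes it feeds
        self.upstream = defaultdict(set)    # node -> nodes that feed it

    def add_flow(self, src, dst):
        self.downstream[src].add(dst)
        self.upstream[dst].add(src)

    def _walk(self, start, edges):
        # Breadth-first traversal; returns every node reachable from start.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def backward(self, node):
        """Backward lineage: everything the node was derived from."""
        return self._walk(node, self.upstream)

    def forward(self, node):
        """Forward lineage: everything derived from the node."""
        return self._walk(node, self.downstream)

    def end_to_end(self, node):
        """End to end lineage: both directions plus the node itself."""
        return self.backward(node) | {node} | self.forward(node)


# Hypothetical data flows, for illustration only.
g = LineageGraph()
g.add_flow("crm_db", "staging")
g.add_flow("web_logs", "staging")
g.add_flow("staging", "customer_mart")
g.add_flow("customer_mart", "sales_report")
g.add_flow("customer_mart", "churn_model")

print(sorted(g.backward("sales_report")))
print(sorted(g.forward("crm_db")))
print(sorted(g.end_to_end("staging")))
```

Keeping both an upstream and a downstream index lets the same traversal serve backward and forward queries.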

2.    Why Data Lineage?

In the modern era, data grows at an unpredictable rate, and companies work with a lot of data but rarely keep track of what happens to it. News reports of a company leaving billions of records unprotected on its servers are not unusual. Such incidents put organizations at risk of violating data privacy regulations such as the General Data Protection Regulation (GDPR). This is where data lineage plays a vital role in GDPR compliance.

With data lineage, companies can quickly map the relationships between data across hundreds or thousands of databases; done manually, the same mapping might require more than a full year's work from a single developer. Developers, testers, and business analysts all deal with a lot of data in their day-to-day work, and that data has to be documented properly.

Data lineage gives us a clear picture of how data flows: when it is generated, how it is transformed, and how it travels across and outside an organization.

3.    How to achieve Data Lineage
  • Identify and document the 5 W's of Data Lineage - Data becomes meaningful only when we know its history and authenticity. Knowing your data reduces business complexity, and documenting it makes many solutions more efficient to implement. It is therefore important to document the multidimensional features of your data, which are covered by the 5 W's below.

1.      Who is using the data?

Identify the data owners, data elements and data subjects. Keep track of the organizations, processors and vendors that consume the data.

2.      What does it mean?

Identify what the data means to the users, products, processes and services interacting with it.

3.      Where does it exist?

Identify all the data points where your data exists or through which your data flows.

4.      When was it captured?

Identify the data's origin, movement and transformation over time.

5.      How/Why is your data being used?

Identify how your data is being used by products, organizations, vendors and reports. Understand the relationships between the data and how it is connected to any user, application or process. This helps in root cause analysis.


       

Figure 2: The 5 W's of Data Lineage

·         Automate the process - Automation is another crucial element of the Data Lineage process. An automated system is essential for processing the data and producing the lineage metadata, which provides valuable insights.
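
The two steps above can be combined in a small sketch: a decorator that records the 5 W's automatically each time a transformation runs. This is only one possible approach, written in Python under stated assumptions. The names `LineageRecord`, `track_lineage`, `LINEAGE_LOG`, and the example datasets are hypothetical, and a real system would persist the records rather than keep them in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from functools import wraps


@dataclass
class LineageRecord:
    who: str        # owner or consumer running the step
    what: str       # business meaning of the output
    where: str      # system or location where the output lives
    when: datetime  # time at which the step ran
    how: str        # name of the transformation applied
    inputs: tuple   # names of the source datasets


LINEAGE_LOG = []  # in-memory store, for illustration only


def track_lineage(who, what, where, inputs):
    """Decorator that appends a LineageRecord every time the step runs."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            LINEAGE_LOG.append(LineageRecord(
                who=who,
                what=what,
                where=where,
                when=datetime.now(timezone.utc),
                how=func.__name__,
                inputs=tuple(inputs),
            ))
            return result
        return wrapper
    return decorator


@track_lineage(who="billing-team",
               what="customer emails, trimmed and lower-cased",
               where="warehouse.clean_emails",
               inputs=["crm.customers"])
def normalize_emails(rows):
    return [r.strip().lower() for r in rows]


cleaned = normalize_emails(["  Alice@Example.com ", "BOB@example.com"])
record = LINEAGE_LOG[-1]
print(cleaned)
print(record.how, record.inputs, record.where, record.when.isoformat())
```

Because the metadata is captured as a side effect of running the step, the documentation cannot drift away from what the pipeline actually does, which is the main argument for automating lineage capture.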

4.  Benefits of Data Lineage

  • Data governance
  • Data Quality Assessment (DQA)
  • Data Protection & GDPR compliance
  • Root cause analysis
  • Business Intelligence & analysis
  • Data Analytics

5. Data Lineage Use Cases

·From Tracing Viral Origin to Data Tracing - key to success

Think of the recent COVID-19 pandemic: there has been a lot of speculation about the origin of the coronavirus. People across the globe, including scientists, researchers, health care professionals, the media, governments and the general public, have offered various theories about the virus's origin, its transmission, the forms it takes when transmitted and its sources. It's all about the lineage. Just as origin and contact tracing play a significant role in preventing viral transmission, tracking the origin and movement of data and organizing it play a vital role in managing an organization's data-driven business.

 

·Tracing Internet data to get Valuable-Insights

Another common scenario from day-to-day life is internet data. E-commerce sites can use it to offer suggestions and recommendations to customers by tracking the products they search for. Similarly, social networking sites can analyze it to provide friend suggestions, group suggestions and so on. Some sites keep track of the internet addresses from which a user has previously logged in so that they can notify the user of a suspicious login, while video streaming sites use watch history to make suggestions and to resume videos.

 

·Data Lineage for effective Data Privacy

Enterprises today deal with a wide variety of sensitive information and need to take concrete measures to protect the data privacy of their customers, partners and stakeholders. Strong data lineage is required in enterprise IT to ensure end-user security, meet compliance regulations such as GDPR and uphold the privacy of the data.
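
As a rough illustration of how lineage supports privacy work, the Python sketch below starts from the sources that a classification step has flagged as holding PII and follows the data flows forward to list every dataset that may need protection. The flows, dataset names and the `datasets_exposed_to_pii` function are assumptions made for this example, not a description of any specific product.

```python
from collections import deque

# Hypothetical flows: each key feeds the datasets listed in its value.
FLOWS = {
    "crm.customers": ["staging.customers"],
    "web.clickstream": ["staging.sessions"],
    "staging.customers": ["mart.customer_360"],
    "staging.sessions": ["mart.customer_360", "report.traffic"],
    "mart.customer_360": ["report.churn"],
}

# Hypothetical classification result: sources known to hold PII.
PII_SOURCES = {"crm.customers"}


def datasets_exposed_to_pii(flows, pii_sources):
    """Return every dataset that holds PII or is derived from one that does."""
    exposed = set(pii_sources)
    queue = deque(pii_sources)
    while queue:
        current = queue.popleft()
        for target in flows.get(current, []):
            if target not in exposed:
                exposed.add(target)
                queue.append(target)
    return exposed


exposed = datasets_exposed_to_pii(FLOWS, PII_SOURCES)
print(sorted(exposed))
```

This sketch is deliberately conservative: it treats every dataset downstream of a PII source as exposed, even if a transformation along the way drops or masks the sensitive fields. Tracking lineage at the column level would allow a more precise answer.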

As part of the iEDPS Product Team, we have been building comprehensive Data Lineage capabilities to help our enterprise customers govern their Data Privacy needs.


Authors: Gowripadma Murugesan, Sujay Saha


References

https://www.slideshare.net/LeighHill5/the-art-of-implementing-data-lineage

https://erwin.com/blog/what-is-data-lineage/

https://www.dataversity.net/data-lineage-demystified/

https://www.xenonstack.com/insights/data-lineage/

https://getmanta.com/ultimate-guide-to-data-lineage/
