Relevance of Data Governance in Big Data
Back in 2009, Tony Fisher, author of the book "The Data Asset: How Smart Companies Govern Their Data for Business Success", observed that data initiatives at most Enterprises will either fail or underperform if the underlying data cannot be trusted. The key reason is inconsistent, inaccurate and unreliable data seeping into the data lifecycle. The causes can be manifold:
- Data Identification for the initiative wasn't complete: what data to acquire, how it is going to be consumed, whether it meets business objectives, who owns it.
- Data Acquisition & Transformation didn't have the right standards, structure, metadata definitions, data ownership, policies or transformation rules.
- Data Delivery wasn't well defined in terms of security and business user context, the association of data with business processes was missing, and so on.
The most popular phrase in the data world, "Better data means better decisions", was always true and is even more so in today's Big Data world. However, the assumption that must hold for the phrase to be true hasn't changed: "The underlying data is accurate, trustworthy, reliable, in context and consistent." Can we imagine this being possible without a solid Data Governance program in place?
We have all heard phrases like "IT Business Alignment is critical to our Enterprise" and "IT is a business enabler function". Taking a realistic dipstick across Enterprises, only a handful have been able to achieve what those phrases were coined for. In most Enterprises there are still gaps between IT and Business objectives, and CIOs/CXOs struggle to align both with the strategic goals of the Enterprise. Taking a cue from the successful Enterprises, one very clear signal stands out: "An Effective Data Governance Program".
OK, this is a good time to understand what a Data Governance program means and what it comprises.
Data Governance - Means having business-driven data policy, ownership, stewardship, monitoring, standards and guidelines for the entire Enterprise data life cycle, right from acquisition through to data consumption & archiving. The focus of Data Governance is clearly on treating data as an Asset for the Enterprise. The six pillars of Data Governance shown in Diagram 1 below, along with Change Management, form the bare bones of any data initiative within the Enterprise.
Diagram 1 - Data Governance
Fine, now let's spend some time understanding the definition of Big Data and how a Big Data initiative should be launched.
Simple Definition - Big Data is high-volume data that comes in a variety of formats, arrives frequently via various channels, cannot be processed using traditional data processing methods and techniques alone, and carries a lot of hidden business value.
Classification of Big Data sets:
- Unstructured - Text, Videos, Audio, Images
- Semi-Structured - Emails, Software packages/modules, Spreadsheets, Financial reports
- Structured - DWH/BI Data, Sensor/machine data logs, RDBMS Data
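The classification above can be captured in a small lookup, so that each incoming data set is tagged with its category before standards and ownership are decided. This is only a rough sketch: the names DataCategory, CATEGORY_BY_SOURCE and classify, as well as the source-type labels, are made up for illustration and simply mirror the examples in the list.

```python
from enum import Enum


class DataCategory(Enum):
    UNSTRUCTURED = "unstructured"
    SEMI_STRUCTURED = "semi-structured"
    STRUCTURED = "structured"


# Hypothetical source-type labels, taken from the examples in the list above.
CATEGORY_BY_SOURCE = {
    "text": DataCategory.UNSTRUCTURED,
    "video": DataCategory.UNSTRUCTURED,
    "audio": DataCategory.UNSTRUCTURED,
    "image": DataCategory.UNSTRUCTURED,
    "email": DataCategory.SEMI_STRUCTURED,
    "software package": DataCategory.SEMI_STRUCTURED,
    "spreadsheet": DataCategory.SEMI_STRUCTURED,
    "financial report": DataCategory.SEMI_STRUCTURED,
    "dwh/bi data": DataCategory.STRUCTURED,
    "sensor log": DataCategory.STRUCTURED,
    "rdbms data": DataCategory.STRUCTURED,
}


def classify(source_type: str) -> DataCategory:
    """Return the category for a source type; raise if the type is not listed."""
    key = source_type.strip().lower()
    try:
        return CATEGORY_BY_SOURCE[key]
    except KeyError:
        # Unknown types are surfaced instead of being silently guessed.
        raise ValueError(f"unclassified source type: {source_type!r}") from None
```

Raising on an unknown source type, rather than falling back to a default, is a deliberate choice here: a data set nobody has classified is exactly the kind of gap a governance review would want to see.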
Now that we understand both Big Data and Data Governance, the diagram below depicts how Data Governance comes into play in bridging the gap between IT and Business objectives.
Diagram 2 - Data Governance Bridges the Gap between Business & IT Objectives
For any Big Data initiative, the Volume (high), Variety (multiple), Ambiguity (high), Frequency (high) and Quality (unreliable) of the data should be thoroughly identified, defined and analyzed by Data Architecture teams (Data Scientists, Data Analysts) in consultation with Business Stakeholders (Data and Business Process Owners). The outcomes of this exercise should be presented to the Data Governance council, consisting of Enterprise Strategy owners, Business Process/Data Owners and IT Infrastructure specialists, during the early stages of such initiatives, to iron out any gaps and discrepancies and to ensure that the mapping of Big Data sets to business processes is defined. In this blog I still haven't peeled back the pillars of Data Governance or shown how each discipline touches the Big Data challenges; I intend to do this in my next blog on the relevance of Data Governance in the Big Data world.
It would be good to hear about your experiences of embarking on a Big Data journey with or without Data Governance as a priority item on the list.