Big data analytics has gained much importance in recent times, but the concept of analyzing large data sets is not new. Astronomers of old used large sets of observational data to predict planetary movements, and our forefathers drew on years of experience to devise better ways of doing things. Throughout our history, data has played a key role in the evolution of modern medicine, advances in space research, the industrial revolution, and the financial markets. The only difference today is the speed at which data is processed, stored, and analyzed.
With the availability of high-performance computing and cheaper data storage, the time taken to process information has gone down drastically. What once took years of experience and enormous human effort, machines can now do in a split second. Supercomputers are breaking the barriers of computing power day after day. Weather forecasting is a classic example: by statistically modelling data and harnessing the computational power of modern machines, we can today predict the weather with hourly accuracy.
The concept of big data analytics has also spread to the financial markets, where stock prices are predicted from thousands of parameters and financial models forecast the economies of entire countries. Examples of big data analytics can be found in any field of modern civilization. Whether it is medicine, astronomy, finance, retail, robotics, or any other science known to man, data has played a major role. It is not only the time aspect but also the granularity of data that determines the richness of the information it brings.
The Rising Bubble Theory of Big Data Analytics is a step towards understanding data based on its movement through the various layers of an enterprise. It is based on an analogy to a bubble generated at the base of an ocean and the journey it makes to reach the surface: coalescing with other bubbles, disintegrating into multiple bubbles, or getting blocked by various obstructions in the turbulent waters. Data can take multiple paths based on the varied applications in an enterprise, and its granularity changes as it moves through the various application layers. The objective is to tap the data in its most granular form to minimize the time needed for its analysis. Data undergoes losses due to filtering, standardization, and transformation processes as it percolates through the different application layers. The time aspect refers to the transport mechanism or channels used to port data from its source to its destination. When we combine the analysis of data granularity with the time aspect of its movement, we can understand the value that the data brings:
Data Value (dv) = Granularity (g) / Time (t)
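As a rough illustration of this relation, the sketch below expresses it in a few lines of Python. The granularity scores and latency figures are invented for the example, and the function name is an assumption, not part of the theory itself.

```python
# Hypothetical sketch of the dv = g / t relation. "granularity" is a
# unitless depth score and "latency_hours" the time taken to move the
# data from its source to the point of analysis; both are invented.

def data_value(granularity, latency_hours):
    """Relative value of a data set: it rises with granularity
    and falls with transport/processing latency."""
    if latency_hours <= 0:
        raise ValueError("latency must be positive")
    return granularity / latency_hours

# Raw readings tapped near the source: fine-grained, low latency.
raw = data_value(granularity=100.0, latency_hours=1.0)
# The same data after aggregation in upper layers: coarser, delayed.
aggregated = data_value(granularity=10.0, latency_hours=24.0)

assert raw > aggregated  # tapping data early yields more value
```

The comparison at the end is the point of the theory: the closer to the source the data is tapped, the higher its value under this measure.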
Data granularity can be associated with data depth, linked to the data sources. Granularity increases as we move closer to the data sources. At times, due to the complex nature of proprietary data producers, it becomes difficult to analyze the data; it needs to be transformed into a more standard format before it can be interpreted as meaningful information. Tapping this data early in its journey can add great value for the business.
Data can move both horizontally and vertically. Horizontal movement involves data replication, while vertical movement involves aggregation and further data synthesis.
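The difference between the two movements can be sketched with a toy example; the pressure readings and field names below are invented for illustration. Horizontal replication preserves every reading, while vertical aggregation trades granularity for a compact summary.

```python
# Invented per-second pressure readings from a single device.
readings = [
    {"ts": 0, "psi": 1450},
    {"ts": 1, "psi": 1462},
    {"ts": 2, "psi": 1471},
]

# Horizontal movement: replication. The copy keeps full granularity.
replica = [dict(r) for r in readings]

# Vertical movement: aggregation. Individual timestamps and values
# are lost; only a synthesized summary travels up the layers.
summary = {
    "count": len(readings),
    "avg_psi": sum(r["psi"] for r in readings) / len(readings),
}

assert replica == readings   # nothing lost moving horizontally
assert "ts" not in summary   # granularity lost moving vertically
```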
With recent technological advancements, cheaper data storage options, the higher processing power of modern machines, and the availability of a wide range of toolsets, data analytics has gained much focus in the Energy domain. Enterprises have started looking into newer ways to extract maximum value out of the massive amounts of data they generate in their own backyards. Unlike other domains (Retail, Finance, Healthcare), Energy companies are still struggling to unleash the full potential of data analytics. The reasons could be many, but the most common are:
· High Capital Cost with low margins, limiting their investments
· Dependency on legacy proprietary systems with limited or restricted access to the raw data in readable format
· Limited network bandwidth at the exploration and production sites for data crunching and effective transmission
With the advent of new standards like OPC UA, WITSML, PRODML, and RESQML, the evolution of network protocols, and powerful visualization tools, the barriers to Exploration and Production data analytics are breaking down. Oil and Gas companies have already started looking to reap the benefits of the massive data lying dormant in their data stores. Massive amounts of data are created every second: OPC data related to assets, remote devices, and sensors; well core and seismic data; drill logs; production data; and more are some of the common data categories in the Exploration & Production (E&P) domain. The new data standards and readable formats (XML) have enabled these enterprises to interpret and transform this data into more meaningful information in the most cost-effective manner. They only need to tap into this vast repository of data (real time or staged) by plugging in some of the leading data analytics tools available in the market. These tools have enabled enterprises to define and implement new data models that cater to the needs of the business by customizing information for different stakeholders (geoscientists, geologists, system operators, trading departments, etc.).
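As a rough sketch of what tapping such XML-based data can look like, the snippet below parses a simplified, WITSML-style fragment with Python's standard library. The tag names, well name, and values are illustrative only and do not follow the actual WITSML schema.

```python
import xml.etree.ElementTree as ET

# A simplified, WITSML-style fragment; the structure is invented
# for illustration and is not the real WITSML schema.
doc = """
<well>
  <name>Well-A12</name>
  <log mnemonic="ROP" unit="m/hr">
    <value>21.4</value>
    <value>19.8</value>
  </log>
</well>
"""

root = ET.fromstring(doc)
name = root.findtext("name")
# Pull the log readings out of the readable XML into plain numbers
# that any downstream analytics tool can consume.
values = [float(v.text) for v in root.find("log").findall("value")]
```

Because the formats are readable and standardized, this kind of extraction is all an enterprise needs before handing the data to its analytics toolset.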
Broadly, the Exploration and Production (E&P) data analytics can be classified into two categories:
1. Real Time Data analytics
2. Staged Data Analytics
Need for Real Time Data Analytics
Real time analytical solutions cater to mission-critical business needs, such as predicting the behavior of a device under a specific set of conditions (real time predictive analytics) and determining the most suitable action strategy. They can help in detecting the threshold levels of temperature and pressure for generators, compressors, and other devices, and in mitigating the impact of fault conditions. Alerting-based custom solutions can be built on top of the real time data analytical models. Today, most critical monitoring is done onsite using proprietary tools such as SCADA systems. It can become very challenging to provide large computing capacity and skilled human resources at these remote and hazardous locations. Network bandwidth is also a limiting factor for transporting massive amounts of data to the enterprise data centers. Most of the information is limited to onsite system operators with limited toolsets, and enterprises get a much delayed view of this data, creating too much dependency on system operators to manage the systems. The current approach to tackling these problems has become more reactive than proactive.
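A minimal sketch of such threshold-based alerting might look like the following; the device names, limits, and readings are all invented for illustration and stand in for values that would come from a real monitoring system.

```python
# Invented operating limits for the example.
PRESSURE_LIMIT_PSI = 1500.0
TEMP_LIMIT_C = 90.0

def check_reading(reading):
    """Return an alert message for each limit the reading breaches."""
    alerts = []
    if reading["pressure_psi"] > PRESSURE_LIMIT_PSI:
        alerts.append("%s: pressure %.1f psi over limit"
                      % (reading["device"], reading["pressure_psi"]))
    if reading["temp_c"] > TEMP_LIMIT_C:
        alerts.append("%s: temperature %.1f C over limit"
                      % (reading["device"], reading["temp_c"]))
    return alerts

# A short slice of an invented real-time stream.
stream = [
    {"device": "compressor-1", "pressure_psi": 1480.0, "temp_c": 85.0},
    {"device": "compressor-1", "pressure_psi": 1520.0, "temp_c": 92.0},
]
alerts = [a for r in stream for a in check_reading(r)]
# The second reading breaches both limits, producing two alerts.
```

In practice the thresholds themselves would come from predictive models rather than fixed constants, but the alerting layer sits on top of them in the same way.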
Real Time Data Analytics
Exploration and Production data streams can be tapped and mapped to real time analytical models for in-flight data analytics. These models can help operators formulate response strategies to mitigate the impact of fault conditions more effectively. System operators can focus on their job rather than worrying about logistics, and they can have wider access to the enterprise knowledge base.
The data is streamed in real time to the enterprise data centers, where live monitoring can be performed using more advanced computing techniques. Multiple data streams can be plugged together and analyzed in parallel, and data modelling techniques enable enterprises to design cost-effective data integration solutions. The advantages of real time analytics are huge: implementations of fuzzy logic and neural networks, real time predictive analytics, and applications of advanced statistical methods, to mention a few. It has opened the doors to limitless benefits for E&P organizations.
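Plugging multiple streams together can be sketched with Python's standard library; the sensor readings below are invented, and each stream is assumed to be already ordered by timestamp.

```python
import heapq

# Two invented sensor streams as (timestamp, unit, value) tuples,
# each already ordered by timestamp.
pressure = [(1, "psi", 1450), (3, "psi", 1470)]
temperature = [(2, "degC", 84), (4, "degC", 88)]

# heapq.merge interleaves the ordered streams into a single
# time-ordered feed that a downstream model can consume as one
# sequence, without buffering either stream in full.
merged = list(heapq.merge(pressure, temperature))
```

The same pattern scales to any number of ordered streams, which is what makes parallel analysis of combined feeds practical at the data center.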
Staged Data Analytics
The data streamed from remote locations can be stored in high-performance databases for advanced staged data analytics, where complex statistical models and data analysis tools can work their magic. Staged data analytics is performed on historical data sets to identify data patterns and design more effective business solutions. It also helps enterprises improve the performance of their systems, identify gaps, optimize existing systems, recognize the need for new processes, and much more. Models can be created to simultaneously analyze massive amounts of data from other data sources (related or unrelated) using leading industry analytical tools. Generally, E&P companies use these tools for reporting purposes, to cater to the varied needs of stakeholders across the enterprise. The full potential of staged data analytics is yet to be explored in the Energy domain. It can bring benefits ranging from business process optimization and identification of process bottlenecks to more effective and safer operating conditions and forecasting of outcomes using simulation techniques. It can create a totally new perspective on a business scenario.
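A very small example of pattern detection over staged data: the daily production figures below are invented, and the rule of flagging readings more than two standard deviations below the mean is one simple choice among many that the statistical models mentioned above could apply.

```python
from statistics import mean, stdev

# Invented daily production figures for one well (barrels/day).
history = [980, 1005, 990, 1010, 995, 640, 1000, 985]

mu, sigma = mean(history), stdev(history)
# Flag days more than two standard deviations below the mean:
# a minimal pattern-detection pass over staged historical data.
anomalies = [(day, v) for day, v in enumerate(history)
             if v < mu - 2 * sigma]
# Day 5, with its sharp production drop, is the only day flagged.
```

Real staged analytics would run far richer models over far more data, but the shape is the same: historical data at rest, scanned for patterns that inform business decisions.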