July 13, 2016

Industrial Internet of Things (IIoT) - Conceptual Architecture

Posted by Ketan Puri at 6:29 AM


The popularity of the Internet of Things (IoT) is growing rapidly. More and more devices (things) are getting connected to the internet every day, and the value potential of these connected devices is enormous; we have seen only a fraction of it so far. Many startups are in the process of building data-driven products, solutions, or services that can disrupt traditional operational procedures. Major cloud vendors have also ventured into this space, offering IoT as a key part of their product stacks.

Industrial IoT (IIoT) extends the general concept of IoT to an industrial scale. Every industry has its own set of devices and home-grown or proprietary applications with limited interfaces, and for some, network bandwidth is a major concern. Given these challenges and limitations, which vary from industry to industry, there is no single solution that fits all. Each industry is unique, with its own set of use cases that require custom tailoring.

This article describes a conceptual architecture for the Industrial Internet of Things (IIoT) that is agnostic of any particular technology or solution.

Below are the key components of a typical IIoT landscape:


a) Industrial Control Systems (ICS)

These provide field staff with a first-hand view of events across industrial systems so they can manage industrial operations. They are generally deployed at industrial sites and include Distributed Control Systems (DCS), Programmable Logic Controllers (PLCs), Supervisory Control and Data Acquisition (SCADA) systems, and other industry-specific control systems.

b) Devices

These are industry-specific components that interface with digital or analog systems and expose data to the outside digital world. They provide machine-to-machine and human-to-machine (and vice versa) capabilities that let the ICS exchange information in real time or near real time, enabling the other components of the IIoT landscape. Examples include sensors, interpreters, translators, event generators, and loggers.

They interface with the ICS, transient data stores, channels, and processors.

c) Transient Store

This is an optional, temporary data store connected to a device or an ICS. Its primary purpose is to ensure data reliability during outages and system failures, including network failures. Examples include attached storage, flash memory, and disks.

It generally takes the form of attached or shared storage on the devices themselves.
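To make this concrete, here is a minimal Python sketch of a transient store: readings are buffered in a local SQLite file and drained once the upstream channel is reachable again. The table layout, reading format, and send_upstream callback are illustrative assumptions, not part of any particular product.

```python
import json
import sqlite3
import time

class TransientStore:
    """Local buffer that holds readings until they can be forwarded upstream."""

    def __init__(self, path="buffer.db"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS readings (ts REAL, payload TEXT)")

    def buffer(self, reading: dict) -> None:
        # Persist the reading locally so it survives process, power, or network outages.
        self.db.execute("INSERT INTO readings VALUES (?, ?)",
                        (time.time(), json.dumps(reading)))
        self.db.commit()

    def drain(self, send_upstream) -> int:
        # Forward buffered readings oldest first; delete only after a successful send.
        rows = self.db.execute("SELECT rowid, payload FROM readings ORDER BY ts").fetchall()
        sent = 0
        for rowid, payload in rows:
            if not send_upstream(json.loads(payload)):
                break  # channel still down, keep the rest for the next attempt
            self.db.execute("DELETE FROM readings WHERE rowid = ?", (rowid,))
            sent += 1
        self.db.commit()
        return sent

if __name__ == "__main__":
    store = TransientStore()
    store.buffer({"sensor": "pressure-01", "value": 101.3})
    # Pretend the network is back: print instead of actually transmitting.
    store.drain(lambda reading: print("forwarded", reading) or True)
```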

d) Local Processors

These are low-latency data processing systems located at or near the industrial sites. They provide fast processing of small, local data sets. Examples include data filters, rule-based engines, event managers, data processors, algorithms, routers, and signal detectors.

They generally feed data into the applications deployed at the industrial sites, and at times they are integrated into the devices themselves for data processing.
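As a simple illustration, the sketch below shows the kind of rule-based filtering a local processor might perform at the edge: each reading is checked against a small set of rules and only the events of interest are forwarded. The sensor names and thresholds are hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical edge rules: a reading is forwarded only when a rule flags it.
RULES: list[tuple[str, Callable[[dict], bool]]] = [
    ("high_temperature", lambda r: r["sensor"] == "temp" and r["value"] > 90.0),
    ("low_pressure", lambda r: r["sensor"] == "pressure" and r["value"] < 95.0),
]

def process(readings: Iterable[dict]) -> list[dict]:
    """Apply the edge rules and tag the readings that should leave the site."""
    events = []
    for reading in readings:
        for name, predicate in RULES:
            if predicate(reading):
                events.append({**reading, "event": name})
    return events

if __name__ == "__main__":
    sample = [
        {"sensor": "temp", "value": 92.5},       # fires high_temperature
        {"sensor": "pressure", "value": 101.3},  # filtered out, no rule fires
        {"sensor": "pressure", "value": 90.1},   # fires low_pressure
    ]
    print(process(sample))
```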

e) Applications (Local, Remote, Visualization)

These are deployed on site or remotely to meet business-specific needs. They provide real-time views of field operations for operators, and both real-time and historical insights for business users and IT staff, enabling them to make effective, calculated decisions. Examples include web-based applications, tools to manipulate data, manage devices, and interact with other systems, as well as alerts, notifications, visualizations, and dashboards.

f) Channels

These are the media for data exchange between the devices and the outside world. Examples include satellite communication, routers, and network protocols (web-based or TCP).
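Staying protocol-agnostic, here is one minimal sketch of pushing a reading over a web-based channel using nothing but the Python standard library; the endpoint URL and payload shape are placeholders, and a real deployment might instead use MQTT, a message broker, or a satellite link.

```python
import json
import urllib.request

# Placeholder endpoint for a web-based channel.
ENDPOINT = "https://iiot-gateway.example.com/telemetry"

def send_over_channel(reading: dict) -> int:
    """POST one reading over the channel and return the HTTP status code."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(reading).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status

if __name__ == "__main__":
    print(send_over_channel({"sensor": "flow-07", "value": 13.8, "unit": "m3/h"}))
```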

g) Gateways

These provide communication across multiple networks and protocols, enabling data interchange between distributed IIoT components. Examples include protocol translators and intelligent signal routers.
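As an illustration of protocol translation, the sketch below unpacks a hypothetical binary frame coming from a field device and re-emits it as JSON for the enterprise side. The frame layout and sensor codes are invented for the example.

```python
import json
import struct

# Hypothetical binary frame from a field device: device id (uint16),
# sensor code (uint8), value (float32), all big-endian.
FRAME_FORMAT = ">HBf"
SENSOR_NAMES = {1: "temperature", 2: "pressure"}

def translate(frame: bytes) -> str:
    """Translate a device-side binary frame into enterprise-side JSON."""
    device_id, sensor_code, value = struct.unpack(FRAME_FORMAT, frame)
    return json.dumps({
        "device": device_id,
        "sensor": SENSOR_NAMES.get(sensor_code, "unknown"),
        "value": round(value, 2),
    })

if __name__ == "__main__":
    raw = struct.pack(FRAME_FORMAT, 42, 2, 101.3)  # simulate an incoming frame
    print(translate(raw))
```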

h) Collectors

These are data gatherers that collect and aggregate data from gateways using standard protocols. They can be custom built or off-the-shelf products and vary from industry to industry. Examples include OPC data collectors, event stream management systems, application adapters, and brokers.
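A collector can be as simple as grouping and summarising what the gateways hand over. The sketch below aggregates a batch of readings per sensor; the reading shape is an assumption, and a real collector would also carry timestamps, units, and quality flags.

```python
from collections import defaultdict
from statistics import mean

def collect(readings: list[dict]) -> dict:
    """Aggregate raw gateway readings into per-sensor summaries."""
    grouped = defaultdict(list)
    for reading in readings:
        grouped[reading["sensor"]].append(reading["value"])
    return {
        sensor: {"count": len(values), "min": min(values),
                 "max": max(values), "avg": round(mean(values), 2)}
        for sensor, values in grouped.items()
    }

if __name__ == "__main__":
    batch = [
        {"sensor": "temp", "value": 88.0},
        {"sensor": "temp", "value": 91.5},
        {"sensor": "pressure", "value": 101.3},
    ]
    print(collect(batch))
```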

i) Processors

These are the core of any IIoT solution, built primarily to cater to specific business needs. Examples include stream processors, complex event processing engines, signal detectors, analytical model scoring, data transformers, advanced analytical tools, executors for machine learning training algorithms, and ingestion pipelines.
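As one tiny example of stream processing and signal detection, the sketch below flags values that deviate sharply from a sliding window of recent history; the window size and threshold are arbitrary choices for illustration.

```python
from collections import deque
from statistics import mean, pstdev

class SpikeDetector:
    """Toy stream processor: flag values that deviate sharply from recent history."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def process(self, value: float) -> bool:
        """Return True when the value is an outlier relative to the sliding window."""
        is_spike = False
        if len(self.history) >= 5:
            mu = mean(self.history)
            sigma = pstdev(self.history) or 1e-9  # avoid division by zero
            is_spike = abs(value - mu) / sigma > self.threshold
        self.history.append(value)
        return is_spike

if __name__ == "__main__":
    detector = SpikeDetector()
    stream = [100, 101, 99, 100, 102, 100, 101, 180, 100]  # 180 is the injected spike
    for value in stream:
        if detector.process(value):
            print("spike detected:", value)
```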

j) Permanent Data Store and Application Data Store

These are the long-term data storage systems linked to an IIoT solution. They act as historians for the device data along with data from other sources, and they feed data into the processors for advanced analytics and model building. They include massively parallel processing (MPP) data stores, on-cloud/on-premise data repositories, and data lakes that provide high-performance, seamless data access to both business and IT. Examples include historians, RDBMSs, and open-source data stores.

k) Models

Two types of models are widely used in IIoT solutions: data models and analytical models. Data models define a structure for the data, while analytical models are custom built to cater to industry-specific use cases. Models play an important role in any IIoT solution; they provide a perspective on the data. They are generally built by leveraging the data in the permanent data stores, human experience, and industry standards. Analytical models are trained on historical data sets or through machine-based training processes. Examples of analytical models include clustering, regression, mathematical, and statistical models. Examples of data models include information models, semantic models, entity-relationship mappings, JSON, and XML/XSD.

The models are fed back into the data stores, processors, applications, and gateways.
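To illustrate the analytical side, the sketch below trains a simple regression model on a handful of historical readings and then scores a new one. The feature names, figures, and the use of scikit-learn are assumptions for the example; any modelling toolkit could play this role.

```python
import numpy as np
from sklearn.linear_model import LinearRegression  # pip install scikit-learn

# Hypothetical historical data: pump vibration (mm/s) and bearing temperature (deg C)
# used to predict remaining useful life in days.
X_history = np.array([[2.1, 65], [3.4, 71], [4.8, 78], [6.2, 85], [7.5, 93]])
y_history = np.array([180, 150, 110, 60, 20])

model = LinearRegression().fit(X_history, y_history)

# Score a fresh reading coming out of the processors.
new_reading = np.array([[5.0, 80]])
print("predicted remaining useful life (days):",
      round(float(model.predict(new_reading)[0]), 1))
```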

l) Security

Security is the most important aspect of any IIoT application. It runs through the entire pipeline, from the source to the point of consumption, and it is critical for small, medium, and large data-driven digital enterprises alike. It includes data encryption, user access control, authentication, authorization, user management, network security, firewalls, redaction, and masking.
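As one small example of the data-encryption aspect, the sketch below encrypts a telemetry payload before it leaves the site using the open-source cryptography library. Key management, access control, and network security are separate concerns and are not shown; generating the key inline is only for the example.

```python
import json

from cryptography.fernet import Fernet  # pip install cryptography

# In practice the key would come from a key management service, not be generated inline.
key = Fernet.generate_key()
cipher = Fernet(key)

payload = json.dumps({"sensor": "valve-12", "state": "open"}).encode("utf-8")
token = cipher.encrypt(payload)          # ciphertext safe to send over the channel
print(cipher.decrypt(token).decode())    # a receiver with the same key recovers the reading
```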

m) Computing Environments

These vary from industry to industry depending on the business landscape and the nature of the business (retail, health care, manufacturing, oil and gas, utilities, etc.):

  • Fog Computing - Bringing analytics near to the devices/source

  • Cloud Computing - Scaling analytics globally across the enterprise

  • On-Prem Computing - Crunching data in existing high performance computing centers

  • Hybrid Computing - Mix of on-cloud, on-prem and fog computing optimizing operations tailored for specific industrial business needs   



April 11, 2016

How to make a 'Data Lake'

Posted by Ketan Puri at 2:30 PM

The Data Lake has become a buzzword these days, and we see enterprises actively investing to build their own Data Lakes.

The Data Lake is one of the most prominent focus areas in the digital agenda of most enterprises. Investments are happening in data acquisition, storage (cloud or on-premise), and analytics. Yet the success rate for most enterprises is dismal. The reason is not a lack of capability or technology, but the absence of the right direction and a focus on value.

The hype around putting all the data in one place and thinking about its usage later has created more data swamps than valuable Data Lakes.

My article in the Digital Energy Journal (Issue 60, April-May 2016) is a first step toward giving some structure to the concept of the Data Lake.

Below is an image taken from the article, reproduced with the permission of the editor.


March 10, 2016

New Generation of Data Analytics and New Generation of Opportunities

Posted by Shahnawaz Qureshi at 11:47 AM


As someone once said, "Intelligence is nothing but the amount of meaningful information harvested from the data available." So it is the data, and the skill and talent to extract meaningful information from it, that define intelligence, knowledge, and statistical prediction.

Traditionally, analytics were driven by deriving trends and patterns from historically accumulated data; this was usually conventional data, logged or recorded as events occurred, specifically for that purpose. With the advancement of technology and its penetration into different domains, things have changed. Today, applying technology to a given space effectively digitizes the offering, and the primary output of that technology inadvertently serves as data for breakthrough analytics that can be derived by connecting different dots: dots that are meaningless independently but, when connected, give you the "big picture". In this blog I will cite some examples where information is extracted in unconventional and innovative ways that would otherwise not be obvious.

There is a saying that "a picture is worth a thousand words", and for data analytics this means that every picture holds a treasure of information waiting to be harvested. Traditionally it was difficult to extract meaningful information from images the way one would from character data, but advances in technology are breaking this barrier. Google Goggles was one of the earlier efforts to unlock the data hidden within images, and today the latest mobile app, Google Photos, leverages this capability to search images by their content, using image processing and geotagging information. This pattern of extracting information from images has paved the way for cutting-edge analytics, where satellite images of the parking lots of big retailers like Walmart and Home Depot are used to predict their quarterly earnings, giving research analysts raw data they can trust. A similar pattern on a bigger scale is helping to predict the global economy: satellite images of oil storage, the movement of trucks in mines, agricultural fields, and night lights reflect the current status of the world economy as it takes shape. All of this has been made possible by transforming near-real-time images into precious data.

Showing real-time traffic has become a common feature of the GPS devices and apps in use today. Initially, this information was sourced from traffic sensors placed by government authorities and other concerned departments, so only a few roads had the privilege of having their traffic data analyzed. All of this changed when companies like Google went the crowdsourcing way. With crowdsourcing, Google was able to improve the reach and accuracy of its traffic information and predictions: when users navigate with Google Maps on their phones, the phone anonymously sends back data that helps the company determine how fast cars are moving on any specific road. With the acquisition of Waze, Google was able to add a human touch to this algorithm, as drivers can provide real-time feedback on their driving experience along specific routes. Does it stop there? No. With more and more users navigating with Google Maps, Google has access to a vast amount of data about end users' driving habits: how much a user drives, where and when they drive, their speed, the critical points they navigate, and much more. With this immense amount of data, a company like Google is well positioned to offer auto insurance based on unique data analytics that could rival traditional actuarial methodology.

To end this post, I would like to cite a personal example and an opportunity I have been lucky to come across. Being a tech freak, last year I bought a car adapter called Automatic. It comes with a companion mobile app and turns your car into a connected car: it tracks your driving behavior, miles driven, fuel consumed, and routes taken; basically a handy device if you are interested in analyzing your driving habits at leisure. As the new year 2016 arrived, I got a report from Automatic with some key figures on my driving habits for the past year, such as average fuel consumed and states travelled. One of the facts reported was interesting: I left home for work 16 minutes earlier than the average Automatic user did, and I arrived home 44 minutes later than the average Automatic user did (so, yes, I work hard). Automatic had enough data and patterns to determine my office commute. When I think about this, there is huge potential in connecting the other dots and harvesting some incredible facts and opportunities. For example, based on my parking location, my home or the organization where I work could be determined, or I could be given an incentive to reveal where I work and in turn get insights into how my fellow workers fared with their working hours (with anonymized details, of course). This collective data could reveal quite a few interesting facts about workplace culture: the average hours an employee works; average commuting hours and timings, which can indicate how flexible the organization's working hours are; and regular commuting patterns, which can indicate the organization's policies on and support for working from home. Thus, data logged for one purpose can be carried into a different context and augmented with other details to gain a wealth of information, which in turn can be turned into tremendous opportunities. Imagine a company like Glassdoor tapping into such data by forming an alliance; it could be a goldmine of raw data to augment their current analytics.

These are just a few examples of how data today no longer comes only from the traditional, intended sources, but is harvested from a wide range of technology offerings and then augmented and enriched with other factors to yield incredible depth and insight. With analytics today, it is no longer the sky but your imagination that is the limit, so it's time to get datatized!


November 13, 2015

Analytics Funnel

Posted by Ketan Puri at 6:10 AM

  Mining "VALUE" from data is an art of science commonly referred as Data Science. The value lies in the data hidden deep within the fabric of the enterprise. Extracting the value requires skills, tools and techniques. If these are combined with the right methodology governed by principles and standards the process becomes simpler. In one of my published articles, I have tried to depict this methodology in the form of an Analytics Funnel.


October 31, 2014

The brave new world of Big Data Analytics

Posted by Rahul Jain at 9:17 AM

We all have heard about Big Data and even more about the hype surrounding it. Is it worth the excitement? Can Big Data live up to the expectations, or is it just another technology fad?

Every day, 2.5 exabytes (2.5 billion GB) of data are generated, and 90% of the data in existence today was generated in just the last two years. [1] Data is not just growing; it is exploding exponentially. We have witnessed this data explosion over the past few years, and the trend is certain to continue as the world embraces digitalization further. Digitalization has allowed organizations to capture and store data and details that were previously infeasible or too expensive to keep.

Big Data is not just about storing all social media activity, clickstreams, public web data, transaction and application data, machine logs, and so on; it is much more about generating insights from the collated data.

Big Data Analytics at work

We will discuss three examples that help illustrate the potential of this field.

During the 2014 FIFA World Cup, Microsoft's Bing prediction engine predicted the winner of each match before it even started. The engine was so successful that it picked the winner of every elimination match right up to the final, an impeccable record of 15 matches in a row. The prediction algorithm used various data fields such as winning and losing margins, offensive and defensive statistics, team composition, player positions, game timing, and playing conditions like the weather and the venue's distance from the playing country. If these exclusive insights had been made available to one particular team, their World Cup journey would have been far more focused and successful. In the near future, don't be surprised to see brands partnering with Bing (or any other prediction engine, for that matter) to predict the winning team in order to make their sponsorship decisions.

Let's take a look at another example, this one from the media and entertainment industry. Netflix started off as a content distribution company, but soon realized the value of its existing dataset on customer taste. Netflix went on to invest $100m in the production of the TV show 'House of Cards' without even commissioning a pilot episode (a pilot is normally used to evaluate a new show's performance before committing to full production). Netflix knew the show would be a hit even before production started, and in line with its expectations the show was well received by subscribers: about 10% of subscribers began streaming it on day one, and many of them ended up binge-watching it. This was thanks to Netflix's Big Data analytics capabilities, which analysed billions of hours of viewing patterns along with the reviews and feedback of its 50-million-strong subscriber base. The same technique is now used to decide investments in other TV shows and documentary productions. Being exclusive to Netflix, these shows are fuelling Netflix subscriptions and making their presence felt at award functions such as the Emmys, the Golden Globes, and the Oscars.

Now let's take the insurance sector. If it's one of those days when you think you need your car insurance more than ever, perhaps because of bad weather or simply a gut feeling, big data analytics can come to your rescue. Understanding the needs of this customer segment, vehicle insurance companies have come up with the exciting innovation of usage-based insurance policies such as 'Pay as you drive' and 'Pay how you drive'. The insurer quotes a premium based on various factors such as driving location, traffic and weather conditions, time of the drive, and data from the on-board diagnostics. Such products are sometimes more economical than traditional ones, and they are sure to gain further traction once cars become digitally connected.

The way forward

In the examples above, we have seen organizations making bold moves to leverage their big data capabilities. These are early days for big data, and especially for big data analytics (BDA), yet we have already seen the promise of what the technology can help us achieve. In the near future, BDA is sure to have a significant impact on the business models of many companies, and we are certain to see innovative products and new revenue streams powered solely by analytics.

Any business or organization investing in big data technology needs to understand that BDA is not a crystal ball; it has its limitations. Further, the successful implementation of a BDA project requires the right blend of technology and machine learning expertise, along with strong business acumen.

[1] http://hbr.org/2012/10/big-data-the-management-revolution/ar

October 27, 2014

OPC UA and High-Speed Data Streaming - Enabling Big Data Analytics in the Oil and Gas Industry

Posted by Ketan Puri at 4:42 AM

Role of OPC UA in Oil and Gas Business

OPC began primarily as a Microsoft-based solution for interfacing with manufacturing and production systems. It was originally an abbreviation of "OLE (Object Linking and Embedding) for Process Control". Over time the concept has evolved into Open Platform Communications, and today it is also expanded as Open Productivity and Connectivity. The classic Microsoft version of OPC relies on COM and DCOM objects for interfacing with machine data. COM and DCOM have their own limitations, mostly related to platform dependence, security, and data portability across firewalls and to non-Microsoft applications.

The OPC Foundation has come up with an architectural framework and standards to overcome the limitations of the classic flavor of OPC. This new standard is referred to as the OPC Unified Architecture (OPC UA). It provides standards for data extraction, security, and portability independent of any proprietary technology or platform, with APIs available in Java, .NET, and ANSI C. European energy companies are among the early adopters of these standards and are funding vendors to develop products around them. Vendors are already in the market offering product stacks for OPC data streaming, web services, visualization tools, and real-time data analytics. OPC UA is not going to replace existing OPC Classic implementations in the short term, but it can act as a much more effective mechanism for exposing this data to the enterprise.

The OPC Foundation certifies the products developed by these vendors and exposes APIs to manipulate the data. OPC UA provides a standard mechanism for representing data types, ranging from simple values to complex data structures. Models can be created to cater to different business needs while data is in motion (real-time analytical models) or at rest (staged analytical models). Business rules can easily be configured to process the information in a more time- and cost-effective manner. Streaming tools enable simultaneous delivery of information to multiple applications and data stores using high-speed data transfer protocols. OPC UA provides a robust security model for data transfers and enables custom application development through its APIs, helping enterprises get the most value out of their data.
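To make the API angle concrete, here is a minimal sketch of reading a single tag with the open-source python-opcua client, one of several available OPC UA stacks. The endpoint URL and node id are placeholders, and a production client would also configure security policies, certificates, and authentication.

```python
from opcua import Client  # pip install opcua (FreeOpcUa's python-opcua)

# Placeholder endpoint and node id for illustration only.
ENDPOINT = "opc.tcp://192.168.0.10:4840/freeopcua/server/"
NODE_ID = "ns=2;i=2"

client = Client(ENDPOINT)
client.connect()
try:
    node = client.get_node(NODE_ID)
    print("current value:", node.get_value())  # one-shot read of the tag
finally:
    client.disconnect()
```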

Most oil and gas exploration & production companies rely on proprietary products that cover the deficiencies of OPC Classic by wrapping it to expose the data to the enterprise. This creates a strong dependency on those product vendors and their product lines to cater to different business needs, and it leads to high licensing and infrastructure costs for upstream businesses. Because the data is proprietary, extraction, transformation, and integration add further cost. The impact is not only financial but also operational: by the time these enterprises even get a chance to look at the data, they have already lost most of the value it has to offer, and real-time operational risks that could have been avoided, or even converted into opportunities, have already materialized.

From a performance perspective, OPC DA works well for simple data types, while OPC UA is designed for the complex data types that are more relevant to upstream enterprises. The address-space concept in OPC UA makes it more attractive for enterprise data management systems. OPC UA currently supports a secure, firewall-friendly, high-speed binary TCP transport as well as web-based protocols, and because the standards are open, custom protocols can be used where higher transfer speeds are needed. There are other protocols in the market, such as FASP, a proprietary protocol developed by Aspera, now an IBM company. The FASP byte-streaming APIs can eliminate the limitations of TCP related to packet tracking and packet loss; the protocol is largely independent of geographical distance and can transmit data at very high speeds.

Upstream, midstream, and downstream operations rely heavily on PLCs, DCSs, PACs, data recorders, and control systems. These OPC-enabled devices produce data every second, and the data is typically managed using popular proprietary OPC Classic servers. Using OPC UA, the data can be exposed and ported to the enterprise in a much more cost-effective and timely fashion.

OPC UA has opened the door for enterprises to have a real-time view of their operations and to devise new business process models that leverage the benefits of big data analytics, real-time predictive analytics, and much more.

Benefits to Oil and Gas Operations:

Enterprises can benefit from the new open standards in many ways, saving cost and improving efficiency across operations ranging from well life cycle management to distribution, refining, and trading.

Areas of focus:

Action and Response Management

OPC UA provides context and content to the enterprise in real time. Alerts and events generated by various SCADA systems can be made accessible to the system operator and the enterprise at the same time, so operators need not rely solely on their instincts and experience but have the support of the entire enterprise. Notifications can be sent to multiple stakeholders, and efficient response strategies can be implemented for each event. It also enables analysts to visually interpret the data on web and mobile devices in order to respond to incidents in real time. Data movement need not rely on proprietary systems to cross multiple network layers, and custom visualization models give the enterprise a different perspective on the data flowing out.

Decision Management

Making the right decisions at the right time can save enterprises millions. Decisions are based on insights generated from data across the enterprise, and much of this is OPC data generated by devices operating at remote locations. The faster we analyze the data, the more value we get from the insights. For example, exploration sensor data can help an upstream operator decide whether to proceed with drilling at a particular site, understand the geology of the site and procure the right equipment, choose well trajectories that maximize production, optimize drilling parameters for safe and efficient drilling, optimize refining operations based on hydrocarbon analysis of the oil, determine the shortest routes for transporting oil and gas, help schedule oil and gas operations, and support better decisions when executing large oil and gas trades.

Risk Management

The oil and gas industry is highly prone to risk, whether it relates to deep-water drilling operations or to transporting oil and gas across challenging terrain. A small incident can lead to losses of billions of dollars; on the other hand, handled well, it can open the door to tremendous opportunities. It is about understanding the risks and their consequences and applying the right strategy to handle them. Most assets and equipment are OPC enabled and generate huge volumes of data every second. If this data is tapped at the right time, organizations can not only deal with risk confidently but also exploit the opportunities. Analytical models can crunch the data in its most granular form, leveraging OPC UA, and give the enterprise the ammunition to optimize its operations.

Health and Safety

OPC UA data can be streamed directly from drilling assets to the enterprise, with the ability to perform in-flight analytics. The data can feed analytical models designed to predict outcomes and prescribe actions that ensure the safety of the operational landscape. Data is the new currency of the modern world; it can provide insights that improve the health and safety aspects of an oil and gas enterprise and help meet the legal and regulatory requirements of the region.


Future of Oil and Gas Enterprises

With the latest technological advancements, the right investments, and the capacity to accept change, the day is not far off when oil and gas enterprises will step into a new era of intelligent field operations. As the OPC Foundation has quipped, "There will be a need for only two living beings on the oil field, a dog and a human: the human ensures the dog gets its food on time, and the dog ensures the human does not interfere with the field operations."

September 4, 2014

What do we mean by 'Realizing business value from big data'?

Posted by Nikhilesh Murthy at 10:21 AM

Posting on behalf of Rajeev Nayar, Associate VP & Head, Big Data Practice

According to Gartner, about 64% of companies have already invested in Big Data, and 56% are struggling to derive value from it. This is something we often hear when we talk about data, big data, analytics, reporting, intelligence, and so on. The question now is: are enterprises really able to leverage the strength of their data to start with, and, given the deluge of data, are they able to leverage the strength of Big Data?

When it comes to realizing business value from Big Data, there seems to be a mismatch in how IT and business teams operate. Business teams want to rapidly develop new insights and improve areas such as personalization and customer satisfaction, irrespective of where the data lies. Technology teams are traditionally responsible for constructing a reporting mechanism to deliver these insights. This invariably introduces a lag between the time the requirements are understood and the time the business gets the results; more often than not, the results arrive much later than they were actually needed and have become irrelevant by the time the business teams receive them. There is a significant mismatch between the structured IT construct and the agility the business needs, and the real question is how we can help enterprises leverage the strength of data to make the right, timely decisions for their business. Today, if we look at most enterprises, only about 20-25% of enterprise data is structured, and it is this structured portion that has traditionally been used for analysis; the remaining roughly 80% is unstructured.

What does an enterprise need to do when you look at this scenario?

The first big change is the scope of the data itself. With Big Data, the scope has expanded from the traditional 20% structured construct we have always been used to, to include the other 80% of unstructured data as well.

Second, the focus shifts from data relationships to correlations. Structured data has always been built on the construct of established data relationships; the shift now is from relying on relationships, which are known, to identifying correlations, which were previously unknown, and that is really where the power of insights comes from.

Third, in moving away from data relationships towards correlations, it is important to embed an element of discovery in how you identify those correlations. If you are not able to discover, you limit how many correlations you can identify.

Enterprises therefore need to rethink their approach to Big Data: first, by changing the scope of what data is meaningful to the enterprise and its customers; second, by moving away from traditional data relationships towards new business correlations; and finally, to maximize those correlations, by enabling technologies and frameworks that allow for self-discovery. Realizing business value lies in integrating these three pillars.

Infosys Big Data services and solutions focus on helping enterprises build a data ecosystem that empowers both technology and business teams to rapidly develop and act on insights relevant to the enterprise and its customers.

August 25, 2014

"Big Data is a fairly overhyped term. At the end of the day, analytics are about all forms of data." - Rajeev Nayar

Posted by Nikhilesh Murthy at 10:00 AM

Rajeev Nayar, Associate VP & Head, Big Data Practice - Cloud, Infosys, was featured in an exclusive interview with Analytics India Magazine (AIM). Rajeev spoke about the Big Data practice at Infosys, how we see the market changing, and how Infosys plans to sustain itself in the Big Data market.

 Some of the points Rajeev makes are:

  • Big Data is a fairly overhyped term. At the end of the day, analytics are about all forms of data; Big Data just deals with data sets of extremely high volumes that cannot be handled by traditional tools.

  • We strongly believe that there will ultimately be an amalgamation of Big Data with traditional BI/BA practices. What we are looking to achieve is a common layer that brings together all data formats and helps generate insights.

  • Almost all organizations today are focused on using big data technologies rather than on the problems the technology is trying to solve. At the end of the day, it is important to understand that these are just tools.

  • The largest concern for us lies in the acute shortage of skills in this space. Enterprise solutions are still not mature enough to be intuitive, and obtaining and retaining resources for big data projects is a big challenge.

  • We see a lot of focus on how enterprises are going to analyze these vast, dispersed sources of information without physically bringing them together.

The entire article is available courtesy of AIM.

June 9, 2014

Increasing Relevance of Self Service BI

Posted by Yogesh Bhatt at 5:31 AM

Rigid BI processes move data from the source to the business consumers of reports through complex IT processes, creating dependency and latency; eventually, business users lose the opportunity to discover and make decisions when it matters most. CIOs are constantly struggling to strike the right balance between business needs and IT challenges. That is good reason for the increasing relevance of, and demand for, self-service BI.


March 4, 2014

The Rising Bubble Theory of Big Data Analytics

Posted by Ketan Puri at 5:59 PM

Big data analytics has gained much importance in recent times, but the concept of analyzing large data sets is not new. Astronomers in earlier eras used large sets of observational data to predict planetary movements, and our forefathers used years of experience to devise better ways of doing things. Throughout our history, in the evolution of modern medicine, advances in space research, the industrial revolution, and the financial markets, data has played a key role. The only difference today is the speed at which the data is processed, stored, and analyzed.

With the availability of high-performance computing and cheaper data storage, the time needed to process information has dropped drastically. What once took years of experience and enormous human effort, machines can now do in a split second, and supercomputers keep breaking the barriers of computing power. A classic example is weather forecasting: by statistically modelling the data and using the computational power of modern machines, we can now predict the weather with hourly accuracy.

Big data analytics has also spread to the financial markets, where stock prices are predicted from thousands of parameters and financial models can forecast the economies of countries. Examples of big data analytics can be found in every field of modern civilization: medicine, astronomy, finance, retail, robotics, and any other science known to man. And it is not only the time aspect but also the granularity of the data that determines the richness of the information it brings.

The Rising Bubble Theory of Big Data Analytics is a step towards understanding data based on its movement through the various layers of an enterprise. It is based on an analogy with a bubble generated at the bottom of the ocean and the journey it makes to reach the surface: coalescing with other bubbles, disintegrating into multiple bubbles, or getting blocked by obstructions in the turbulent waters. Data can take multiple paths through the varied applications in an enterprise, and its granularity changes as it moves through the different application layers. The objective is to tap the data in its most granular form to minimize the time needed for analysis. The data suffers losses due to filtering, standardization, and transformation as it percolates through the application layers. The time aspect refers to the transport mechanisms or channels used to move the data from its source to its destination. When we combine the granularity of the data with the time aspect of its movement, we can understand the value it brings.

Data Value (dv) ∝ Granularity (g) / Time (t)
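A purely illustrative calculation of this relationship, with made-up numbers for the same reading tapped at three different layers:

```python
# Illustrative only: relative "data value" = granularity / time-to-analysis.
taps = {
    "at the device (raw, seconds old)":      {"granularity": 1.00, "time_hours": 0.01},
    "after site aggregation (minutes old)":  {"granularity": 0.60, "time_hours": 0.50},
    "in the enterprise warehouse (day old)": {"granularity": 0.30, "time_hours": 24.0},
}
for layer, d in taps.items():
    print(f"{layer}: relative value = {d['granularity'] / d['time_hours']:.2f}")
```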

Data granularity can be thought of as data depth relative to the data sources: granularity increases as we move closer to the sources. At times, due to the complex nature of proprietary data producers, the data is difficult to analyze and must be transformed into a more standard format before it can be interpreted as meaningful information. Tapping the data early in its journey can add great value for the business.

Data can move both horizontally and vertically: horizontal movement involves data replication, while vertical movement involves aggregation and further data synthesis.