October 31, 2014

The brave new world of Big Data Analytics

Posted by Rahul Jain (View Profile | View All Posts) at 9:17 AM

We have all heard about Big Data, and even more about the hype surrounding it. Is it worth the excitement? Can Big Data live up to its expectations, or is it just another technology fad?

Every day 2.5 exabytes (2.5 billion GB) of data are generated, and 90% of the data in existence today was generated in just the last two years. [1] Data is not merely growing; it is exploding, and the trend is certain to continue as the world embraces digitalization further. Digitalization has allowed organizations to capture and store data and detail that was previously either infeasible or too expensive to keep.

Big Data is not just about storing all social media activity, clickstreams, public web data, transaction and application data, machine logs, and so on; it is more about generating insights from the collated data.

Big Data Analytics at work

Let us discuss three examples that illustrate the potential of this field.

During the 2014 FIFA World Cup, Microsoft's Bing prediction engine predicted the winner of each match even before the match started! The engine was so successful that it picked the winner of every elimination match right up to the final, an impeccable record of 15 matches in a row! The prediction algorithm drew on a variety of data fields: winning and losing margins, offensive and defensive statistics, team composition, player positions, game timing, playing conditions such as weather, and the venue's distance from each team's home country. Had these insights been available exclusively to one team, its World Cup journey would have been far more focused and successful. In the near future, don't be surprised to see brands partnering with Bing (or any other prediction engine, for that matter) to predict the winning team before making their sponsorship decisions.
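To make the idea concrete, here is a minimal sketch of how such a match predictor could be assembled. The features, sample numbers, and model choice below are illustrative assumptions, not Bing's actual engine:

```python
# A hedged sketch of a match-outcome predictor in the spirit of the Bing
# engine described above. Feature values and data are hypothetical; the
# real engine used far richer inputs and modelling.
from sklearn.linear_model import LogisticRegression

# Each row: [avg goal margin, offensive rating, defensive rating,
#            extra travel distance in 1000s of km] for team A minus team B.
X_train = [[ 0.8,  1.2,  0.9, -2.1],
           [-0.5, -0.7,  0.3,  1.4],
           [ 1.1,  0.9,  1.0, -0.3],
           [-1.2, -1.0, -0.6,  2.0]]
y_train = [1, 0, 1, 0]  # 1 = team A won

model = LogisticRegression().fit(X_train, y_train)

upcoming = [[0.6, 0.8, 0.7, -1.0]]
print(model.predict(upcoming))        # predicted winner (1 = team A)
print(model.predict_proba(upcoming))  # win probabilities
```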

Let's look at another example, this one from the media and entertainment industry. Netflix started off as a content distribution company, but soon realized the value of its existing data on customer taste. Netflix went on to invest $100m in the production of the TV show 'House of Cards' without even commissioning a pilot episode (a pilot is normally used to evaluate a new show's prospects before investment in full production is approved). Netflix knew the show would be a hit before production even started! In line with expectations, the show was well received by subscribers: about 10% of them began streaming it on day one, and many ended up binge-watching. The credit goes to Netflix's Big Data analytics capabilities, which analysed billions of hours of viewing patterns along with reviews and feedback from its base of roughly 50 million subscribers. The same technique is now used to green-light other TV shows and documentaries. These shows, being exclusive to Netflix, are fuelling subscriptions, and they are making their presence felt at award functions such as the Emmys, the Golden Globes, and the Oscars.

Now let's take the insurance sector. If it's one of those days when you feel you need your car insurance more than ever, perhaps because of bad weather or simply your intuition, big data analytics can come to your rescue. Understanding the needs of this customer segment, vehicle insurers have come up with an exciting innovation: usage-based policies such as 'Pay as you drive' and 'Pay how you drive'. The insurer quotes a premium based on factors such as driving location, traffic and weather conditions, time of travel, and data from the vehicle's on-board diagnostics. Such products are often more economical than traditional policies, and they are sure to gain further traction once cars become digitally connected.
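As a toy illustration of how a usage-based policy might turn telematics into a quote (the base rate, weights, and input fields here are invented for the example, not any insurer's actual model):

```python
# Toy usage-based premium calculator. All rates and loadings below are
# illustrative assumptions, not a real pricing model.
def quote_premium(base_monthly: float, miles_driven: float,
                  night_fraction: float, harsh_brakes_per_100mi: float,
                  high_risk_area: bool) -> float:
    premium = base_monthly
    premium += 0.05 * miles_driven                   # pay as you drive
    premium *= 1.0 + 0.3 * night_fraction            # night-driving loading
    premium *= 1.0 + 0.02 * harsh_brakes_per_100mi   # pay how you drive
    if high_risk_area:
        premium *= 1.15                              # location loading
    return round(premium, 2)

# A light, careful daytime driver vs. a heavy night driver:
print(quote_premium(40.0, 300, 0.05, 1.0, False))   # lower quote
print(quote_premium(40.0, 1200, 0.40, 6.0, True))   # higher quote
```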

The way forward

In the above examples, we saw organizations making bold moves to leverage their big data capabilities. These are early days for big data, and especially for big data analytics (BDA), yet we have already seen the promise of what the technology can help us achieve. BDA is sure to have a significant impact on the business models of many companies in the near future, and we are certain to see innovative products and new revenue streams powered solely by analytics.

Any business or organization investing in big data technology needs to understand that BDA is not a crystal ball; it has its limitations. Further, a successful BDA implementation requires the right blend of technology and machine-learning expertise, along with strong business acumen.

[1] Andrew McAfee and Erik Brynjolfsson, "Big Data: The Management Revolution", Harvard Business Review, October 2012. http://hbr.org/2012/10/big-data-the-management-revolution/ar

October 27, 2014

OPC UA and High-Speed Data Streaming - Enabling Big Data Analytics in the Oil and Gas Industry

Posted by Ketan Puri (View Profile | View All Posts) at 4:42 AM

The Role of OPC UA in the Oil and Gas Business

OPC is primarily a Microsoft-based solution for interfacing with manufacturing and production systems. It began as an abbreviation of "Object Linking and Embedding" (OLE) for Process Control. Over time the concept evolved into Open Platform Communication, and today it is also commonly expanded as Open Productivity and Connectivity. The classic Microsoft version of OPC relies on COM and DCOM objects for interfacing with machine data. COM and DCOM have their own limitations, mostly related to platform dependence, security, and data portability across firewalls and non-Microsoft applications.

The OPC Foundation has come up with an architecture framework and standards to overcome the limitations of the classic flavour of OPC. This new standard is referred to as OPC Unified Architecture (OPC UA). It provides standards for data extraction, security, and portability, independent of any proprietary technology or platform, with APIs available in Java, .NET, and ANSI C. European energy companies are among the early adopters of these standards and are funding vendors to develop products around them. Vendors are already in the market offering product stacks for OPC data streaming, web services, visualization tools, and real-time data analytics. OPC UA is not going to replace existing OPC Classic implementations in the short term, but it can act as a mechanism for exposing that data to the enterprise far more effectively.

The OPC Foundation certifies the products developed by these vendors and exposes APIs to manipulate the data. It provides a standard mechanism for representing data types, ranging from simple values to complex structures. Models can be created to cater to different needs of the business, whether the data is in motion (real-time analytical models) or at rest (staged analytical models), and business rules can easily be configured to process the information in a more time- and cost-effective manner. The various streaming tools enable simultaneous delivery of information to multiple applications and data stores using high-speed transfer protocols. OPC UA also provides a robust security model for data transfer and enables custom application development through its APIs, helping enterprises get the most value out of their data.
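As a minimal sketch of pulling a value out of a server's address space over OPC UA, assuming the open-source Python 'opcua' (FreeOpcUa) client library; the endpoint URL and node id are placeholders, not a real server:

```python
# Minimal OPC UA read using the FreeOpcUa 'opcua' package (an assumption;
# endpoint and node id below are placeholders).
from opcua import Client

client = Client("opc.tcp://localhost:4840/freeopcua/server/")
client.connect()
try:
    # Nodes are addressed within the server's address space, here by
    # namespace index and numeric identifier.
    node = client.get_node("ns=2;i=2")
    print("Current value:", node.get_value())
finally:
    client.disconnect()
```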

Most oil and gas exploration and production companies rely on proprietary products that cover the deficiencies of OPC Classic by wrapping the data for exposure to the enterprise. This creates a strong dependency on those product vendors and their lines of products for every business need, and it drives up licensing and infrastructure costs for upstream businesses. Because the data is proprietary, extraction, transformation, and integration add further cost. The impact is not only financial but also operational: by the time these enterprises even get a chance to look at the data, they have already lost most of the value it had to offer, and real-time operational risks that could have been avoided, or even converted into opportunities, have already materialized.

From a performance perspective, OPC DA works well for simple data types, while OPC UA is designed for the complex data types that are more relevant to upstream enterprises. The address-space concept in OPC UA makes it more attractive for enterprise data management systems. OPC UA currently supports a secure, firewall-friendly, high-speed binary TCP transport as well as web-based protocols, and because the standards are open, custom protocols can be used where higher transfer speeds are needed. Various other protocols exist in the market, such as FASP©, a proprietary protocol developed by Aspera®, now an IBM® company. FASP byte-streaming APIs can eliminate TCP's limitations around packet tracking and packet loss; the protocol is largely independent of geographical distance and can transmit data at blazing speeds.

Upstream, midstream, and downstream operations rely heavily on PLCs, DCSs, PACs, data recorders, and control systems. These OPC-enabled devices produce data every second, and the data is typically managed by popular proprietary OPC Classic servers. Using OPC UA, that data can be exposed and ported to the enterprise in a far more cost-effective and timely fashion.

OPC UA has opened the door for enterprises to gain a real-time view of their operations and to devise new business process models that leverage Big Data analytics, real-time predictive analytics, and much more.

Benefits to Oil and Gas Operations:

Enterprises can benefit from the new open standards in many ways, saving cost and running more efficient operations across well life-cycle management, distribution, refining, and trading.

Areas of focus:

Action and Response Management

OPC UA provides context and content to the enterprise in real time. Alerts and events generated by various SCADA systems can be made accessible to the system operator and the enterprise at the same time, so operators need not rely solely on their instincts and experience; they have the support of the entire enterprise. Notifications can be sent to multiple stakeholders, and efficient response strategies can be implemented for each event. Analysts can also visually interpret the data on web and mobile devices in order to respond to incidents in real time. Data movement no longer depends on proprietary systems to cross multiple network layers, and custom visualization models give different perspectives on the data flowing out to the enterprise.
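One hedged way to surface such alerts to operators and the enterprise simultaneously is a data-change subscription. The sketch below again assumes the Python 'opcua' library; the endpoint, node id, and routing logic are placeholders:

```python
# Sketch of pushing live changes to multiple consumers via an OPC UA
# subscription (FreeOpcUa 'opcua' package assumed; addresses are
# placeholders). Each change could fan out to dashboards, SMS, email, etc.
import time
from opcua import Client

class AlertHandler:
    def datachange_notification(self, node, val, data):
        # In practice: route to notification channels and event stores.
        print(f"Event from {node}: new value {val}")

client = Client("opc.tcp://localhost:4840/freeopcua/server/")
client.connect()
try:
    sub = client.create_subscription(500, AlertHandler())  # 500 ms interval
    sub.subscribe_data_change(client.get_node("ns=2;i=2"))
    time.sleep(10)  # receive notifications for a while
finally:
    client.disconnect()
```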

Decision Management

Making the right decisions at the right time can save enterprises millions. Decisions are based on insights generated from data across the enterprise, and most of this is OPC data generated by devices operating at remote locations. The faster we analyze the data, the more value we get from the insights. For example, exploration sensor data can guide upstream teams in deciding whether to proceed with drilling at a particular site, understanding the site's geology and procuring the right equipment for drilling operations, choosing well trajectories that maximize production, optimizing drilling parameters for safe and efficient drilling, tuning refining operations based on hydrocarbon analysis of the oil, determining the shortest routes for transporting oil and gas, scheduling oil and gas operations, and executing large trades with better information.

Risk Management

The oil and gas industry is highly exposed to risk, whether in deepwater drilling operations or in transporting oil and gas across challenging terrain. A small incident can lead to losses of billions of dollars; on the other hand, handled well, it can open the door to tremendous opportunity. It is about understanding each risk and its consequences, and applying the right strategy to handle it. Most assets and equipment are OPC-enabled and generate tons of data every second. If that data is tapped at the right time, organizations can not only face any risk with confidence but also exploit the opportunities. Analytical models can easily crunch the data in its most granular form via OPC UA, giving the enterprise the ammunition to optimize its operations.

Health and Safety

OPC UA data can be streamed directly from drilling assets to the enterprise, with the ability to perform in-flight analytics along the way. The data can feed analytical models designed to predict outcomes and prescribe actions that keep the operational landscape safe. Data is the new currency of the modern world; it can provide the insights needed to improve the health and safety posture of an oil and gas enterprise and to meet the legal and regulatory requirements of the region.

 

Future of Oil and Gas Enterprises

With the latest technological advancements, the right investments, and the capacity to accept change, the day is not far off when our oil and gas enterprises will step into a new era of intelligent field operations. As the OPC Foundation quips, "There will be a need for only two living beings on the oil field, a dog and a human: the human ensures the dog gets its food on time, and the dog ensures the human does not interfere with the field operations."

September 4, 2014

What do we mean by 'Realizing business value from big data'?

Posted by Nikhilesh Murthy (View Profile | View All Posts) at 10:21 AM

Posting on behalf of Rajeev Nayar, Associate VP & Head, Big Data Practice

According to Gartner, about 64% of companies have already invested in Big Data, yet 56% are struggling to derive value from it. This is something we often hear when we talk about data, big data, analytics, reporting intelligence, and so on. The question now is: are enterprises really able to leverage the strength of data to begin with, and, given the deluge of data, are they able to leverage the strength of Big Data?

When it comes to 'realizing business value from Big Data', there seems to be a mismatch in how IT and business teams operate. Business teams want to rapidly develop new insights and improve areas such as personalization and customer satisfaction, irrespective of where the data lies. Technology teams are traditionally responsible for constructing the reporting mechanisms that deliver these insights, and this invariably introduces a lag between when the requirements are understood and when the business gets its results. More often than not, the results arrive well after they were actually needed and are irrelevant by the time the business teams receive them. There is thus a significant mismatch between the structured IT construct and the agility the business needs, and the real question is how we can help enterprises leverage the strength of data to make the right and timely decisions for their business. Today, if we look at most enterprises, only about 20-25% of enterprise data is structured, and it is this structured portion that has traditionally been used for analysis; the remaining 80% or so is unstructured.

What does an enterprise need to do when you look at this scenario?

The first big change is the scope of data itself. With Big Data, the scope has expanded from the traditional 20% structured construct that we have always been used to, to include the other 80% of unstructured data.

Second, the focus shifts from data relationships to correlations. Structured data has always been built on the construct of established, known relationships; the shift is towards identifying correlations that were previously unknown, and that is really where the power of insights comes from.

Third, as we move from data relationships to identifying correlations, it is important to embed an element of discovery in how correlations are sought. Without the ability to discover, you minimize the chances of identifying correlations at all.
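A minimal sketch of such discovery, using a small hypothetical data set with invented column names, is simply to compute all pairwise correlations and inspect the strongest ones:

```python
# Minimal correlation-discovery sketch with pandas. Columns and values
# are hypothetical; the point is surfacing unknown correlations rather
# than querying known, predefined relationships.
import pandas as pd

df = pd.DataFrame({
    "basket_value":   [120, 80, 200, 60, 150, 90],
    "support_visits": [1, 4, 0, 6, 1, 3],
    "pages_per_week": [30, 12, 45, 8, 33, 15],
})

corr = df.corr()
# Rank absolute pairwise correlations to spotlight candidate insights.
pairs = corr.abs().unstack().sort_values(ascending=False)
print(pairs[pairs < 1.0].head())  # drop self-correlations
```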

Enterprises therefore need to rethink their approach to Big Data: first, change the scope of what data is meaningful to the enterprise and its customers; second, move from traditional data relationships to new business correlations; and third, to maximize those correlations, enable technologies and frameworks that allow self-discovery. Realizing business value lies in integrating these three pillars together.

Infosys Big Data services and solutions focus on helping enterprises build a data ecosystem that empowers both technology and business teams to rapidly develop and act on insights relevant to the enterprise and its customers.

August 25, 2014

"Big Data is a fairly overhyped term. At the end of the day, analytics are about all forms of data." - Rajeev Nayar

Posted by Nikhilesh Murthy (View Profile | View All Posts) at 10:00 AM

Rajeev Nayar, Associate VP & Head, Big Data Practice - Cloud, Infosys was featured in an exclusive interview with Analytics India Magazine(AIM). Rajeev spoke about the Big Data practice at Infosys, how we are seeing the market changing and how Infosys plans to sustain itself in the Big Data market.

Some of the points Rajeev makes are:

• Big Data is a fairly overhyped term. At the end of the day, analytics are about all forms of data. Big data just deals with data sets of such extremely high volumes that they cannot be handled by traditional tools.

• We strongly believe that there will ultimately be an amalgamation of Big Data with traditional BI/BA practices. What we are looking to achieve is a common layer that brings together all data formats and helps generate insights.

• Almost all organizations today are focused on using big data technologies rather than on the problems the technology is trying to solve. At the end of the day, it is important to understand that these are just tools.

• The largest concern for us lies in the acute shortage of skills in this space. Enterprise solutions are still not mature enough to be intuitive, and obtaining and retaining resources for big data projects is a big challenge.

• We see a lot of focus on how enterprises are going to analyze these vast, dispersed sources of information without physically bringing them together.

Click here to read the entire article, courtesy of AIM

June 9, 2014

Increasing Relevance of Self Service BI

Posted by Yogesh Bhatt (View Profile | View All Posts) at 5:31 AM

Rigid BI processes move data from source systems to business consumers through complex IT pipelines, creating dependency and latency, and ultimately costing business users the opportunity to discover insights and take decisions when it matters most. CIOs are constantly struggling to strike the right balance between business needs and IT challenges. This is a good reason for the increasing relevance of, and demand for, self-service BI.

Continue reading "Increasing Relevance of Self Service BI" »

March 4, 2014

The Rising Bubble Theory of Big Data Analytics

Posted by Ketan Puri (View Profile | View All Posts) at 5:59 PM

Big data analytics has gained much importance in recent times, but the concept of analyzing large data sets is not new. Astronomers of old used large volumes of observational data to predict planetary movements, and our forefathers distilled years of experience into better ways of doing things. Throughout history, in the evolution of modern medicine, advances in space research, the industrial revolution, and financial markets, data has played a key role. The only difference today is the speed at which data is processed, stored, and analyzed.

With the availability of high-performance computing and cheaper data storage, the time taken to process information has dropped drastically. What once took years of experience and multitudes of human effort, machines can now do in a split second, and supercomputers break the barriers of computing power day after day. Weather forecasting is a classic example: by combining statistical models with the computational power of modern machines, we can today predict the weather with hourly accuracy.

Big data analytics has also spread to financial markets, where stock prices are predicted from thousands of parameters and financial models forecast the economies of entire countries. Examples can be found in any field of modern civilization: medicine, astronomy, finance, retail, robotics, or any other science known to man. And it is not only the time aspect but the granularity of data that determines the richness of the information it brings.

The Rising Bubble Theory of Big Data Analytics is a step towards understanding data based on its movement through the various layers of an enterprise. It rests on an analogy with a bubble born at the bottom of the ocean and the journey it makes to the surface: coalescing with other bubbles, disintegrating into many, or being blocked by obstructions in the turbulent water. Data can likewise take multiple paths through an enterprise's varied applications, and its granularity changes as it moves through the layers. The objective is to tap the data in its most granular form so as to minimize the time to analysis. Data suffers losses from filtering, standardization, and transformation as it percolates through the application layers, while the time aspect refers to the transport mechanisms or channels used to move data from source to destination. Combining the granularity and time aspects of data movement tells us the value the data brings:

Data Value (dv) ∝ Granularity (g) / Time (t)
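Read as a direct proportionality, the relation can be illustrated with a toy calculation; the granularity scale and latencies below are invented purely for the example:

```python
# Illustrative only: a toy reading of dv ∝ g/t, with granularity scored
# on an arbitrary 0-10 scale and time as end-to-end latency in minutes.
def data_value(granularity: float, latency_minutes: float) -> float:
    """Relative data value; finer-grained, fresher data scores higher."""
    return granularity / latency_minutes

# Tapping raw sensor data near the source vs. a daily aggregated extract:
near_source = data_value(granularity=9.0, latency_minutes=1.0)     # 9.0
aggregated = data_value(granularity=3.0, latency_minutes=1440.0)   # ~0.002
print(near_source, aggregated)
```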

Data granularity can be thought of as data depth tied to the data sources: granularity increases as we move closer to the source. At times the complex, proprietary nature of the data producers makes the data difficult to analyze, and it must be transformed into a more standard format before it can be interpreted as meaningful information. Tapping the data this early in its journey can add great value for the business.

Data can move both horizontally and vertically: horizontal movement involves data replication, while vertical movement involves aggregation and further data synthesis.

 

Real-time vs. Non-Real-time Data Analytics and Its Relevance to the Oil and Gas Industry

Posted by Ketan Puri (View Profile | View All Posts) at 4:19 PM

With recent technological advancements, cheaper data storage options, the higher processing power of modern machines, and the availability of a wide range of toolsets, data analytics has gained much attention in the energy domain. Enterprises have started looking at new ways to extract maximum value from the massive amounts of data generated in their own back yards. Unlike other domains (retail, finance, healthcare), energy companies are still struggling to unleash the full potential of data analytics. The reasons are many, but the most common are:

• High capital costs with low margins, limiting their investments

• Dependency on legacy proprietary systems, with limited or restricted access to the raw data in a readable format

• Limited network bandwidth at exploration and production sites for data crunching and effective transmission

With the advent of new standards such as OPC UA, WITSML, PRODML, and RESQML, the evolution of network protocols, and powerful visualization tools, the barriers to exploration and production data analytics are breaking down. Oil and gas companies have started to reap the benefits of the massive data lying dormant in their data stores, with more created every second: OPC data from assets, remote devices, and sensors; well core and seismic data; drill logs; and production data are some of the common data categories in the Exploration & Production (E&P) domain. The new data standards and readable formats (XML) let these enterprises interpret and transform this data into meaningful information in the most cost-effective manner. They need only tap into this vast repository of data (real-time or staged) by plugging in one of the leading data analytics tools on the market. These tools let enterprises define and implement new data models that cater to the needs of the business, customizing information for different stakeholders (geoscientists, geologists, system operators, trading departments, and so on).
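As a rough sketch of how such readable formats lower the barrier, the snippet below parses a WITSML-style XML log with Python's standard library. The element names and sample values are illustrative, not the exact WITSML schema:

```python
# Minimal sketch of reading a WITSML-style XML log. The structure below
# is a simplified, hypothetical stand-in for the real standard.
import xml.etree.ElementTree as ET

sample = """
<logs>
  <log well="WELL-001">
    <logData>
      <data>2014-03-04T05:59:00Z,1042.5,87.3</data>
      <data>2014-03-04T06:00:00Z,1043.1,88.0</data>
    </logData>
  </log>
</logs>
"""

root = ET.fromstring(sample)
for log in root.iter("log"):
    well = log.get("well")
    for row in log.iter("data"):
        timestamp, depth_m, pressure_psi = row.text.split(",")
        print(well, timestamp, float(depth_m), float(pressure_psi))
```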

Broadly, Exploration and Production (E&P) data analytics can be classified into two categories:

1. Real-Time Data Analytics

2. Staged Data Analytics

 

[Figure: Real-time vs. Staged Data Analytics]

The Need for Real-Time Data Analytics

Real-time analytical solutions cater to mission-critical business needs, such as predicting the behaviour of a device under a specific set of conditions (real-time predictive analytics) and determining the best action strategy. They can help detect threshold levels of temperature and pressure for generators, compressors, and other devices, and mitigate the impact of fault conditions; custom alerting solutions can be built on top of the real-time analytical models. Today, most critical monitoring is done on site using proprietary tools such as SCADA systems. It is very challenging to provide large computing capacity and skilled human resources at these remote and hazardous locations, and network bandwidth limits how much data can be transported to enterprise data centers. Most of the information stays with on-site system operators using limited toolsets; the enterprise gets a much-delayed view of the data, creating heavy dependency on operators to manage the systems. The current approach to tackling problems has therefore become more reactive than proactive.
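A minimal sketch of the alerting idea, with hypothetical device tags and limits; a real deployment would take readings from a live OPC UA subscription rather than a static list:

```python
# Hedged sketch of threshold-based alerting over a stream of readings.
# Tags and limits are illustrative assumptions.
LIMITS = {"compressor-7/pressure_psi": (50.0, 950.0),
          "generator-2/temperature_c": (None, 105.0)}

def check_reading(tag: str, value: float):
    low, high = LIMITS.get(tag, (None, None))
    if low is not None and value < low:
        return f"ALERT {tag}: {value} below {low}"
    if high is not None and value > high:
        return f"ALERT {tag}: {value} above {high}"
    return None

for tag, value in [("compressor-7/pressure_psi", 963.2),
                   ("generator-2/temperature_c", 88.4)]:
    alert = check_reading(tag, value)
    if alert:
        print(alert)  # in practice: notify operators and the enterprise
```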

Real-Time Data Analytics

Exploration and production data streams can be tapped and mapped to real-time analytical models for in-flight data analytics. These models help operators formulate response strategies that mitigate the impact of fault conditions more effectively. System operators can focus on their jobs rather than worrying about logistics, and they gain wider access to the enterprise knowledge base.

The data is streamed in real time to enterprise data centers, where live monitoring can be performed using more advanced computing techniques. Multiple data streams can be plugged together and analyzed in parallel, and data modelling techniques let enterprises design cost-effective integration solutions. The advantages of real-time analytics are huge: implementations of fuzzy logic and neural networks, real-time predictive analytics, and applications of advanced statistical methods, to mention a few. It has opened the door to limitless benefits for E&P organizations.

Staged Data Analytics

Data streamed from remote locations can be stored in high-performance databases for advanced staged data analytics, where complex statistical models and analysis tools can work their magic. Staged analytics is performed on historical data sets to identify patterns and design more effective business solutions. It also helps enterprises improve system performance, identify gaps, optimize existing systems, spot the need for new processes, and much more. Models can be created to simultaneously analyze massive amounts of data from other sources (related or unrelated) using leading industry analytical tools. Today, E&P companies generally use these tools for reporting, catering to the varied needs of stakeholders across the enterprise; the full potential of staged analytics in the energy domain is still to be explored. It can bring benefits ranging from business process optimization and the identification of process bottlenecks, to more effective and safer operating conditions, to forecasting outcomes through simulation, creating an entirely new perspective on a business scenario.
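As a small sketch of staged analytics on historical readings, here is a pandas pass over synthetic data; the tag name, sampling frequency, and flagging rule are invented for the example:

```python
# Hedged sketch of staged (historical) analytics: daily aggregation of
# archived sensor readings, flagging unusually high-pressure days.
import numpy as np
import pandas as pd

# Hypothetical archive: one pressure reading per hour over 90 days.
idx = pd.date_range("2014-01-01", periods=90 * 24, freq="H")
rng = np.random.default_rng(0)
df = pd.DataFrame({"pressure_psi": 900 + 10 * rng.standard_normal(len(idx))},
                  index=idx)

# Daily aggregates reveal drift, gaps, and recurring peaks over months.
daily = df["pressure_psi"].resample("D").agg(["mean", "max", "std"])

# Flag days whose peak sits well above the long-run behaviour.
threshold = daily["mean"].mean() + 3 * daily["std"].mean()
print(daily[daily["max"] > threshold])
```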

December 3, 2013

Criticality of Predicting Customer Churn

Posted by Yogesh Bhatt (View Profile | View All Posts) at 6:36 AM

In this blog post I attempt to narrate my personal experience with various churn prediction solutions and the way the problem is viewed across industries. One may read it as scratch notes gathered while understanding the churn prediction process and lifecycle. It is by no means a comprehensive or exhaustive list, but it can provide good checkpoints if you are embarking on this road or are unable to reap the benefits of your investments in predicting churn.

Continue reading "Criticality of Predicting Customer Churn" »

October 10, 2013

"Hadoop is not for everything. Understand what it means to your organization within the context of your business goals" - Rajeev Nayar

Posted by Nikhilesh Murthy (View Profile | View All Posts) at 6:55 AM

Rajeev Nayar, Associate VP & Head, Big Data Practice - Cloud, Infosys was featured in an exclusive column in the latest issue of Express Computer titled 'Big Data Hadoop'. In the column, Rajeev lists how enterprises should evaluate Hadoop for big data projects. Some of the points are:

  • Understand what Hadoop means to your organization within the context of your business goals
  • If the adoption of Hadoop in the organization starts as an IT initiative, securing the initial funding to set up the Hadoop cluster, etc. is a challenge.
  • Breaking down the data silos in an organization is more difficult than it sounds.

Click here to read the entire issue, courtesy Express Computer

September 17, 2013

" Big Data - Most promising area from both a technology and business perspective ", Vishnu Bhat

Posted by Nikhilesh Murthy (View Profile | View All Posts) at 12:31 PM

Vishnu Bhat, Senior Vice-President and Global Head, Cloud and Big Data, was part of an expert panel brought together by TechGig.com to discuss the talent required to drive the big data revolution in today's industry. Vishnu Bhat said, "I think Big Data is one of the most promising areas that we see today both from technology and business perspective. And as the opportunity out there is so large, I think we have barely scratched the surface. That is also the reason why it cannot go through the hype-cycle that so many other industries experienced."

Details of the panel discussion appeared on TechGig.com on September 8, 2013, courtesy of TechGig.com.

View image to read the entire feature.