March 4, 2014

The Rising Bubble Theory of Big Data Analytics

Posted by Ketan Puri (View Profile | View All Posts) at 5:59 PM

Big data analytics has gained much importance in recent times, but the concept of analyzing large data sets is not new. Astronomers in olden days used large sets of observational data to predict planetary movements, and our forefathers used years of experience to devise better ways of doing things. Throughout our history - the evolution of modern medicine, advancements in space research, the industrial revolution, financial markets - data has played a key role. The only difference today is the speed at which data gets processed, stored, and analyzed.

With the availability of high-end computing and cheaper data storage, the time to process information has gone down drastically. What once took years of experience and multitudes of human effort, machines can now do in a split second. Supercomputers are breaking the barriers of computing power day after day. A classic example is weather forecasting: by combining statistical modelling with the computational power of modern machines, we can today predict the weather with hourly accuracy.

The concept of big data analytics has also spread to financial markets, where thousands of parameters feed models that predict stock prices, and even the economies of entire countries. We can find examples of big data analytics in any field of modern civilization. Whether it's medicine, astronomy, finance, retail, robotics, or any other science known to man, data has played a major role. It's not only the time aspect but also the granularity of data that determines the richness of the information it brings.

The Rising Bubble Theory of Big Data Analytics is a step towards understanding data based on its movement through the various layers of an enterprise. It is based on the analogy of a bubble generated at the ocean floor and the journey it makes to reach the surface, coalescing with other bubbles, disintegrating into multiple bubbles, or getting blocked by various obstructions in the turbulent waters. Data can take multiple paths based on the varied applications in an enterprise, and its granularity changes as it moves through the various layers of applications. The objective is to tap the data in its most granular form, to minimize the time for its analysis. The data undergoes losses due to filtering, standardization, and transformation as it percolates through the different application layers. The time aspect refers to the transport mechanisms or channels used to port data from its source to its destination. When we combine the analysis of data granularity with the time aspect of its movement, we can understand the value that the data brings:

Data Value (dv) ∝ Granularity (g) / Time (t)
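As a minimal illustrative sketch of this proportionality - the scaling constant, units, and example numbers below are assumptions, not part of the original model - the relation can be expressed as:

```python
def data_value(granularity, time, k=1.0):
    """Illustrative data-value score: value rises with granularity and
    falls with the time taken to transport/process the data.
    k is an assumed scaling constant."""
    if time <= 0:
        raise ValueError("time must be positive")
    return k * granularity / time

# Raw sensor data: very granular, tapped almost immediately (hypothetical numbers).
raw = data_value(granularity=100, time=1)

# Aggregated warehouse data: coarser, and available much later.
aggregated = data_value(granularity=10, time=24)
```

Under these assumed numbers, data tapped close to its source scores far higher than the same data after aggregation and delay, which is the intuition the formula captures.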

Data granularity can be thought of as data depth relative to the data sources: granularity increases as we move closer to the sources. At times, due to the complex nature of proprietary data producers, it becomes difficult to analyze the data; it needs to be transformed into a more standard format before it can be interpreted as meaningful information. Tapping the data early in its journey can add great value for the business.

Data can move both horizontally and vertically. Horizontal movement involves data replication, while vertical movement involves aggregation and further data synthesis.
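The distinction can be sketched with a toy example (the sensor readings and summary fields are purely hypothetical): horizontal movement copies the data and preserves its granularity, while vertical movement aggregates it and loses granularity.

```python
# Hypothetical per-minute pressure readings from one sensor.
readings = [101.2, 101.4, 101.1, 101.5, 101.3]

# Horizontal movement: replication to a sibling system; granularity is preserved.
replica = list(readings)

# Vertical movement: aggregation into an hourly summary; granularity is lost.
hourly_summary = {
    "count": len(readings),
    "avg": sum(readings) / len(readings),
    "max": max(readings),
}
```

The individual readings can always be re-aggregated, but the hourly summary can never be expanded back into the original readings - which is why tapping the data before vertical movement is valuable.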

 

Real-time vs. Non-Real-time Data Analytics and its relevance to Oil and Gas Industry

Posted by Ketan Puri (View Profile | View All Posts) at 4:19 PM

With recent technological advancements, cheaper data storage options, the higher processing power of modern machines, and the availability of a wide range of toolsets, data analytics has gained much focus in the energy domain. Enterprises have started looking into newer ways to extract maximum value out of the massive amounts of data they generate in their own backyards. Unlike other domains (retail, finance, healthcare), energy companies are still struggling to unleash the full potential of data analytics. The reasons could be many, but the most common are:

·         High Capital Cost with low margins, limiting their investments

·         Dependency on legacy proprietary systems with limited or restricted access to the raw data in readable format

·         Limited network bandwidth at the exploration and production sites for data crunching and effective transmission

With the advent of new standards like OPC UA, WITSML, PRODML, and RESQML, the evolution of network protocols, and powerful visualization tools, the barriers to exploration and production data analytics are breaking down. Oil and gas companies have already started looking to reap the benefits of the massive data lying dormant in their data stores. Massive amounts of data are created every second: OPC data related to assets, remote devices, and sensors; well core and seismic data; drill logs; and production data are some of the common data categories in the Exploration & Production (E&P) domain. The new data standards and readable formats (such as XML) have enabled these enterprises to interpret and transform this data into more meaningful information in a cost-effective manner. They only need to tap into this vast repository of data (real-time or staged) by plugging in some of the leading data analytics tools available in the market. These tools have enabled enterprises to define and implement new data models that cater to the needs of the business by customizing information for different stakeholders (geoscientists, geologists, system operators, trading departments, etc.).

Broadly, the Exploration and Production (E&P) data analytics can be classified into two categories:

1.       Real-Time Data Analytics

2.       Staged Data Analytics

 

[Figure: Real-time vs. Staged Data Analytics]

The Need for Real-Time Data Analytics

Real-time analytical solutions cater to mission-critical business needs, such as predicting the behavior of a device under a specific set of conditions (real-time predictive analytics) and determining the best action strategy. They can help in detecting the threshold levels of temperature and pressure for generators, compressors, and other devices, and in mitigating the impact of fault conditions. Alert-based custom solutions can be built on top of real-time analytical models. Today, most critical monitoring is done at onsite locations using proprietary tools such as SCADA systems. It can be very challenging to provide large computing capacity and skilled human resources at these remote and hazardous locations, and network bandwidth is a limiting factor for transporting massive amounts of data to the enterprise data centers. Most of the information is limited to onsite system operators with limited toolsets, while enterprises get a much-delayed view of this data, creating too much dependency on system operators to manage the systems. The current approach to tackling problems is therefore more reactive than proactive.
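The alert-based monitoring described above can be sketched in a few lines. The device names, metrics, and threshold values here are illustrative assumptions, not figures from any real system:

```python
# Assumed threshold levels per metric; a real deployment would load these
# from device specifications or a configuration store.
THRESHOLDS = {"temperature": 95.0, "pressure": 300.0}

def check_reading(device, metric, value):
    """Return an alert record if the reading breaches its threshold, else None."""
    limit = THRESHOLDS.get(metric)
    if limit is not None and value > limit:
        return {"device": device, "metric": metric, "value": value, "limit": limit}
    return None

def monitor(stream):
    """Scan an iterable of (device, metric, value) readings and collect alerts."""
    return [a for a in (check_reading(*r) for r in stream) if a]

# Hypothetical slice of a real-time data stream.
stream = [
    ("compressor-1", "temperature", 91.0),
    ("compressor-1", "temperature", 97.5),   # breaches the temperature threshold
    ("generator-2", "pressure", 310.0),      # breaches the pressure threshold
]
alerts = monitor(stream)
```

In practice the stream would be consumed continuously rather than as a list, and the alerts routed to operators; the sketch only shows the threshold-detection step.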

Real Time Data Analytics

Exploration and production data streams can be tapped and mapped to real-time analytical models for in-flight data analytics. These models can help operators formulate response strategies to mitigate the impact of fault conditions more effectively. System operators can focus on their job rather than worrying about the logistics, and they can have wider access to the enterprise knowledge base.

The data is streamed in real time to the enterprise data centers, where live monitoring can be performed using more advanced computing techniques. Multiple data streams can be plugged together and analyzed in parallel, and data modelling techniques enable enterprises to design cost-effective data integration solutions. The advantages of real-time analytics are huge: implementations of fuzzy logic and neural networks, real-time predictive analytics, and applications of advanced statistical methods are just a few examples. It has opened the doors to limitless benefits for E&P organizations.

Staged Data Analytics

The data streamed from remote locations can be stored in high-performance databases for advanced staged data analytics, where complex statistical models and data analysis tools can work their magic. Staged data analytics is performed on historical data sets to identify data patterns and design more effective business solutions. It also helps enterprises improve the performance of their systems, identify gaps, optimize existing systems, and identify the need for new processes. Models can be created to simultaneously analyze massive amounts of data from other data sources (related or unrelated) using leading industry analytical tools. Generally, E&P companies use these tools for reporting purposes, to cater to the varied needs of stakeholders across the enterprise. The full potential of staged data analytics is still to be explored in the energy domain: it can bring benefits ranging from business process optimization and identifying process bottlenecks to more effective and safer operating conditions and forecasting outcomes using simulation techniques. It can create a totally new perspective on a business scenario.
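A minimal sketch of pattern-finding on staged historical data, using nothing but the standard library - the production figures are invented for illustration, and a real analysis would use far richer models:

```python
import statistics

# Hypothetical daily production volumes (barrels) pulled from a staging store.
history = [1200, 1180, 1210, 1190, 400, 1205, 1195]

mean = statistics.mean(history)
stdev = statistics.stdev(history)

# Flag days that deviate sharply from the historical norm,
# e.g. a possible outage or a data-quality gap worth investigating.
anomalies = [v for v in history if abs(v - mean) > 2 * stdev]
```

Even this crude two-standard-deviations rule surfaces the anomalous day; the same staged data can then feed the richer statistical and simulation models the paragraph above describes.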

December 3, 2013

Criticality of Predicting Customer Churn

Posted by Yogesh Bhatt (View Profile | View All Posts) at 6:36 AM

In this blog post I am attempting to narrate my personal experience of various churn prediction solutions, and the way the problem is being looked at across various industries. One may read it as scratch notes gathered in the process of understanding the churn prediction process and lifecycle. This is by no means a comprehensive and exhaustive list, but it can provide good checkpoints if you are embarking on this road or are not able to reap benefits from your investments in predicting churn.

Continue reading "Criticality of Predicting Customer Churn" »

October 10, 2013

"Hadoop is not for everything. Understand what it means to your organization within the context of your business goals" - Rajeev Nayar

Posted by Nikhilesh Murthy (View Profile | View All Posts) at 6:55 AM

Rajeev Nayar, Associate VP & Head, Big Data Practice - Cloud, Infosys was featured in an exclusive column in the latest issue of Express Computer titled 'Big Data Hadoop'. In the column, Rajeev listed how enterprises should evaluate Hadoop for big data projects. Some of the points are:

  • Understand what Hadoop means to your organization within the context of your business goals
  • If the adoption of Hadoop in the organization starts as an IT initiative, securing the initial funding to set up the Hadoop cluster, etc. is a challenge.
  • Breaking down the data silos in an organization is more difficult than it sounds.

Click here to read the entire issue, courtesy Express Computer

September 17, 2013

" Big Data - Most promising area from both a technology and business perspective ", Vishnu Bhat

Posted by Nikhilesh Murthy (View Profile | View All Posts) at 12:31 PM

Vishnu Bhat, Senior Vice-President and Global Head, Cloud and Big Data, was a part of an expert panel brought together by TechGig.com to discuss talent required to drive the big data revolution in today's industry.  Vishnu Bhat said "I think Big Data is one of the most promising areas that we see today both from technology and business perspective. And as the opportunity out there is so large, I think we have barely scratched the surface. That is also the reason why it cannot go through the hype-cycle that so many other industries experienced."

Details of the panel discussion appeared on TechGig.com on 8th Sept, 2013 courtesy TechGig.com.

View image to read the entire feature.

Taming the Elephant "10 Big Data trends for 2013" - Coverage in Hindu Business Line

Posted by Nikhilesh Murthy (View Profile | View All Posts) at 11:42 AM

Earlier this year, Infosys released an infographic titled 'Taming the Elephant - 10 Big Data Trends for 2013'. The infographic was covered in the Hindu Business Line's 'Tech Bytes' column on July 19, 2013. A couple of the trends are as follows -

The information strategy - still in formation - While enterprises will be "inundata'd" with a hybrid big data dump, the need of the hour is to turn this into a flexible, manageable information ecosystem

Big Data - of the people, by the people, for the people - Cloud-based and open source tools will help democratize big data to take it out of the realm of expensive resources and high computing infrastructure - giving even smaller companies the ability to leverage big data for business insights


The infographic is courtesy Hindu Business Line

How to Sketch Cloud BI and Data Analytics

Posted by Nikhilesh Murthy (View Profile | View All Posts) at 11:13 AM

This article appeared in the IT Next magazine and featured Sandeep Bhagat, Principal Architect, Big Data and Analytics, Infosys. In the article, Sandeep said, "Most BI systems hold confidential analytical data about enterprises and hence it is important to get buy-ins from business to host business sensitive data on the cloud environment".

Click here to read the entire article, courtesy IT Next

Apps to Profile Fraudsters based on Social Behavior

Posted by Nikhilesh Murthy (View Profile | View All Posts) at 11:07 AM

"With the world going social and large amounts of operational data organizations collected over the last decade, data is getting BIG-petabyte big-and this tall order has assumed somewhat insurmountable proportions. For instance, bankers are looking at big data to tell them which customers are at the greatest risk for account take-over fraud. Automotive insurers are seeking to identify, with big data analysis, those exact customers who are considering a new car purchase with their next insurance renewal. And this has set off the trend to procure big data point solutions, including those on social computing networks-for each business function, and each process requirement," said Vishnu Bhat, Vice-President and Global Head, Cloud and Big Data, in a feature published in Data Quest on 19th March 2013.

Click here to read the entire feature, courtesy Data Quest

"Mass scale enablement of our talent pool and investing on IP creation - Strategy for Big Data going forward" Rajeev Nayar

Posted by Nikhilesh Murthy (View Profile | View All Posts) at 11:01 AM


Rajeev Nayar, Associate Vice President and Head, Big Data Practice at Infosys, was featured in an article, "The Big Data opportunity for Indian IT service providers", on Information Week on 27th August, 2013.

Commenting on Infosys' big data strategy, Rajeev said "In Infosys, we started the Big Data journey back in early 2010 when the term Big Data was not even coined. We worked with some global companies in the industry to create their strategy around Big Data technologies to transform their IT and business at much lower cost at that point of time. At the same time, we also spotted the need and opportunity to create our own IP around the gaps in the Big Data technology space, which eventually we launched as our solution 'BigDataEdge' in February 2013. To keep up the momentum, our strategy going forward is to tap the Big Data Space in two ways -- mass scale enablement of our talent pool in Big Data and related technologies and churning out more IP"

Click here to read the complete article, courtesy Information Week

May 6, 2013

Social Media & Big Data - Declared preferences vs. Discovered preferences

Posted by Abhishek Singh (View Profile | View All Posts) at 8:02 AM

The other day, I logged into a local eCommerce site, FlipKart.com, to buy a book. In India, it's a leader in eCommerce customer experience, with very high Net Promoter Scores - a well-established metric with a direct correlation to customer experience. I have not met anyone in India who has used this site and not recommended it to others. Like any forward-looking website, it allowed me to log in using my Facebook/Gmail credentials rather than asking me to create another username and password, which I would obviously forget. So my experience started on a positive note. Out of curiosity, I started to browse the "Recommendations For You" section to try and decode their algorithms. Next I tried the same process with Amazon, which did not allow me a Facebook login (or did I miss it?). And it dawned upon me that there are things that these e-tailers know about me because I told them (the declared preferences), and there are things that they will infer about me based on my transactions with them (the discovered preferences). There would be two primary sources of declared preferences: my social media profile and any additions I make to my profile on their website, like a phone number or an explicit addition to my "wish list". And there would be two primary sources of discovered preferences: my past transactions with them and my interactions with my social media website (including associated clickstreams). This is the perfect marriage of Social Media and Big Data.

Social Media

There are things that I do on my social media profile - let's take Facebook as an example - where I declare my preferences in music, my date of birth, my relationship status, my photos, etc. Some of this data is available to businesses, if I give a merchant access to my profile. These preferences are explicitly declared by me, and companies can use this data to substantially improve their interactions with me since they know that much more about me. So having separate logins - in my personal opinion - is useless. Whether you are a local e-commerce site or something as complex as a bank, if you are not exploring ways to offer social media logins (Facebook, Twitter, Gmail, LinkedIn, etc.) on your site, you are not really sincere about knowing your customers and serving them optimally, no matter how much you harp on "We live to serve" in your print or TV ads. Social media sites are called that because they help customers be social. And so should businesses - all of them, not just for lip service but for transaction-enablement.

Big Data

Next comes all the status updates, 'Likes', comments, check-ins that I do on Facebook, activities which reveal a little bit about myself every time I interact with these sites. Facebook graph search and Facebook Home, of course, have now opened an even bigger Pandora's Box in my opinion. Add these to the customer transactions with your company, the clickstreams of the customer on your website, install the processing power to do statistical analysis around the combination of all this structured and unstructured data and you are well on your way to your big data analytics strategy. But how much Big Data is useful and how is it useful? What can companies do better with Big Data Analytics that they could not before?

Analytics

Part of the answer is in the problem itself - how intrinsically predictable is the thing you are trying to predict?

(And as far as human predictability is concerned, just think about your spouses, kids and parents before answering. That should give you an idea of how predictable your customer is going to be.)

So what's the point?

How can you create value for me - your customer? Big Data, Social Media, Predictive Analytics, any technology on which a company invests money, it must create value both for the company and its customers. And ideally it must achieve this in a way whereby it improves the quality of the overall customer experience and reduces the cost of operations for the business. So how can you create value for your customer and yourselves?

Firstly, businesses must have end objectives in mind - not something as broad as "revenue increase by 2%" but something more well-defined, like "revenue increase by 2% from existing customers through existing products". The key word here is "existing". If you are talking about customer acquisition or new product launches, you might need different approaches from what I am about to describe next. The intrinsic predictability of your problem is vital to finding a solution for that problem.

When you have defined the problem as clearly as possible, with potential for predictability, you start looking for points of commonality between your product line and your existing customers. Suddenly it makes sense to learn more about your existing customers and how they consume your existing products. To study that, you start mining the big data in your company's enterprise data warehouses and transactional systems and, of course, the social media profiles of your customers. You can then start the journey of discovering your customers' preferences at levels of granularity that make it meaningful to establish relationships between customer segments and product segments with a higher degree of correlation - an improved matching of product profile with customer profile, since you now have more data points about both the customer and the product. So both the declared preferences and the discovered preferences are valuable.
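A toy sketch of combining the two preference sources - the profile fields, categories, and purchase records below are entirely hypothetical, and a real system would work from clickstreams and transaction tables rather than hand-written lists:

```python
from collections import Counter

# Declared preferences: interests the customer stated on a social profile.
declared = {"fiction", "cricket", "photography"}

# Discovered preferences: categories inferred from past transactions.
purchases = ["fiction", "fiction", "cookware", "photography"]
discovered = Counter(purchases)

# Preferences that are both declared and confirmed by behaviour
# carry the strongest signal for matching products to this customer.
strong_preferences = {cat for cat in declared if discovered.get(cat, 0) > 0}
```

The intersection is where declared and discovered preferences corroborate each other; categories appearing in only one source ("cricket", "cookware") would need further evidence before driving recommendations.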

Discovering these preferences and patterns of your customers and products is meaningful only if you are confident that you are currently leaving that 2% of money on the table. A product like a book is unlikely to be bought by the same customer again, but a perishable or fast-moving consumer good (FMCG) has a typical consumption period after which it can be recommended yet again. So if you know when the customer last bought it, you can recommend it again after a certain period.
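The repurchase-timing idea above can be sketched directly; the categories, consumption periods, and dates are illustrative assumptions:

```python
from datetime import date, timedelta

# Assumed typical consumption periods per category, in days.
CONSUMPTION_DAYS = {"shampoo": 45, "coffee": 30}

def due_for_recommendation(category, last_purchase, today):
    """Recommend a repeat purchase once the typical consumption period has passed.
    Categories with no repeat cycle (e.g. a book) are never re-recommended."""
    period = CONSUMPTION_DAYS.get(category)
    if period is None:
        return False
    return today - last_purchase >= timedelta(days=period)

# Coffee bought 35 days ago, against a 30-day consumption period.
flag = due_for_recommendation("coffee", date(2013, 4, 1), date(2013, 5, 6))
```

Here the coffee purchase has aged past its assumed 30-day cycle, so the customer is flagged for a repeat recommendation, while a one-off category like a book never would be.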

Finally, as usual, I want to state that business is always about ROI, and if you have an opportunity to invest your money in something that can give substantially higher returns with the same investment, then forget big data, forget social media... But till then, happy analysing social media and happy discovering customer preferences... and to start with, let your customers become your friends on Facebook and log in to your website with their Facebook credentials...