Realize business value from big data with Infosys data analytics solutions.

Results tagged “Big Data”

Pragmatic Data Quality Approach for a Data Lake

On 26th October 2016, we presented our thought paper at the PPDM conference hosted at the Telus Spark Science Centre in Calgary (Calgary Data Management Symposium, Tradeshow & AGM).

http://dl.ppdm.org/dl/1830

Abstract:

With the increase in the amount of data produced by sensors, devices, interactions and transactions, ensuring ongoing data quality is a significant task and concern for most E&P companies. As a result, most source systems have deferred the task of data clean-up and quality improvement to the point of usage. Within the Big Data world, the Data Lake concept, which allows ingesting all types of data from source systems without worrying about their type or quality, further complicates data quality because data structure and usage are left to the consumer. Without a consistent governance framework and a set of common data quality rules, a Data Lake may quickly turn into a Data Swamp. This paper examines the important aspects of data quality in the Upstream Big Data context and proposes a balanced approach to data quality assurance across data ingestion and data usage, to improve data confidence and readiness for downstream analytical efforts.
 

The key points/messages that we presented were:

1. Data quality is NOT about transforming or cleansing the data to fit predefined perspectives; instead, it is about applying the right perspective to the data.


2. Data by itself is neither good nor bad; it is just data, pure in its most granular form.


3. Quality is determined by the perspective through which we look at the same data.


4. Take an architectural approach that abstracts data from perspectives or standards and builds a semantic layer to view the same data from different points of view. We don't need to populate data into models (PPDM, PODS, etc.); instead, we put models on top of the existing data, promoting the paradigm of "ME and WE", where each consumer has their own viewpoint of the same data. The concept of a WELL can be viewed in reference to Completion, Production, Exploration, etc. without duplicating the data in the data lake (see the sketch after this list).


5. Deliver quick value to the business and build its trust in the data in the data lake scenario.
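
To make point 4 concrete, here is a minimal sketch of the "models on top of the data" idea using Spark SQL views. The lake path, table and column names are hypothetical and only illustrate how completion and production perspectives can be layered over the same well records without copying them.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("well-semantic-views").getOrCreate()

# Hypothetical location of raw well records ingested once into the lake.
wells = spark.read.parquet("/lake/raw/well_headers")
wells.createOrReplaceTempView("well_raw")

# Each "perspective" (completion, production, ...) is just a view over the
# same rows; nothing is copied or reshaped into PPDM/PODS structures.
spark.sql("""
    CREATE OR REPLACE TEMP VIEW well_completion AS
    SELECT well_id, spud_date, completion_date, lateral_length
    FROM well_raw
""")

spark.sql("""
    CREATE OR REPLACE TEMP VIEW well_production AS
    SELECT well_id, first_prod_date, oil_rate_bopd, gas_rate_mcfd
    FROM well_raw
""")

# Two consumers query the same underlying data through their own lens.
spark.sql("SELECT * FROM well_completion WHERE lateral_length > 1500").show()
spark.sql("SELECT * FROM well_production ORDER BY oil_rate_bopd DESC").show()
```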


Please refer to the link below for details:

http://dl.ppdm.org/dl/1830

New Generation of Data Analytics and New Generation of Opportunities

 

As someone once said, "Intelligence is nothing but the amount of meaningful information harvested from the data available." So it is the data, together with the skill and talent to extract meaningful information from it, that defines intelligence, knowledge and statistical prediction.

Traditionally, analytics were driven by deriving trends and patterns from historically accumulated data. This was usually conventional data, logged or recorded as events occurred and captured for that specific purpose. With the advancement of technology and its penetration into different domains, things have changed. Today, the application of technology in different spaces is digitizing offerings in such a way that the primary output of the technology inadvertently serves as data for breakthrough analytics derived by connecting different dots: dots that are meaningless independently but, when connected, give you the "Big Picture". In this blog I will cite some examples where information is extracted in unconventional and innovative ways that would otherwise not be obvious.

There is a saying that "A picture is worth a thousand words"; for data analytics, this means that every picture holds a treasure of information waiting to be harvested. Traditionally it was difficult to extract meaningful information from images the way one could from character data, but advancing technology is breaking this barrier. Google Goggles was among the earlier efforts to unlock the data hidden within images, and today Google's mobile app Google Photos lets users search images for their contents through image processing and geotagging information. This pattern of extracting information from images has paved the way for cutting-edge analytics in which satellite images of the parking lots of big retailers like Walmart, Home Depot and others are used to predict their quarterly earnings, giving research analysts raw data that can be trusted. A similar pattern on a bigger scale is helping predict the global economy: satellite images of oil storage, the movement of trucks in mines, agricultural fields and night lights reflect the current state of the world economy as it takes shape. All of this has been made possible by transforming real-time images into precious data.

Showing real-time traffic has become a common feature of GPS devices and apps. Initially this information was sourced from traffic sensors placed by government authorities and other agencies, so only a few roads had the privilege of having their traffic data analyzed. All this changed when companies like Google went the crowdsourcing way. With crowdsourcing, Google was able to improve the reach and accuracy of its traffic information and predictions: when users navigate with Google Maps on their phones, the phone anonymously sends back data that helps the company determine how fast cars are moving on any specific road. With the acquisition of Waze, Google added a human touch to this algorithm, as drivers could provide real-time feedback on their driving experience on specific routes. Does it stop here? No. With more and more users navigating with Google Maps, Google has access to a vast amount of data on end users' driving habits: how much a user drives, where, at what time, at what speed, the critical points navigated and much more. With this immense amount of data, a company like Google is well positioned to offer auto insurance based on unique data analytics that could rival traditional actuarial methodology.

To end this post, I would like to cite a personal example and an opportunity I have been lucky to come across. Being a tech freak, last year I bought a car adapter called Automatic. It comes with a companion mobile app and makes your car connected: it tracks your driving behaviour, miles driven, fuel consumed and routes taken, a handy device if you are interested in analyzing your driving habits at leisure. When the new year 2016 arrived, I got a report from Automatic with some key facts about my driving habits for the past year, such as average fuel consumed and states travelled. One fact was particularly interesting: the report mentioned that I left home for work 16 minutes earlier than the average Automatic user, and arrived home 44 minutes later than the average Automatic user (huh... me working hard). Automatic had enough data and pattern to determine my office commute. When I think about this, there is huge potential in connecting the other dots and harvesting some incredible facts and opportunities. For example, based on my parking location, the place or organization where I work could be determined, or I could be given an incentive to reveal where I work and in turn get insights into how my fellow workers (anonymized, of course) fared with their working hours. This collective data could reveal quite a few interesting facts about workplace culture: the average hours an employee works; average commuting hours and timings, which can indicate how flexible the organization's working hours are; and regular commuting patterns, which can indicate the organization's policies and support for working from home. Thus, data logged for one purpose can be carried into a different context and augmented with other details to gain a wealth of information, which in turn can be turned into tremendous opportunities. Imagine a company like Glassdoor tapping into such data by forming an alliance; it could be a goldmine of raw data to augment their current analytics.

These are just a few examples of how data today no longer comes only from traditional, intended sources but is harvested from a wide range of technology offerings, then augmented and enriched with other factors to deliver incredible depth and insight. With analytics today, it is not the sky but your imagination that is the limit, so it's time to get datatized!


The brave new world of Big Data Analytics

We have all heard about Big Data and even more about the hype surrounding it. Is it worth the excitement? Can Big Data live up to its expectations, or is it just another technology fad?

Every day, 2.5 exabytes (2.5 billion GB) of data is generated, and 90% of current data has been generated in just the last two years. [1] Data is not just growing, it is exploding exponentially. We have witnessed this data explosion trend over the past few years, and it is certain to continue as the world embraces digitalization further. Digitalization has allowed organizations to capture and store more data and detail than was previously feasible or affordable.

Big Data is not just about storing all social media activities, clickstreams, public web, transaction and application data, machine logs, etc.; it has more to do with generating insights from the collated data.

Big Data Analytics at work

We will discuss three examples that help illustrate the potential of this field.

During the 2014 FIFA World Cup, Microsoft's Bing prediction engine predicted the winner of each match even before it started! The engine was so successful that it picked the winner of every elimination match right up to the final, an impeccable record of 15 matches in a row. The prediction algorithm used various data fields such as winning/losing margins, offensive and defensive stats, team composition, player positions, game timing, and playing conditions like weather and venue distance from the playing country. If these exclusive insights had been made available to one particular team, their FIFA journey would have been far more focused and successful. In the near future, don't be surprised to see brands partnering with Bing (or any other prediction engine, for that matter) to predict the winning team in order to make their sponsorship decisions.

Let's take a look at another example, this one from the media and entertainment industry. Netflix started off as a content distribution company, but soon realized the value of its existing dataset in understanding customer taste. Netflix went on to invest $100m in the production of the TV show 'House of Cards' without even commissioning a pilot episode (a pilot is normally used to evaluate a new show's prospects before committing to full production). Netflix knew the show would be a hit even before production started! In line with expectations, the show was well received by subscribers: about 10% of subscribers began streaming it on day one, and many of them ended up binge-watching. This was thanks to Netflix's Big Data analytics capabilities, which analysed billions of hours of viewing patterns along with reviews and feedback from its 50-million-strong subscriber base. The same technique is now used to decide investments in other TV shows and documentaries. These shows, being exclusive to Netflix, are fuelling subscriptions and making their presence felt at award functions such as the Emmys, the Golden Globes and the Oscars.

Now let's take the insurance sector. If it's one of those days when you think you need your car insurance more than ever, maybe because of bad weather or just a gut feeling, big data analytics can come to your rescue. Understanding this customer segment's need, vehicle insurance companies have come up with exciting usage-based policies such as 'Pay as you drive' and 'Pay how you drive'. The insurer quotes a premium considering factors like driving location, traffic and weather conditions, time of driving and data from on-board diagnostics. Such products are sometimes more economical than traditional ones, and they are sure to gain further traction once cars become digitally connected.

The way forward

In the above examples, we have seen organizations making bold moves to leverage their big data capabilities. These are early days for big data, especially big data analytics (BDA), and we have already seen a promise of what the technology can help us achieve. In the near future, BDA is sure to have a significant impact on the business models of many companies, and we are certain to see innovative products and new revenue streams powered solely by analytics.

Any business or organization investing in big data technology needs to understand that BDA is not a crystal ball; it has its own limitations. Further, successful implementation of a BDA project requires the right blend of technology and machine-learning expertise along with strong business acumen.

[1] http://hbr.org/2012/10/big-data-the-management-revolution/ar

Role of OPC UA in Oil and Gas Business

OPC was primarily a Microsoft-based solution for interfacing with manufacturing and production systems. It was an abbreviation for "Object Linking and Embedding (OLE) for Process Control" (OPC). With time, the concept has evolved into Open Platform Communications (OPC), and today it is also commonly referred to as Open Productivity and Connectivity. The classic Microsoft version of OPC relies on COM and DCOM objects for interfacing with machine data. COM and DCOM have their own limitations, mostly related to platform dependence, security, and data portability across firewalls and non-Microsoft applications.

The OPC Foundation has come up with an architecture framework and standards to overcome the limitations of the classic flavour of OPC. This new standard, or framework, is referred to as OPC Unified Architecture (OPC UA). It provides standards for data extraction, security, and portability independent of any proprietary technology or platform, with APIs available in Java, .NET, and ANSI C. European energy companies are among the early adopters of these standards and are funding vendors to develop products around them. Vendors are already in the market offering product stacks around OPC data streaming, web services, visualization tools, and real-time data analytics. OPC UA is not going to replace existing OPC Classic implementations in the short term, but it can act as a mechanism for exposing this data to the enterprise much more effectively.

The OPC Foundation certifies the products developed by these vendors and exposes APIs to manipulate the data. OPC UA provides a standard mechanism for representing data types ranging from simple values to complex data structures. Models can be created to cater to different business needs, whether the data is in motion (real-time analytical models) or at rest (staged analytical models). Business rules can easily be configured to process the information in a more time- and cost-effective manner. Streaming data tools enable simultaneous streaming of information to multiple applications and data stores using high-speed transfer protocols. OPC UA provides a robust security model for data transfers and enables custom application development through its APIs, helping enterprises get the most value out of their data.
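
As a purely illustrative sketch, the snippet below uses the open-source python-opcua library to read a value from an OPC UA server (the official SDKs are in Java, .NET and ANSI C). The endpoint URL and node identifiers are hypothetical placeholders.

```python
from opcua import Client  # open-source python-opcua package

# Hypothetical endpoint of an OPC UA server wrapping field devices.
client = Client("opc.tcp://site-gateway:4840/freeopcua/server/")
client.connect()
try:
    # Node IDs are placeholders; a real address space would be browsed first.
    pressure_node = client.get_node("ns=2;s=Wellhead.Pressure")
    print("Current wellhead pressure:", pressure_node.get_value())

    # The same session can browse the server's address space, which is where
    # OPC UA's richer, typed information model becomes visible.
    objects = client.get_objects_node()
    print("Top-level objects:", objects.get_children())
finally:
    client.disconnect()
```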

Most Oil and Gas Exploration & Production companies rely on proprietary products that cover the deficiencies of OPC Classic by creating their own wrappers to expose this data to the enterprise. This creates a strong dependency on these product vendors and their product lines to cater to different business needs, and it leads to high licensing and infrastructure costs for upstream businesses. Due to the proprietary nature of the data, data extraction, transformation and integration add further cost for these enterprises. There is not only a cost impact but also an impact on business operations: by the time these enterprises even get a chance to look at the data, they have already lost much of the value it has to offer, and operational real-time risks that could have been avoided, or converted into opportunities, have already materialized.

From a performance perspective, OPC DA is good for simple data types, while OPC UA is designed for complex data types, which are more relevant to upstream enterprises. The address space concept in OPC UA makes it more attractive for enterprise data management systems. OPC UA currently supports a secure, firewall-friendly, high-speed binary TCP transport as well as web-based protocols, and given the openness of these standards, custom protocols can be used where higher transfer speeds are needed. There are various other protocols in the market, such as FASP, a proprietary protocol developed by Aspera, now an IBM company. FASP byte-streaming APIs can eliminate the limitations of TCP related to data packet tracking and packet loss; the protocol is largely independent of geographical distance and can transmit data at blazing speeds.

Upstream, midstream, and downstream operations rely heavily on PLCs, DCSs, PACs, data recorders, and control systems. These OPC-enabled devices produce data every second, and the data is typically managed using popular proprietary OPC Classic servers. Using OPC UA, this data can be exposed and ported to the enterprise in a far more cost-effective and timely fashion.

OPC UA has opened the door for enterprises to gain a real-time view of their operations and to devise new business process models leveraging Big Data analytics, real-time predictive analytics, and much more.

Benefits to Oil and Gas Operations:

Enterprises can benefit from the new open standards in many ways, saving costs and improving the efficiency of operations ranging from well life cycle management to distribution, refining and trading.

Areas of focus:

Action and Response Management

OPC UA provides context and content in real time to the enterprise. Alerts and events generated by various SCADA systems can be made accessible to the system operator and the wider enterprise at the same time, so operators need not rely solely on their instincts and experience but have the support of the entire enterprise. Notifications can be sent to multiple stakeholders, and efficient response strategies can be implemented for each event. It also enables analysts to visually interpret the data on web and mobile devices and respond to incidents in real time. Data movement need not rely on proprietary systems to cross multiple network layers, and custom visualization models give different perspectives on the data flowing out to the enterprise.
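
To illustrate how such alerts could be fanned out, here is a hedged sketch of an OPC UA data-change subscription, again using the python-opcua library; the endpoint, alarm tag and consumer callbacks are all hypothetical.

```python
import time
from opcua import Client

class AlarmFanout:
    """Pushes each change of an alarm tag to every registered consumer."""

    def __init__(self, consumers):
        self.consumers = consumers  # e.g. callbacks into email, dashboards, historians

    def datachange_notification(self, node, val, data):
        for notify in self.consumers:
            notify(node, val)

def console_consumer(node, val):
    print(f"Alert update from {node}: {val}")

client = Client("opc.tcp://scada-gateway:4840/")  # hypothetical endpoint
client.connect()
try:
    handler = AlarmFanout(consumers=[console_consumer])
    subscription = client.create_subscription(500, handler)  # 500 ms publish interval
    alarm_node = client.get_node("ns=2;s=Compressor01.HighVibrationAlarm")  # placeholder tag
    subscription.subscribe_data_change(alarm_node)
    time.sleep(60)  # keep the session alive while notifications arrive
finally:
    client.disconnect()
```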

Decision Management

Making the right decisions at the right time can save enterprises millions. Decisions are based on insights generated from data across the enterprise, and most of this is OPC data generated by devices operating at remote locations. The faster we analyze the data, the greater the value we get from the insights. For example, exploration sensor data can help upstream teams decide whether to proceed with drilling at a particular site, understand the geology of the site and procure the right equipment for drilling operations, choose well trajectories that maximize production, optimize drilling parameters for safe and efficient drilling, optimize refining operations based on hydrocarbon analysis of the oil, determine the shortest routes for transporting oil and gas, assist in scheduling oil and gas operations, and support better decisions when executing large oil and gas trades.

Risk Management

The oil and gas industry is highly prone to risk, whether related to deep-water drilling operations or to transporting oil and gas across challenging terrain. A small incident can lead to losses of billions of dollars; on the other hand, handled well, it can open doors to tremendous opportunities. It is about understanding the risks and their consequences and applying the right strategy to handle them. Most assets and equipment are OPC enabled and generate tons of data every second. If this data is tapped at the right time, organizations can not only deal with risk with confidence but also exploit the opportunities. Analytical models can crunch the data in its most granular form by leveraging OPC UA and provide the ammunition the enterprise needs to optimize its operations.

Health and Safety

OPC UA data can be streamed directly from drilling assets to the enterprise, with the ability to perform in-flight analytics. The data can feed analytical models designed to predict outcomes and prescribe actions that keep the operational landscape safe. Data is the new currency of the modern world; it can provide insights that improve the health and safety posture of an oil and gas enterprise and help meet the legal and regulatory requirements of the region.

 

Future of Oil and Gas Enterprises

With the latest technological advancements, the right investments, and the capacity to accept change, the day is not far off when oil and gas enterprises will step into a new era of intelligent field operations. As the OPC Foundation quips, "There will be a need for only two living beings on the oil field, a dog and a human: the human ensures the dog gets its food on time, and the dog ensures the human does not interfere with the field operations."

What do we mean by 'Realizing business value from big data'?

Posting on behalf of Rajeev Nayar, Associate VP & Head, Big Data Practice

According to Gartner, about 64% of companies have already invested in Big Data, yet 56% are struggling to derive value from it. This is something we often hear when we talk about data, big data, analytics, reporting, intelligence and so on. The question is: are enterprises really able to leverage the strength of their data to start with, and, with the deluge of data, are they able to leverage the strength of Big Data?

When it comes to 'Realizing business value from Big Data', there seems to be a mismatch in how IT and business teams operate. Business teams want to rapidly develop new insights and improve areas such as personalization and customer satisfaction, irrespective of where the data lies. Technology teams have traditionally been responsible for constructing a reporting mechanism to deliver these insights, and this invariably introduces a lag between when the requirements are understood and when the business gets the results. More often than not, the results arrive much later than they are actually needed and have become irrelevant by the time the business teams receive them. There is a significant mismatch between the structured IT construct and the agility the business needs, and the challenge is how we can help enterprises leverage the strength of data to make the right and timely decisions for their business. Today, only about 20-25% of enterprise data is structured, and that is the portion that has traditionally been used for analysis; the remaining roughly 80% is unstructured.

What does an enterprise need to do in this scenario?

The first big change is the scope of data itself. With Big Data, the scope has expanded from the traditional 20% structured construct we have always been used to, to the other 80% of unstructured data that has now become part of the scope.

The second change is a shift of focus from data relationships to correlations. Structured data has always been based on the construct of established data relationships; the focus now shifts from relying on relationships that are already known to identifying correlations that were previously unknown, and that is where the real power of insights comes from.

The third change is that, in moving away from data relationships towards correlations, it is important to embed an element of discovery in how you identify those correlations. If you are not able to discover, you will limit the correlations you can identify (the sketch below illustrates the idea).
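
As a minimal sketch of what "discovering correlations" can look like in practice (the figures are purely illustrative), a simple correlation matrix over blended signals can surface relationships that no existing schema encodes:

```python
import pandas as pd

# Illustrative numbers only: weekly metrics blended from structured systems
# (sales, support tickets) and an unstructured source (a social sentiment score).
df = pd.DataFrame({
    "weekly_sales":         [120, 135, 128, 160, 170, 155],
    "support_ticket_count": [30, 28, 35, 22, 18, 25],
    "social_sentiment":     [0.2, 0.3, 0.1, 0.6, 0.7, 0.4],
})

# No schema or foreign key links these columns; the correlation matrix is the
# "discovery" step that suggests which relationships deserve deeper analysis.
print(df.corr().round(2))
```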

Enterprises therefore need to rethink their approach to Big Data: first, by changing the scope of what data is meaningful to the enterprise and its customers; second, by moving from traditional data relationships to new business correlations; and third, by enabling technologies and frameworks that allow for self-discovery in order to maximize those correlations. Realizing business value lies in integrating these three pillars.

Infosys Big Data services and solutions focus on helping enterprises build a data ecosystem that empowers both technology and business teams to rapidly develop and act on insights relevant to the enterprise and its customers.

Rajeev Nayar, Associate VP & Head, Big Data Practice - Cloud, Infosys was featured in an exclusive interview with Analytics India Magazine (AIM). Rajeev spoke about the Big Data practice at Infosys, how we are seeing the market change, and how Infosys plans to sustain itself in the Big Data market.

 Some of the points Rajeev makes are:

  • Big Data is a fairly overhyped term. At the end of the day, analytics is about all forms of data; big data just deals with data sets of extremely high volume that cannot be handled by traditional tools.
  • We strongly believe that there will ultimately be an amalgamation of Big Data with traditional BI/BA practices. What we are looking to achieve is a common layer that brings together all data formats and helps generate insights.
  • Almost all organizations today are focused on using big data technologies rather than on the problems the technology is trying to solve. At the end of the day, it is important to understand that these are just tools.
  • The largest concern for us is the acute shortage of skills in this space. Enterprise solutions are still not mature enough to be intuitive, and obtaining and retaining resources for big data projects is a big challenge.
  • We see a lot of focus on how enterprises are going to analyze these vast, dispersed sources of information without physically bringing them together.

Click here to read the entire article, courtesy of AIM

Rajeev Nayar, Associate VP & Head, Big Data Practice - Cloud, Infosys was featured in an exclusive column in the latest issue of Express Computer titled 'Big Data Hadoop'. In the column, Rajeev lists how enterprises should evaluate Hadoop for big data projects. Some of the points are:

  • Understand what Hadoop means to your organization within the context of your business goals
  • If the adoption of Hadoop in the organization starts as an IT initiative, securing the initial funding to set up the Hadoop cluster, etc. is a challenge.
  • Breaking down the data silos in an organization is more difficult than it sounds.

Click here to read the entire issue, courtesy Express Computer


Rajeev Nayar, Associate Vice President and Head, Big Data Practice at Infosys was featured in the article "The Big Data opportunity for Indian IT service providers" on Information Week, published on 27th August 2013.

Commenting on Infosys' big data strategy, Rajeev said "In Infosys, we started the Big Data journey back in early 2010 when the term Big Data was not even coined. We worked with some global companies in the industry to create their strategy around Big Data technologies to transform their IT and business at much lower cost at that point of time. At the same time, we also spotted the need and opportunity to create our own IP around the gaps in the Big Data technology space, which eventually we launched as our solution 'BigDataEdge' in February 2013. To keep up the momentum, our strategy going forward is to tap the Big Data Space in two ways -- mass scale enablement of our talent pool in Big Data and related technologies and churning out more IP"

Click here to read the complete article, courtesy Information Week

The other day, I logged into a local eCommerce site, FlipKart.com, to buy a book. In India, it is a leader in eCommerce customer experience with very high Net Promoter Scores, a well-established metric with a direct correlation to customer experience; I have not met anyone in India who has used the site and not recommended it to others. Like any forward-looking website, it allowed me to log in using my Facebook/Gmail credentials rather than asking me to create yet another username and password, which I would obviously forget. So my experience started on a positive note. Out of curiosity, I started to browse the "Recommendations For You" section to try to decode their algorithms. Next, I tried the same process with Amazon, which did not allow me a Facebook login (or did I miss it?). And it dawned upon me that there are things these e-tailers know about me because I told them (my declared preferences) and things they infer about me from my transactions with them (my discovered preferences). There are two primary sources of declared preferences: my social media profile and any additions I make to my profile on their website, such as a phone number or an explicit addition to my "wish list". And there are two primary sources of discovered preferences: my past transactions with them and my interactions with my social media sites (including the associated clickstreams). This is the perfect marriage of Social Media and Big Data.

Social Media

There are things that I do on my social media profile - let's take Facebook as an example - where I declare my preferences: my taste in music, my date of birth, my relationship status, my photos, and so on. Some of this data is available to businesses if I give a merchant access to my profile. These preferences are explicitly declared by me, and companies can use this data to substantially improve their interactions with me because they now know that much more about me. So having separate logins is, in my personal opinion, useless. Whether you are a local eCommerce site or something as complex as a bank, if you are not exploring ways to let customers use their social media logins (Facebook, Twitter, Gmail, LinkedIn, etc.) on your site, you are not really sincere about knowing your customers and serving them optimally, no matter how much you harp on "We live to serve" in your print or TV ads. Social media sites are called that because they help customers be social. And so should businesses - all of them, not just for lip service but for transaction enablement.

Big Data

Next comes all the status updates, 'Likes', comments, check-ins that I do on Facebook, activities which reveal a little bit about myself every time I interact with these sites. Facebook graph search and Facebook Home, of course, have now opened an even bigger Pandora's Box in my opinion. Add these to the customer transactions with your company, the clickstreams of the customer on your website, install the processing power to do statistical analysis around the combination of all this structured and unstructured data and you are well on your way to your big data analytics strategy. But how much Big Data is useful and how is it useful? What can companies do better with Big Data Analytics that they could not before?

Analytics

Part of the answer is in the problem itself: how intrinsically predictable is the thing you are trying to predict?

(And as far as human predictability is concerned, just think about your spouse, kids and parents before answering. That should give you an idea of how predictable your customer is going to be.)

So what's the point?

How can you create value for me, your customer? Big Data, Social Media, Predictive Analytics - whatever technology a company invests money in, it must create value both for the company and its customers. Ideally it should do so in a way that improves the quality of the overall customer experience while reducing the cost of operations for the business. So how can you create value for your customer and yourselves?

Firstly, the business must have end objectives in mind: not something as broad as "increase revenue by 2%" but something better defined, such as "increase revenue by 2% from existing customers through existing products". The key word here is "existing". If you are talking about customer acquisition or new product launches, you might need different approaches from what I am about to describe. The intrinsic predictability of your problem is vital to finding a solution for it.

When you have defined the problem as clearly as possible, with potential for predictability, you start looking for points of commonality between your product line and your existing customers. Now, all of a sudden, it makes sense to learn more about your existing customers and how they consume your existing products. To study that, you start mining the Big Data in your company's enterprise data warehouses and transactional systems, and of course the social media profiles of your customers. You can then begin discovering your customers' preferences at levels of granularity that make it meaningful to establish relationships between customer segments and product segments with a higher degree of correlation: an improved matching of product profile with customer profile, since you now have more data points about both the customer and the product. Both declared and discovered preferences are valuable.

Discovering these preferences and patterns across your customers and products is meaningful only if you are confident that you are currently leaving that 2% of revenue on the table. A product like a book is unlikely to be bought by the same customer again, but a perishable or fast-moving consumer good (FMCG) has a typical consumption period after which it can be recommended yet again. So if you know when the customer last bought it, you can recommend it again after that period (a minimal sketch follows).
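
Here is a minimal sketch of that timing logic; the product names, consumption periods and helper function are hypothetical and purely illustrative.

```python
from datetime import date, timedelta

# Hypothetical consumption periods per FMCG product, in days.
CONSUMPTION_PERIOD = {"shampoo_500ml": 45, "coffee_250g": 30}

def due_for_recommendation(product, last_purchase, today=None):
    """Recommend a repeat purchase once the typical consumption window has passed."""
    today = today or date.today()
    period = CONSUMPTION_PERIOD.get(product)
    if period is None:
        return False  # one-off purchases (e.g. a book) are not re-recommended
    return today >= last_purchase + timedelta(days=period)

# A customer who bought coffee on 1 May is due for a nudge by early June.
print(due_for_recommendation("coffee_250g", date(2013, 5, 1), today=date(2013, 6, 5)))  # True
```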

Finally, as usual, I want to state that business is always about ROI; if you have an opportunity to invest your money in something that can give substantially higher returns for the same investment, then forget big data, forget social media... But till then, happy analysing social media and happy discovering customer preferences... and to start with, let your customers become your friends on Facebook and log in to your website with their Facebook credentials...

Realizing Business Value from Big Data

We are at an inflection point in information management. Until very recently, enterprises were fueled almost entirely by structured data for processing, analysis and insight generation. The promise of Big Data has of course loomed large for some time now, but the emphasis has been mostly on socially generated unstructured data on the Internet, whether for brand assessment, consumer sentiment analysis or simply tracking recall value.

There is something amiss here. The fact is that enterprises have completely ignored the vast reserves of unstructured data tucked away within the walls of their corporations - accounting for over 80% of organizational data, including documents, data on tapes, and such like. This is the dark secret in most enterprises.

Business organizations have always hungered for data that can provide action-driving insights, yet they have seldom got it easily or fast. The wait for meaningful data has always been excruciatingly long, and when they do manage to get to the data or insight, more often than not the requirement is no longer pressing and the opportunity has gone cold.

As I mentioned in my previous post, Big Data must create real and tangible economic value for it to be meaningful, and it must do more than a traditional data warehouse could achieve. Taking structured transactional data, putting it in a data warehouse and mining it for statistical analysis to obtain customer insights is nothing new or revolutionary for us to spend so much time talking about. The largest volume of data (and the one with the largest growth rate) being generated nowadays is unstructured, and with a camera in every phone, a lot of it consists of multimedia such as video and pictures. Tagging these videos with metadata and annotations is an attempt to put some structure around these large files. For example, when you watch a video on YouTube, how does it know what else to "recommend", what to "feature" and what to "suggest"? YouTube's search ranking algorithm constantly tries to stay ahead of the game by delivering the most relevant and engaging content, so as to optimise the return on investment for its advertisers. Similarly, TripAdvisor's ability to put structure around its large volume of unstructured data (hotel reviews, pictures, ratings, etc.) is proving financially successful both for itself and for its partners. One question raised by all this data being created so rapidly is: how do you stay ahead of the game and continuously innovate to make more money than your competitors?

 

TripAdvisor.png

For purposes of illustration, we can look at the hotel industry, its adoption of sentiment analysis and how it has used the technology to positively influence its room-pricing abilities. The hotel industry is abuzz with something called Online Reputation Management. Companies like Radian6 (a Salesforce subsidiary in Canada), ReviewPro (Spain), Brand Karma (US), Hootsuite (US), ViralHeat (US), SocialNuggets (US), Clarabridge, SentiRate and TrustYou (with a US-based subsidiary called Review Analyst), along with the field's largest player, TripAdvisor, are all trying to tame the large volume of hotel-review data in a meaningful manner. In doing so, they are creating substantial all-round economic value. For hotels, it creates the ability to charge higher premiums than their competitors; for customers, the ability to get maximum value for their money; and for technology providers, the ability to offer a host of new-age services both to businesses like hotels and directly to customers. Larger technology players like TripAdvisor are minting money in multiple ways: besides offering value-added services to customers by becoming the default site for hotel value comparisons (both price and reputation), they can sell digital real estate to the highest bidder for ad space and charge hotels commissions for bookings generated from their website. In the end, both the customers and the hotels get exactly what they are looking for: increased value for their money...

Monetization of Big Data is one of the biggest challenges that technology players are constantly working to solve, and when done right, it unlocks tremendous economic potential for everyone...

Next, imagine a world of video reviews (think YouTube) and picture reviews (hint: Pinterest), all organised and sorted to provide meaningful analysis for customers shopping the market for their ideal honeymoon destination, their dream vacation or just a weekend getaway... a tremendous opportunity, with money for everyone to make...

Big Data Enabled Enterprise

Larger enterprises have extremely high volumes of data coming in at a rapid pace from a wide variety of sources. Without a proper Big Data solution, finding relevant relationships is like fishing in the dark. For chief information officers, the priority is to enable their businesses to make better decisions faster.

Infosys_Big_Data_info.png


Big Data: Custom Analytics Helps Gain Competitive Insight

In my last blog, we briefly touched upon how critical it is for enterprises to identify the right data sources for their big data strategy. The next step is to break down the data to a greater level of granularity in order to glean relevant insights.

Custom analytics is the coming together of a breadth of knowledge about how social data mining works and a deep knowledge of the industries the enterprise is focused on.

Consider, for example, a scenario from the health care industry. Flu is a frequent health problem, and it spreads like wildfire. Pharmacists and hospitals would do well to stock up on the required medicines and, more importantly, the vaccines needed to prevent virulent attacks. How can data help here?

When I first heard the term, it resonated strongly with me, probably because database management, relational databases, database schemas, data warehousing and data mining had always been my fields of interest, right from the days when I started my career as a Business Analyst at Singapore Airlines (SIA) several years ago. Even back in the day (late 90s, early 2000s), SIA used to get competitor fare data from MIDT, among other sources, to try to optimise the fares it could charge on the various sectors (and combinations of sectors) it flew. In airlines, this practice of using historical data (both your own and competitors') to optimise future fares is known as Yield Management and/or Revenue Management. The practice has officially existed since the mid-80s, when American Airlines' then CEO, Robert Lloyd Crandall, invented and named it. It later spread to related sectors like hotels and hospitality, and revenue management in hotels is an equally sophisticated field today. Top organisations like IATA (International Air Transport Association) have started offering courses in Airline Revenue Management, and Cornell University's School of Hotel Administration offers similar programs for hotel revenue management professionals. It has spawned an entire sub-industry within the IT products sector catering to revenue management for airlines, such as Sabre's Revenue Manager and Amadeus's Altea Revenue Management, while revenue management for hotels gave rise to IT product companies like IDeaS and EasyRMS. So, with such sophisticated IT products and business analytics having existed in these sectors almost since the advent of the Internet, what is changing now, how is Big Data affecting it, and what opportunities is Big Data creating for airlines and hotels?

But before we get into those details, I wanted to establish a baseline for what exactly Big Data is. So I scoured the Internet for the multiple viewpoints that exist on its definition and tried to reconcile them into a single coherent and useful definition. The definitions I encountered from reliable sources are reproduced verbatim below (intentionally leaving out Wikipedia, since Wikipedia is an information aggregator and not a creator of original content):

Gartner:  Big Data in general is defined as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Forrester: Big Data is the frontier of a firm's ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers

IDC: Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.

McKinsey Global Institute: "Big data" refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyse. This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data--i.e., we don't define big data in terms of being larger than a certain number of terabytes (thousands of gigabytes). We assume that, as technology advances over time, the size of datasets that qualify as big data will also increase. Also note that the definition can vary by sector, depending on what kinds of software tools are commonly available and what sizes of datasets are common in a particular industry. With those caveats, big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes).

Oreilly: Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.

Microsoft: The increasingly large and complex data that is now challenging traditional database systems

Oracle: Big data is the data characterized by 4 key attributes: volume, variety, velocity and value

IBM: Big data is the data characterized by 3 attributes: volume, variety and velocity

If I look at the myriad definitions above and try to create something that is relevant and meaningful to my cause, I would define Big Data as follows:

Big Data is high-volume, high-velocity and high-variety information assets which require reasonable levels of veracity, in turn creating substantial economic value and supporting effective operations, revenue enhancement, decision-making, risk management and customer service.

Agree? Disagree? Suggestions for improvement? 

I look forward to hearing from you as I further pen my thoughts on Big Data and its application in the ever-so-dynamic and exciting field of revenue management, specifically in the Travel & Hospitality sectors... Stay tuned...

Big Data: a retailer's currency for competitive advantage

I loved it when an Infosys Vice President, Sandeep Dadlani, opened his statement at a panel discussion at TUCON 2012 in Las Vegas with 'Big Data is the B-word we don't want to speak about'. He could not have said it better. There has been much spotlight on Big Data, and it is more important to realize what it can do for a business, especially for retailers, than to stand in awe of all the statistics being thrown around in its name.

Cutting to the chase: in a gloomy economy, retailers are struggling more than ever. Consumer spending seems to be dipping at a steady pace, pressure on margins is high, and competition is forcing prices down. Meanwhile, digital consumers are forcing retailers to rethink customer service and the fulfillment experience.

Taming the elephant: 10 Big Data trends for 2013

Devices. Processes. Customers. Today, these are sources of elephantine amounts of data that are hard to store and harder to process. While enterprises were still trying to wrap their heads around the Big Data phenomenon in 2012, many of them will finally start taming it in 2013 with strategies and technology solutions. But what are the capabilities they desire? How will they leverage Big Data for greater business value? The trends in this infographic will give you some answers. If you think these trends will interest your peers or colleagues, share them via email or social media.
