What actually is Big Data? - The different definitions
When I first heard the term, it resonated strongly with me. That was probably because database management, relational databases, database schemas, data warehousing and data mining have been my fields of interest ever since I started my career as a Business Analyst at Singapore Airlines (SIA) several years ago. Even back in the day (late 90s, early 2000s), SIA used to get competitor fare data from MIDT, among other sources, to try and optimise the fares it could charge on the various sectors (and combinations of sectors) that it flew. In airlines, this practice of using historical data (both your own and your competitors') to optimise future fares is known as Yield Management and/or Revenue Management. The practice has officially existed since the mid-80s, when American Airlines' then CEO, Robert Lloyd Crandall, invented and named it. It later spread to related sectors like hotels and hospitality, and revenue management in hotels is an equally sophisticated field today.
Top organisations like IATA (International Air Transport Association) have started offering courses on "Airline Revenue Management", and Cornell University's School of Hotel Administration offers similar programs for hotel revenue management professionals. The practice has also spawned an entire sub-industry within the IT products sector: airline revenue management is served by products such as Sabre's Revenue Manager and Amadeus's Altea Revenue Management, among others, while hotel revenue management gave rise to IT product companies like IDeaS and EasyRMS.
So, with such sophisticated IT products and business analytics having existed in these sectors almost since the advent of the Internet, what is changing now? How is Big Data affecting these sectors, and what opportunities is it creating for airlines and hotels?
But before we get into those details, I want to establish a baseline for what exactly Big Data is. So I scoured the Internet for the multiple viewpoints that exist on its definition, to try to reconcile them into a single coherent and useful definition. The definitions I found from reliable sources are reproduced verbatim below (I have intentionally left out Wikipedia, as it is an aggregator of information and not a creator of original content):
Gartner: Big Data in general is defined as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Forrester: Big Data is the frontier of a firm's ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers
IDC: Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.
McKinsey Global Institute: "Big data" refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyse. This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data--i.e., we don't define big data in terms of being larger than a certain number of terabytes (thousands of gigabytes). We assume that, as technology advances over time, the size of datasets that qualify as big data will also increase. Also note that the definition can vary by sector, depending on what kinds of software tools are commonly available and what sizes of datasets are common in a particular industry. With those caveats, big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes).
O'Reilly: Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.
Microsoft: The increasingly large and complex data that is now challenging traditional database systems
Oracle: Big data is the data characterized by 4 key attributes: volume, variety, velocity and value
IBM: Big data is the data characterized by 3 attributes: volume, variety and velocity
If I look at the myriad definitions above and try to create something that is relevant and meaningful to my cause, I would define Big Data as follows:
Big Data comprises high-volume, high-velocity and high-variety information assets that, given reasonable levels of veracity, create substantial economic value and help in effective operations, revenue enhancement, decision-making, risk management and customer service.
Agree? Disagree? Suggestions for improvement?
I look forward to hearing from you as I pen further thoughts on Big Data and its application in the ever so dynamic and exciting field of revenue management, in the Travel & Hospitality sectors specifically. Stay tuned...