Infosys’ blog on industry solutions, trends, business process transformation and global implementation in Oracle.


Oracle Data Lake


Evolution of data

The volume of data produced each day keeps growing rapidly; a commonly cited estimate is about 2.5 quintillion bytes per day. This surge comes from many sources: the Internet (web searches and information at our fingertips), social media (Facebook, Instagram, Twitter, Snapchat), communication (texts, email, GIFs, emojis, Skype calls), digital photos and media (YouTube, voice search) and services such as weather channels, Uber rides and payment transactions. As data keeps growing, handling it becomes a challenge for most enterprises. The value of this data depends on how it is stored and how effectively insight can be extracted from it. Traditional storage methods such as relational databases and data warehouses have limitations in storage capacity, in the types of data they can hold (unstructured and semi-structured data are poorly supported), in storage cost and in scalability.

Why Data Lake?

To overcome the limitations of traditional storage methods, providers such as Amazon, Snowflake, Microsoft and Oracle offer data lakes: large-scale storage for structured, semi-structured, unstructured and binary data. A data lake is a single repository for all enterprise data, including raw data from source systems and transformed data used for visualization, reporting, prediction, advanced analytics and machine learning.


Here is how a data lake differs from a traditional data warehouse.

| Data Lake | Data Warehouse |
| --- | --- |
| No fixed data model; retains all data regardless of any model | Highly structured data model holding specific data that answers predefined questions |
| Stores all data types, including web server logs and sensor data | Typically not designed for data types such as web server logs, social network activity and sensor data |
| Stores raw data, which stays available so that analysts can go back in time and run an analysis | Stores processed data; significant time is spent up front analyzing the various data sources |
| Schema is defined after data is stored, so the effort comes at the end of the process | Schema is defined before data is stored, so the effort comes at the start of the process |
| Designed to retain very large volumes of data for long periods | Expensive for storing large amounts of data |
| Adaptive, highly accessible and quick to update | More complicated and costly to change |
| Used by data scientists for predictive analysis, machine learning and in-depth analysis | Used by business professionals for structured and operational views of data |
| Uses an ELT (Extract, Load, Transform) process, which lets users access data before it is transformed and structured | Uses an ETL (Extract, Transform, Load) process, which provides insight into predefined questions |
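
The ELT and ETL rows of the comparison above can be illustrated with a small sketch. It is not Oracle-specific: it uses Python's built-in sqlite3 module as a stand-in store, and the event fields, table names and function names are all hypothetical. The point it tries to show is that ETL keeps only the modelled columns, while ELT keeps the raw payload so that new questions can be answered later.

```python
import json
import sqlite3

# Hypothetical raw events, as they might arrive from a source system.
RAW_EVENTS = [
    '{"customer": "a", "amount": "12.50", "channel": "web"}',
    '{"customer": "b", "amount": "7.25", "channel": "mobile"}',
    '{"customer": "a", "amount": "3.00", "channel": "web"}',
]


def etl(conn, raw_events):
    """ETL: transform first, then load only the columns the model needs."""
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    rows = []
    for line in raw_events:
        event = json.loads(line)
        rows.append((event["customer"], float(event["amount"])))
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


def elt(conn, raw_events):
    """ELT: load the raw payloads untouched; transformation happens later."""
    conn.execute("CREATE TABLE raw_events (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?)", [(event,) for event in raw_events]
    )


def spend_by_channel(conn):
    """A question asked after loading; it relies on the raw payloads being kept."""
    totals = {}
    for (payload,) in conn.execute("SELECT payload FROM raw_events"):
        event = json.loads(payload)
        channel = event["channel"]
        totals[channel] = totals.get(channel, 0.0) + float(event["amount"])
    return totals
```

In this sketch the `sales` table produced by `etl` cannot answer the per-channel question, because the channel field was discarded during transformation, whereas the store filled by `elt` still can.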


Data Lake - Oracle Cloud Architecture:


A data lake mainly consists of:

  • Sources
  • Landing zone
  • Standardization zone
  • Analytics Sandbox
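
As a rough illustration of how data might move through the zones listed above, here is a minimal Python sketch that models each zone as a directory. What each zone does here (landing keeps the file as received, standardization converts it to uniform JSON records, the sandbox holds a filtered working copy) is an assumption made for the example, as are all function and file names; the article does not prescribe an implementation.

```python
import json
import tempfile
from pathlib import Path

ZONES = ("landing", "standardized", "sandbox")


def create_zones(root):
    """Create one directory per zone under the given root."""
    paths = {}
    for zone in ZONES:
        paths[zone] = Path(root) / zone
        paths[zone].mkdir(parents=True, exist_ok=True)
    return paths


def land(paths, name, raw_text):
    """Landing zone: keep the source file exactly as received."""
    target = paths["landing"] / name
    target.write_text(raw_text, encoding="utf-8")
    return target


def standardize(paths, name):
    """Standardization zone: turn comma-separated lines into uniform JSON records."""
    lines = (paths["landing"] / name).read_text(encoding="utf-8").splitlines()
    header = [column.strip().lower() for column in lines[0].split(",")]
    records = [
        dict(zip(header, (value.strip() for value in line.split(","))))
        for line in lines[1:]
        if line.strip()
    ]
    target = paths["standardized"] / (Path(name).stem + ".json")
    target.write_text(json.dumps(records), encoding="utf-8")
    return records


def publish_to_sandbox(paths, name, predicate):
    """Analytics sandbox: a filtered working copy for experimentation."""
    source = paths["standardized"] / name
    records = json.loads(source.read_text(encoding="utf-8"))
    subset = [record for record in records if predicate(record)]
    (paths["sandbox"] / name).write_text(json.dumps(subset), encoding="utf-8")
    return subset
```

A caller could create the zones under `tempfile.TemporaryDirectory()`, land a small CSV file, standardize it, and then publish only the rows of interest to the sandbox; the landed file is never modified along the way.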


Key Components:

Oracle Data Integration Platform Cloud (DIPC)

Oracle Data Integration Platform Cloud is a unified platform for real-time data replication, data transformation, data quality and data governance, used to cleanse, integrate and analyze data. Its capabilities include:

  • Migrate data without downtime
  • Integrate Big Data
  • Monitor data health
  • Automate Data Mart generation
  • Profile and validate data
  • Synchronize data
  • Support redundancy
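
To make "profile and validate data" from the list above more concrete, the following is a generic Python sketch of the idea. It does not use or represent the Oracle product's API; the function names, the sample fields and the validation rule are hypothetical.

```python
def profile(records, fields):
    """Count missing and distinct values for each field."""
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]
        present = [value for value in values if value not in (None, "")]
        report[field] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


def validate(records, rules):
    """Return (row index, field) pairs for every value that fails its rule."""
    failures = []
    for index, record in enumerate(records):
        for field, rule in rules.items():
            value = record.get(field)
            if value in (None, "") or not rule(value):
                failures.append((index, field))
    return failures
```

Profiling summarizes what the data looks like (how many values are missing, how many are distinct), while validation checks each value against an explicit rule, for example `validate(rows, {"email": lambda v: "@" in v})`.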

Oracle Autonomous Data Warehouse

Oracle Autonomous Data Warehouse provides a fully autonomous database that scales without manual database administration and delivers fast query performance. It can be deployed either on dedicated infrastructure (a private cloud within the public cloud) or on shared, elastic infrastructure. The database patches, tunes and upgrades itself. The key features are:

  • Elasticity
  • Autonomous
  • Database migration utility
  • Cloud-based data loading
  • Enterprise grade security
  • Concurrent workloads
  • High performance

Oracle Stream Analytics

Oracle Stream Analytics (OSA) is a tool for real-time analytic computing on streaming big data. OSA runs in a scalable, highly available clustered big data environment. It lets users explore real-time data such as sensor readings, social media feeds and banking transactions through live charts, maps and visualizations. Oracle Stream Analytics includes more than 30 visualization chart types, based on Apache Superset, with a user-friendly interface. It is designed to be usable without a technical background.

Key features:

  • Location-based analytics using built-in spatial patterns
  • Machine learning to predict upcoming events
  • Ad hoc queries on processed data
  • Real-time fraud detection
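
Real-time fraud detection on a stream is often built on sliding-window rules. The sketch below is plain Python, not Oracle Stream Analytics code; the class name, thresholds and card identifiers are hypothetical. It flags a card when too many transactions arrive within a short time window, and it assumes events for each card arrive in timestamp order.

```python
from collections import defaultdict, deque


class VelocityRule:
    """Flag a card with more than `max_events` transactions inside the window."""

    def __init__(self, max_events=3, window_seconds=60):
        self.max_events = max_events
        self.window_seconds = window_seconds
        # card id -> timestamps that are still inside the sliding window
        self._recent = defaultdict(deque)

    def observe(self, card_id, timestamp):
        """Record one transaction and return True if the card looks suspicious."""
        window = self._recent[card_id]
        window.append(timestamp)
        # Evict timestamps that have fallen out of the sliding window.
        while window and timestamp - window[0] >= self.window_seconds:
            window.popleft()
        return len(window) > self.max_events
```

Each event is processed once as it arrives, and only the timestamps inside the current window are kept in memory, which is what makes this style of rule suitable for continuous streams.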

Oracle Cloud Infrastructure

Oracle Cloud Infrastructure (OCI) is a cloud service for building and running a broad range of applications in a highly available environment. It combines the control associated with on-premises data centers with the cost savings and elasticity of the public cloud. Oracle provides technologies that help enterprises solve critical business problems, and OCI is purpose-built to let enterprises run business-critical production workloads. Key features include:

  • High availability - deployment across multiple regions, availability domains (ADs) and fault domains
  • Scalability - resources scale up and down automatically as business needs change, so you pay only for what you use
  • Performance - high-performance computing (HPC) instances
  • Price - low cost and improved price-performance compared with other cloud services

Oracle Identity Cloud Service (IDCS)

Oracle Identity Cloud Service provides single sign-on (SSO), identity management and identity governance for mobile, cloud and on-premises applications. It is a fully integrated service that delivers core identity and access management capabilities through a multi-tenant cloud platform. Users can access applications securely at any time, from anywhere, on any device. Oracle Identity Cloud Service integrates directly with existing directories and identity management systems, which makes it easier for users to access applications. The benefits include:

  • Better user productivity and experience
  • Reduced cost
  • Improved business responsiveness
  • Hybrid identity

OAC - BI Reporting & Visualization:

Business intelligence (BI) supports data-driven decision-making. It covers the generation and analysis of data and, finally, its visualization, so that business analysts and business leaders can make informed decisions about products, strategies, market timing and other mission-critical factors.

  • Oracle Analytics Cloud (OAC) lets you take data from a wide range of sources, then explore it and collaborate on it in real time
  • OAC lets you ask questions of your data, including through its mobile-friendly features
  • OAC includes self-service visualization, data preparation, advanced analytics and enterprise reporting
  • OAC is Oracle's cloud-based solution in the analytics and business intelligence space

Advantages of Data Lake

  • A data lake stores data in its original form. Advanced analytics depends on this raw data, which data scientists and analysts use for experimentation
  • A data lake handles structured, semi-structured and unstructured data, such as streaming data, logs, equipment readings and telemetry, and can derive value from it regardless of type
  • For high-volume, high-speed streaming, a data lake uses ingestion tools such as Kafka, Flume, Scribe and Chukwa to acquire high-velocity data such as tweets, WhatsApp messages, Instagram posts or machine sensor data
  • It offers cost-effective scalability and flexibility: all types of data can be stored inexpensively and kept for future analysis, whenever value needs to be extracted
  • It collects and stores huge data sets, which can be used to visualize telemetry and customer data, detect anomalies and support security
  • A data lake can serve as the data source for a front-end application
  • The structure of the data (the schema) and its transformations can be defined at the time of use, which is called schema on read; unlike a traditional data warehouse, no schema is required up front
  • A data lake supports languages and tools beyond SQL: Pig can be used to analyze data flows and Spark MLlib for machine learning. Tools such as Hive run SQL-like queries in parallel, which reduces query time
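
Schema on read, mentioned in the list above, can be sketched in a few lines of Python. The raw records, field names and the `read_with_schema` helper are hypothetical; the idea shown is only that the stored lines stay untyped and each reader applies the schema it needs at read time.

```python
import json

# Hypothetical raw records, stored as-is with no schema enforced on write.
RAW_LINES = [
    '{"device": "pump-1", "temp_c": "71.5", "site": "north"}',
    '{"device": "pump-2", "temp_c": "64.0"}',
    '{"device": "pump-3", "vibration": "0.8"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter) while reading, not while storing."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        }
```

Two analyses can read the same stored lines with different schemas, for example `{"device": str, "temp_c": float}` for a temperature study and `{"device": str, "vibration": float}` for a vibration study; fields missing from a record come back as `None` rather than blocking the load.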

Industrial Applications of Data Lake

Oil and gas industry analytical requirements include minimizing unplanned downtime, optimizing directional drilling, lowering lease operating expenses, improving safety and complying with regulations, all of which rely on collecting data for predictive analysis.

Smart city initiatives include tracking vehicle patterns and speeds, along with the usage timings of waterways, tolls, highways and bridges. This data can be used to manage traffic signals, control traffic and prevent congestion.

Life science studies store data such as heart rate, blood pressure, white and red blood cell counts, temperature, height, weight and enzyme levels. Analyzing this data may help data analysts predict increases in human life expectancy.

Marketing and customer data platforms build a database for every customer, drawing data from multiple sources such as mobile and web preferences, profile data, browsing history, behavioral and transaction data, brick-and-mortar systems and loyalty programs. This leads to personalized marketing programs.

The banking industry has for many years stored customer account data, credit and debit card transaction data, wireless payment data, general ledger data (including purchase information) and trading data. A data lake improves the agility of this data.


