Infosys’ blog on industry solutions, trends, business process transformation and global implementation in Oracle.


Comparative Study Between Oracle Big Data Cloud Service and Compute Engine


Comparative study between Oracle BDCS and Oracle Big Data Cloud Compute Engine.


1. Oracle Big Data Cloud Service (OBDCS): Gives access to the resources of a preinstalled Oracle Big Data environment, including a complete installation of Cloudera's Distribution including Apache Hadoop (CDH) and Apache Spark. It can be used to analyze data generated by social media feeds, e-mail, smart meters, etc.

OBDCS contains:

·         A cluster of 3 to 60 nodes. Three is the minimum number of cluster nodes (OCPU) to start with; processing power can be increased and storage extended by adding nodes to the cluster, and temporary compute nodes can be added for extra processing ("bursting").

·         Linux operating system provided by Oracle

·         Cloudera Distribution including Apache Hadoop (CDH):

-          File System: HDFS to store different types of files

-          MapReduce Engine (YARN is default for resource management)

-          Administrative framework: Cloudera Manager is the default

-          Apache projects, e.g. ZooKeeper, Oozie, Pig, Hive, Ambari

-          Cloudera applications: Cloudera Enterprise Data Hub Edition, Impala, Search and Navigator


·         Built-in utilities for managing data and resources

·         Oracle Big Data Spatial and Graph

·         Oracle Big Data Connectors:

-          Oracle SQL Connector for HDFS

-          Oracle Loader for Hadoop

-          Oracle XQuery for Hadoop

-          Oracle R Advanced Analytics for Hadoop

-          Oracle Data Integrator (ODI) Enterprise Edition
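The OBDCS sizing rule listed above (3 to 60 permanent nodes, with temporary compute nodes added by bursting) can be expressed as a small validation check. This is a minimal Python sketch; the function and constant names are hypothetical and not part of any Oracle API. Because the post does not say whether burst nodes count toward the 60-node limit, the sketch does not cap them.

```python
# Hypothetical helpers: the limits come from the post, the names are illustrative only.
MIN_NODES = 3   # smallest OBDCS cluster
MAX_NODES = 60  # largest OBDCS cluster


def validate_cluster_size(permanent_nodes: int) -> int:
    """Return the permanent node count if it is within the OBDCS range, else raise."""
    if not MIN_NODES <= permanent_nodes <= MAX_NODES:
        raise ValueError(
            f"OBDCS clusters need {MIN_NODES}-{MAX_NODES} permanent nodes, "
            f"got {permanent_nodes}"
        )
    return permanent_nodes


def total_nodes(permanent_nodes: int, burst_nodes: int = 0) -> int:
    """Permanent nodes plus temporary compute nodes added by bursting (not capped here)."""
    validate_cluster_size(permanent_nodes)
    if burst_nodes < 0:
        raise ValueError("burst_nodes cannot be negative")
    return permanent_nodes + burst_nodes


print(total_nodes(3, burst_nodes=2))  # 5
```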


Typical Workflow of OBDCS: Purchase a subscription -> Create and manage users and their roles -> Create a service instance -> Create an SSH key pair -> Create a cluster -> Control network access to services -> Access and work with your cluster -> Add permanent nodes to a cluster -> Add temporary compute nodes to a cluster (bursting) -> Patch a cluster -> Manage storage providers and copy data
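One common way to carry out the "Create an SSH key pair" step is OpenSSH's `ssh-keygen` tool. The Python sketch below only builds the command line rather than running it. The RSA 2048-bit key type, the file name `bdcs_key` and the comment `bdcs-admin` are assumptions for illustration; check which key types the service accepts. The resulting public key file (`bdcs_key.pub`) is what you would typically supply when creating the instance or cluster.

```python
import shlex


def ssh_keygen_command(key_path: str, comment: str = "bdcs-admin") -> list[str]:
    """Build an ssh-keygen invocation for a new RSA key pair with no passphrase."""
    return [
        "ssh-keygen",
        "-t", "rsa",      # key type (assumed; check what the service accepts)
        "-b", "2048",     # key length in bits
        "-f", key_path,   # private key file; the public key is written to key_path + ".pub"
        "-N", "",         # empty passphrase: convenient for a demo, not for production
        "-C", comment,    # comment stored in the public key (hypothetical value)
    ]


cmd = ssh_keygen_command("bdcs_key")
print(shlex.join(cmd))  # ssh-keygen -t rsa -b 2048 -f bdcs_key -N '' -C bdcs-admin
# To actually create the keys: subprocess.run(cmd, check=True)
```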

odiff (Oracle Distributed Diff) is an Oracle-developed tool for comparing huge, sparsely stored data sets using a Spark application. It is compatible with CDH 5.7.x, and the maximum size of a file or directory it can compare is 2 GB.
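The post does not describe how odiff works internally. As a rough illustration of how a distributed diff can be organized, the plain-Python sketch below hash-partitions both data sets so that identical records always land in the same bucket, then compares each bucket pair independently; on a Spark cluster, each bucket pair could be handled by a separate task. This is a generic technique, not odiff's implementation or the Spark API, and all names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Iterable


def partition(records: Iterable[str], num_partitions: int) -> list[list[str]]:
    """Hash-partition records so identical records always land in the same bucket."""
    buckets: list[list[str]] = [[] for _ in range(num_partitions)]
    for rec in records:
        # crc32 is deterministic across processes, unlike Python's salted str hash()
        buckets[zlib.crc32(rec.encode("utf-8")) % num_partitions].append(rec)
    return buckets


def distributed_diff(left: Iterable[str], right: Iterable[str],
                     num_partitions: int = 4) -> tuple[Counter, Counter]:
    """Return (only_in_left, only_in_right) as multisets, computed bucket by bucket."""
    only_left: Counter = Counter()
    only_right: Counter = Counter()
    for lbucket, rbucket in zip(partition(left, num_partitions),
                                partition(right, num_partitions)):
        # Bucket pairs are independent, so a cluster could compare them in parallel.
        lc, rc = Counter(lbucket), Counter(rbucket)
        only_left.update(lc - rc)   # Counter subtraction keeps positive counts only
        only_right.update(rc - lc)
    return only_left, only_right


left = ["a,1", "b,2", "c,3", "c,3"]
right = ["a,1", "c,3", "d,4"]
only_left, only_right = distributed_diff(left, right)
print(sorted(only_left.elements()))   # ['b,2', 'c,3']
print(sorted(only_right.elements()))  # ['d,4']
```

Hashing each record, rather than splitting files by position, is what lets two data sets with different layouts be compared bucket by bucket without shuffling everything to one machine.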



2. Oracle Big Data Cloud Compute Engine (OBDCCE): Oracle Big Data Cloud combines open source technologies such as Apache Spark and Apache Hadoop, and tools and technologies developed by other vendors such as Hortonworks, with Oracle's own innovations to provide a complete Big Data platform for running and managing Big Data analytics applications. It leverages Oracle's Infrastructure Cloud Services to provide a holistic solution with proper security, reliability and elasticity. It offers:


·         Ability to spin up multiple Hadoop or Spark clusters in minutes

·         Use built-in tools such as Apache Zeppelin to understand and process data

·         Use various open interfaces to integrate third-party tools to analyze data

·         Ability to launch multiple clusters at the same time against a centralized data lake, sharing data without compromising job isolation

·         Ability to create very small clusters or huge ones based on workload and business requirements

·         Elastically scale the compute and storage tiers independently of one another, either manually or in an automated fashion

·         Ability to pause a cluster when not in use

·         Use REST APIs to monitor, manage, and utilize the service
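To make independent scaling of the compute and storage tiers (and pausing a cluster) concrete, here is a toy Python model. It is not the Big Data Cloud REST API; the class, the grow-only storage rule and the `autoscale` policy (one node per four pending jobs, clamped to 1 to 20 nodes) are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Toy model: compute and storage are separate tiers that scale independently."""
    compute_nodes: int
    storage_tb: float
    paused: bool = False

    def scale_compute(self, nodes: int) -> None:
        if nodes < 1:
            raise ValueError("a running cluster needs at least one compute node")
        self.compute_nodes = nodes  # storage_tb is not touched

    def scale_storage(self, storage_tb: float) -> None:
        # Assumption of this sketch: storage may only grow.
        if storage_tb < self.storage_tb:
            raise ValueError("this sketch only allows storage to grow")
        self.storage_tb = storage_tb  # compute_nodes is not touched

    def pause(self) -> None:
        self.paused = True  # in this model only the flag changes; storage_tb is kept

    def resume(self) -> None:
        self.paused = False


def autoscale(cluster: ClusterState, pending_jobs: int, jobs_per_node: int = 4,
              min_nodes: int = 1, max_nodes: int = 20) -> None:
    """Hypothetical policy: one compute node per `jobs_per_node` pending jobs, clamped."""
    if cluster.paused:
        return  # leave a paused cluster alone
    desired = math.ceil(pending_jobs / jobs_per_node)
    cluster.scale_compute(max(min_nodes, min(max_nodes, desired)))


cluster = ClusterState(compute_nodes=3, storage_tb=10.0)
autoscale(cluster, pending_jobs=30)  # ceil(30 / 4) = 8 nodes
print(cluster)  # ClusterState(compute_nodes=8, storage_tb=10.0, paused=False)
```

Keeping the two tiers as separate fields is the point of the model: an automated rule can resize compute for a burst of jobs while the stored data, and its cost, stays where it is.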


Typical Workflow of OBDCCE: Sign up for a free credit promotion or purchase a subscription -> Add and manage users and roles -> Create an SSH key pair -> Create a cluster -> Enable network access -> Load data -> Create and manage jobs -> Create and manage notes -> Monitor clusters -> Monitor the service


Big Data Cloud can be accessed in several ways: the CLI, the REST API, SSH (e.g. with PuTTY), the web console, etc.
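For REST access, a client sends authenticated HTTP requests to the service endpoint. The sketch below uses Python's standard `urllib.request` to build, but not send, a GET request with HTTP Basic authentication. The base URL, resource path and credentials are placeholders, and Basic authentication itself is an assumption; the service's REST API reference gives the real endpoints and authentication scheme.

```python
import base64
import urllib.request

# Placeholders: not a real Oracle endpoint or real credentials.
BASE_URL = "https://example.invalid/bdc/api"
USER, PASSWORD = "cloud.admin", "change-me"


def build_get(path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request with an HTTP Basic auth header."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


req = build_get("/clusters")  # "clusters" is a hypothetical resource path
print(req.get_method(), req.full_url)  # GET https://example.invalid/bdc/api/clusters
# Sending it would be: urllib.request.urlopen(req)
```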


Differences Between OBDCS and OBDCCE:



| OBDCS | OBDCCE |
|---|---|
| Preconfigured PaaS service with CDH and Apache Spark | Includes both Apache Spark and Hadoop, with unique innovations from Oracle |
| Fixed cluster size: minimum 3 and maximum 60 nodes | No fixed number of nodes; nodes can be added as and when needed |
| Uses Cloudera Manager and YARN as the default administrative service and resource manager | No default administrative service |
| Uses CDH Impala and various other Cloudera applications to understand data | Uses Apache Zeppelin to understand data better |
| Oracle Big Data Connectors are provided by default for data loading and unloading, and for Oracle R integration | No Oracle Big Data Connectors are provided by default; they can be added or installed as required |
| The odiff utility finds differences between large datasets, files or directories | Zeppelin notebooks are used to write code to explore and visualize data |
| The odcp utility loads large datasets in a distributed environment such as HDFS, Amazon S3, Oracle Cloud Storage, etc. | Data can be loaded to Cloud Storage and HDFS simply by browsing to a file on the local system or by URL |
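Distributed copy tools commonly speed up large transfers by splitting files into pieces and copying the pieces in parallel. The post does not say whether or how odcp does this, so the following Python sketch only illustrates the general chunk-and-parallelize idea on local files with a thread pool; it does not talk to HDFS, Amazon S3 or Oracle Cloud Storage, and the function names and 64 MB default chunk size are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, size) into (offset, length) pieces of at most chunk_size bytes."""
    return [(off, min(chunk_size, size - off)) for off in range(0, size, chunk_size)]


def copy_chunk(src: str, dst: str, offset: int, length: int) -> None:
    """Copy one byte range; each worker opens its own file handles."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))


def parallel_copy(src: str, dst: str, chunk_size: int = 64 * 1024 * 1024,
                  workers: int = 4) -> int:
    """Copy src to dst chunk by chunk using a thread pool; return bytes copied."""
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)  # pre-size the target so chunks can be written in place
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_chunk, src, dst, off, n)
                   for off, n in byte_ranges(size, chunk_size)]
        for fut in futures:
            fut.result()  # re-raise any error from a worker
    return size


with tempfile.TemporaryDirectory() as tmp:
    src, dst = os.path.join(tmp, "src.bin"), os.path.join(tmp, "dst.bin")
    payload = os.urandom(10_000)
    with open(src, "wb") as f:
        f.write(payload)
    parallel_copy(src, dst, chunk_size=1024, workers=3)
    with open(dst, "rb") as f:
        print(f.read() == payload)  # True
```

Pre-sizing the target and giving every chunk its own offset means workers never coordinate with each other, which is what lets the same pattern spread across many machines.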
