Infosys’ blog on industry solutions, trends, business process transformation and global implementation in Oracle.


Oracle Data Visualization (DVD/DVCS) Implementation for Advanced Analytics and Machine Learning

Oracle Data Visualization Desktop (DVD) or Cloud Server (DVCS) is a very intuitive tool that helps every business user in the organization create quick and effective analytics with ease. People at all levels can leverage the benefits of blending and analysing data in just a few clicks, helping the organization make informed decisions using actionable insights. Oracle DVD is a Tableau-like interactive tool that helps create analyses on the fly using any type of data from any platform, be it on-premise or cloud. The main benefits of Oracle DVD are:

·         A personal single-user desktop tool, or a SaaS cloud service, which can be leveraged by any business user in the organization

·         Enables the desktop user to work even when offline

·         Completely private analysis of heterogeneous data

·         Business users have full control over their datasets/connections

·         Direct access to on-premise or cloud data sources

·         No administration tasks required

·         No remote server infrastructure to manage

Oracle DVD/DVCS enables business users to perform analysis using traditional methodologies, and also provides the capability to perform Advanced Analytics using R and to create predictive models using machine learning algorithms in Python.

This simple and intuitive tool provides a unique way to perform Advanced Analytics once all the required packages are installed. DVML (Data Visualization Machine Learning library) is the utility that installs all the packages required to implement machine learning algorithms for predictive analysis in one go.

The Install Advanced Analytics (R) utility installs all the R packages required to perform Advanced Analytics functions such as regression, clustering, and trend lines. Note that to run either utility on your personal system/server, you need administrative access, internet access, and permission to automatically download all the required packages.


In the sections below, we discuss how to leverage Advanced Analytics and machine learning functions to provide predictive analytics for the organization.

To create a trend line graph, we need to enable Advanced Analytics and then pull the required columns into the analysis.

Trend line function: This function takes a numeric expression, an optional series/partition clause, a model type, and a result type, and visualizes the data in a trending format.

Syntax: TRENDLINE(numeric_expr, ([series]) BY ([partitionBy]), model_type, result_type)

Example : TRENDLINE(revenue, (calendar_year, calendar_quarter, calendar_month) BY (product), 'LINEAR', 'VALUE')

We need to create various canvases and combine them into one story line, providing a corresponding description for each canvas. While creating a trend line visualization, we also need to provide the confidence level. By default it is 95%, which means the confidence band around the fitted trend is computed at the 95% confidence level.
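Conceptually, a 'LINEAR' trend line with result type 'VALUE' boils down to an ordinary least-squares fit over the series. As a rough illustration only (the revenue figures below are made up, and this is not DVD's actual implementation), the same computation can be sketched in Python:

```python
import numpy as np

# Hypothetical monthly revenue series (illustrative data, not from this post)
months = np.arange(1, 13)
revenue = np.array([100, 110, 108, 120, 125, 123,
                    135, 140, 138, 150, 155, 160], dtype=float)

# Fit a linear trend: revenue ~ slope * month + intercept,
# analogous to TRENDLINE(revenue, ..., 'LINEAR', 'VALUE')
slope, intercept = np.polyfit(months, revenue, deg=1)

# The 'VALUE' result type corresponds to the fitted trend values
trend = slope * months + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted `trend` array is what the trend line overlays on the chart; the confidence band is derived from the residuals around this fit.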


Fig 1: Profit Trend line (Method: Linear)

 

Fig 2: Canvases for Profit Trend line

 

Fig 3: Story Line to Visualize Profit Trend line

 

Outlier: The outlier function distinguishes particular values (min/max) from the rest of the data set. This helps business users identify values that fall far outside the general pattern of the data. Outliers can cause serious problems in predictive analytics, so by detecting and removing them from the data set we can derive better actionable insights.

Syntax: OUTLIER((dimension_expr1 , ... dimension_exprN), (expr1, ... exprN), output_column_name, options, [runtime_binded_options])

Example: OUTLIER((product, company), (billed_quantity, revenue), 'isOutlier', 'algorithm=mvoutlier')

The outlier function takes two groups of parameters from the dataset: first, the dimension values on which we perform outlier detection; second, the measure values used to find the outlying data. We also need to specify which algorithm to use for the outlier analysis. The visualization below shows the outlier cities based on the profit they generate.
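To build intuition for what the flag column (e.g. 'isOutlier') means, here is a deliberately simple sketch using a plain z-score rule on made-up city profits. The actual mvoutlier algorithm used by DVD is multivariate and more sophisticated; this is only an illustration of the idea:

```python
import statistics

# Hypothetical profit per city (illustrative; "Metropolis" is a planted outlier)
profit_by_city = {
    "Springfield": 12000, "Riverton": 11500, "Lakeview": 12800,
    "Hilltop": 11900, "Metropolis": 45000, "Fairview": 12300,
}

values = list(profit_by_city.values())
mean = statistics.mean(values)
stdev = statistics.pstdev(values)

# Flag values more than 2 standard deviations from the mean,
# a simple stand-in for OUTLIER(..., 'isOutlier', 'algorithm=mvoutlier')
is_outlier = {city: abs(v - mean) / stdev > 2
              for city, v in profit_by_city.items()}

print(is_outlier)
```

In DVD the resulting flag column can be dropped onto the colour shelf, which is how the outlier cities stand out in the visualization below.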

Fig 4: Outlier Graph for Various Cities

 

Fig 5: Profit by City Outliers in a Story Line

 

Cluster: The cluster function takes one or more inputs (measure values) and puts particular dimensions into specific groups by applying a clustering algorithm such as k-means or hierarchical clustering. Its parameters include the dimension values and measure values used to assign the dimensions to clusters, as well as the clustering algorithm to use (e.g. k-means) and the number of clusters.

Syntax: CLUSTER((dimension_expr1 , ... dimension_exprN), (expr1, ... exprN), output_column_name, options, [runtime_binded_options])

Example : CLUSTER((product, company), (billed_quantity, revenue), 'clusterName', 'algorithm=k-means;numClusters=%1;maxIter=%2;useRandomSeed=FALSE;enablePartitioning=TRUE', 5, 10)
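For intuition, the k-means option named in the example can be sketched in a few lines of Python. This is a hand-rolled toy implementation on made-up (quantity, revenue) points, not DVD's internals; it only illustrates the assign-then-recompute loop that k-means performs:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical (billed_quantity, revenue) points with two loose groups
points = np.array([
    [10, 100], [12, 110], [11, 105], [9, 95],    # low-volume products
    [50, 500], [52, 520], [48, 480], [51, 510],  # high-volume products
], dtype=float)

# Minimal k-means, a stand-in for 'algorithm=k-means;numClusters=2;maxIter=10'
k = 2
centroids = points[rng.choice(len(points), size=k, replace=False)]
for _ in range(10):  # maxIter
    # Assign each point to its nearest centroid
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Recompute each centroid as the mean of its cluster (keep it if empty)
    centroids = np.array([points[labels == j].mean(axis=0)
                          if np.any(labels == j) else centroids[j]
                          for j in range(k)])

print(labels)
```

The resulting cluster label plays the same role as the 'clusterName' output column in the CLUSTER expression, and is typically mapped to colour in the visualization.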



Fig 6: Cluster Visualization for Product Category by Sales

 

Predictive analytics using machine learning: Before starting to implement machine learning algorithms, we need to install all the required packages by executing the DVML utility if we are running DVD. In DVCS (available with OAC), all the required libraries are already installed.

To perform predictive analytics, we start with the Data Flow tab in the DV window. In the data flow, we specify all the required steps: picking up the sample data set, selecting the columns to pass as parameters/independent variables, and selecting the machine learning model (e.g. Train Numeric Prediction). Once everything is selected, we save the model with an appropriate name and then execute the data flow by clicking the 'Execute Data Flow' button at the top right.

We need to provide various parameters while selecting the model, e.g. model type (Linear), target column (Sales), train/test split (70% train, 30% test), etc., to optimize the model for better results.
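Under stated assumptions, a Train Numeric Prediction flow with a linear model and a 70/30 split amounts to something like the following sketch. The data is synthetic and the code is illustrative Python, not the actual OAC implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sales data: sales driven by ad spend plus noise (illustrative only)
ad_spend = rng.uniform(1, 10, size=100)
sales = 3.0 * ad_spend + 5.0 + rng.normal(0, 0.5, size=100)

# 70% train / 30% test split, as configured in the Train Numeric Prediction step
idx = rng.permutation(100)
train, test = idx[:70], idx[70:]

# Fit a linear model on the training portion (model type = Linear)
slope, intercept = np.polyfit(ad_spend[train], sales[train], deg=1)

# Evaluate on the held-out 30%: this is what the model's quality
# metrics summarize after training
pred = slope * ad_spend[test] + intercept
rmse = np.sqrt(np.mean((pred - sales[test]) ** 2))

print(f"slope={slope:.2f}, intercept={intercept:.2f}, test RMSE={rmse:.2f}")
```

The test-set error is the number to watch: a model that fits the 70% training portion well but predicts the 30% holdout poorly is overfitting.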

 

Fig 7: Machine Learning Data Flow on Sales Data

 

Fig 8: Various Machine Learning Algorithms

 

Once the data flow has executed, the resulting models appear under the Machine Learning tab.

 

Fig 9: Machine Learning Models

 

To explore how the model performed, right-click the model name and choose the Inspect option. This opens the screen below, which allows us to inspect the various properties and values returned by the machine learning model after it has been trained and tested on the data.

Fig 10: Machine Learning Model Parameters/Output

 

