

A Maverick Approach to DWH Performance

Guest post by
Kaushal Kumar, Technology Lead, Infosys

 

Performance is the most ignored ingredient of any application implementation, be it ERP, reporting or transaction systems. Reports, however, are the highest-visibility item, since the audience sits at the top of the organization chart, so poor performance is quickly diagnosed and subsequently punished. A correctly built report is only half built until it runs fast.

We recently came across one such implementation where a revenue report (the name itself suggests its criticality) took hours to run: perfectly designed, yet miles to go before the results appeared. It was a major pain point for everyone from report authors to report consumers, so we put our thinking caps on!

Problem Statement & Approach

There are plenty of performance tuning options offered by the different reporting tools (OBIEE, Cognos, MicroStrategy), such as aggregations, event polling and cache seeding. Yet one solution remains out of the spotlight despite being among the most cost effective and, more importantly, tool agnostic: DWH partition exchange. It is already documented by Oracle; with one of our client requirements, we recently got the chance to apply it in the right situation.

Revenue data resided in different source systems but was loaded into a unified warehouse, and management expected a single revenue report on a monthly basis. That essentially meant querying a table containing 3 billion records, with retrieval times running into double-digit minutes. Certainly not the most desirable situation.

Partition exchange is a handy tool for dealing with fact tables that contain billions of records. What we did was partition the DWH table by interval and sub-partition it by list. A partitioned table is divided into multiple logical segments according to the requirement, but to the user it still appears as a single table with billions of records.
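As a concrete illustration of this structure, the DDL below is a minimal sketch only; the table name, columns and source system values (revenue_fact, source_system, revenue_date, SRC_A, SRC_B) are hypothetical, not the client's actual model.

CREATE TABLE revenue_fact (
    source_system   VARCHAR2(30) NOT NULL,
    revenue_date    DATE         NOT NULL,
    revenue_amount  NUMBER
)
-- Interval partitioning: Oracle adds a new monthly partition as data arrives
PARTITION BY RANGE (revenue_date) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
-- List sub-partitioning: each month is further split by source system
SUBPARTITION BY LIST (source_system)
SUBPARTITION TEMPLATE (
    SUBPARTITION sp_src_a VALUES ('SRC_A'),
    SUBPARTITION sp_src_b VALUES ('SRC_B'),
    SUBPARTITION sp_other VALUES (DEFAULT)
)
(PARTITION p_initial VALUES LESS THAN (DATE '2013-01-01'));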

In almost all traditional data loading techniques the table is not committed until the jobs have completed; this is where partition exchange load differs. In a typical partition exchange, a "swap" occurs between an ordinary (non-partitioned) table and a specific partition of the partitioned table, so data moves from the non-partitioned staging table into the partitioned one simply by exchanging the partition. We wrote an automated script to add monthly partitions, source system wise, to logically segment the table into interval partitions and list sub-partitions. The swap happens at the Oracle data dictionary level rather than at the actual data level; since there is no physical data movement, the exchange is extremely fast.
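To make the flow concrete, here is a sketch of the staging side, assuming the hypothetical revenue_fact table above and an illustrative extract table called monthly_extract:

-- A non-partitioned staging table with the same column layout as the fact table
CREATE TABLE revenue_stage AS
SELECT * FROM revenue_fact WHERE 1 = 0;

-- The ETL job loads one month of data for one source system into the staging table
INSERT /*+ APPEND */ INTO revenue_stage (source_system, revenue_date, revenue_amount)
SELECT source_system, revenue_date, revenue_amount
FROM   monthly_extract
WHERE  source_system = 'SRC_A';
COMMIT;

The exchange described in the Solution section below then swaps this staging segment with the target partition purely through a data dictionary update.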

Two custom scripts were written to achieve this objective.

Solution

Creating the partition: A script embedded in a procedure is executed during the monthly load to create the partition for the month and source system for which data is being loaded. Locking the interval partition in share mode forces Oracle to create it if it does not already exist:

LOCK TABLE <partitioned_table>
      PARTITION FOR (<partition_interval_date>) IN SHARE MODE;
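For example, a January 2014 load against the hypothetical revenue_fact table used earlier would resolve to:

LOCK TABLE revenue_fact
      PARTITION FOR (DATE '2014-01-01') IN SHARE MODE;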

Exchanging the partition: An ALTER TABLE command, also embedded in a procedure, swaps the loaded staging table with the target partition:

ALTER TABLE <partitioned_table_name>
EXCHANGE PARTITION <partition_name>
WITH TABLE <non_partitioned_table>
INCLUDING INDEXES
WITHOUT VALIDATION;
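Putting the two steps together, here is a minimal sketch of how such a procedure might look, using the hypothetical revenue_fact and revenue_stage tables from earlier. Note one assumption: because the example table is list sub-partitioned, the swap below targets a sub-partition (EXCHANGE SUBPARTITION); a table partitioned only by range would use EXCHANGE PARTITION exactly as shown above.

CREATE OR REPLACE PROCEDURE exchange_monthly_load (
    p_load_month        IN DATE,      -- first day of the month being loaded
    p_subpartition_name IN VARCHAR2,  -- system-generated sub-partition to swap
    p_stage_table       IN VARCHAR2   -- non-partitioned staging table
) AS
BEGIN
    -- Step 1: lock the interval partition in share mode so Oracle creates it
    -- (and its sub-partitions) if it does not exist yet
    EXECUTE IMMEDIATE
        'LOCK TABLE revenue_fact PARTITION FOR (DATE '''
        || TO_CHAR(p_load_month, 'YYYY-MM-DD')
        || ''') IN SHARE MODE';

    -- Step 2: swap the staging segment with the target sub-partition;
    -- this is a data dictionary update only, so no rows are copied
    EXECUTE IMMEDIATE
        'ALTER TABLE revenue_fact EXCHANGE SUBPARTITION ' || p_subpartition_name
        || ' WITH TABLE ' || p_stage_table
        || ' INCLUDING INDEXES WITHOUT VALIDATION';
END exchange_monthly_load;
/

The system-generated sub-partition name can be looked up from USER_TAB_SUBPARTITIONS after the lock step, since interval partitions and their sub-partitions receive system-generated names.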

Usage Benefits

  • Partitions are created automatically
  • Partitions are exchanged for the specified period and source system
  • Data load is extremely fast
  • Data retrieval is extremely fast

Business Benefits

Since the swap is only a data dictionary update, the data is exchanged extremely quickly. The business benefit was improved application and report performance, owing to:

  • Reduced data load time for huge volumes: one of the DWH source systems (approx. 20 million records per month) that earlier took 3 hours to load now loads in 1 hour
  • Reduced data retrieval time: another pain point that vanished was a report whose retrieval time dropped from over 1 hour to less than 5 minutes
  • Reduced TCO: faster loads and report generation saved approximately USD 70,000 per annum (extrapolated over the year at 3 hours saved per day)
  • Optimum utilization of the technology stack: using a standard script rules out any customization of the Informatica or OBIEE data model for better performance

Hope this helps fellow DWH professionals. Happy reading till the next installment.

Note: I would like to thank my colleague Vinod Kumar, Senior Consultant at Infosys, for his contribution to this blog.
