A Maverick Approach to DWH Performance
Guest post by
Kaushal Kumar, Technology Lead, Infosys
Performance is the most ignored ingredient of any application implementation, be it ERP, reporting, or transaction systems. Reports, however, are the highest-visibility item, since their audience is the top brass of the organization chart, so a slow report is quickly diagnosed and punished. Hence a correctly built report is only half built until it runs fast.
We recently came across one such implementation where a revenue report (the name itself suggests its criticality) ran for hours: perfectly designed, yet still miles to go before the results appeared. It was evidently a big pain point for everyone from report authors to report consumers, and we had to put our thinking caps on!
Problem Statement & Approach
Reporting tools (OBIEE, Cognos, MicroStrategy) offer plenty of performance tuning options, such as aggregations, event polling, and cache seeding. There is one solution, however, that remains out of the spotlight despite being among the most cost effective and, more importantly, tool agnostic: the DWH exchange partition. It is already documented by Oracle, but the chance to apply it in the right situation is what we recently got for one of our client requirements.
Revenue data resided in different source systems but was being loaded into a unified warehouse, and management expected a single revenue report on a monthly basis. That essentially meant querying a table containing 3 billion records, with retrieval times of double-digit minutes. Certainly not the most desirable situation.
Exchange partitioning is a handy tool for dealing with fact tables that contain records in the billions. So what we did was partition the DWH table by interval and sub-partition it by list. A partitioned table is divided into multiple logical segments depending on the requirement, but to the user it remains a single table with billions of records.
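As a sketch of what such a table could look like (all table and column names here are hypothetical, not from the actual engagement), an interval-partitioned fact table with a list sub-partition per source system might be declared as follows:

```sql
-- Hypothetical revenue fact table: one interval partition per month,
-- list sub-partitions by source system code.
CREATE TABLE revenue_fact (
  revenue_date   DATE          NOT NULL,
  source_system  VARCHAR2(10)  NOT NULL,
  account_id     NUMBER,
  revenue_amount NUMBER(18,2)
)
PARTITION BY RANGE (revenue_date)
  INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
  SUBPARTITION BY LIST (source_system)
  SUBPARTITION TEMPLATE (
    SUBPARTITION sp_erp VALUES ('ERP'),
    SUBPARTITION sp_crm VALUES ('CRM'),
    SUBPARTITION sp_oth VALUES (DEFAULT)
  )
(
  -- Oracle requires at least one explicit range partition
  -- before the INTERVAL clause takes over.
  PARTITION p_initial VALUES LESS THAN (DATE '2013-01-01')
);
```

With the interval clause in place, Oracle creates new monthly partitions on demand; the sub-partition template stamps the same source-system split onto each of them.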
In almost all traditional data loading techniques, the table is not committed until the jobs have been processed completely; this is where partition exchange load differs. In a typical partition exchange, a "swap" occurs between an ordinary non-partitioned table and a specific partition of the partitioned table, so the data moves from the non-partitioned table into the partitioned one as part of the exchange. We wrote an automated script to add monthly partitions per source system, logically segmenting the table with an interval partition and a list sub-partition. The swap occurs at the Oracle data dictionary level rather than at the actual data level; as there is no physical data movement, the exchange is extremely fast.
Two custom scripts were written to achieve this objective.
Creating the partition: a script embedded in a procedure, executed during the monthly load, creates the partition for the month and source system for which data is being loaded. On an interval-partitioned table, simply locking the target partition is enough to force Oracle to create it:
LOCK TABLE <partitioned_table>
PARTITION FOR (<partition_interval_date>) IN SHARE MODE;
Exchanging the partition: this was an ALTER TABLE command embedded in a procedure:
ALTER TABLE <partitioned_table_name>
EXCHANGE PARTITION <partition_name>
WITH TABLE <non_partitioned_table>;
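Tying the two fragments together, here is a minimal sketch of one monthly swap (the names are hypothetical, carried over from the illustrative table above; the staging table must match the partitioned table's column structure exactly, and with a list sub-partition per source system the exchange happens at the sub-partition level):

```sql
-- Step 1 (assumed done elsewhere): load the month's ERP data into a
-- non-partitioned staging table whose structure matches revenue_fact.

-- Step 2: touch the target partition with a lock so that, if this is
-- a brand-new month, the interval partition springs into existence.
LOCK TABLE revenue_fact
  PARTITION FOR (DATE '2014-06-01')
  IN SHARE MODE;

-- Step 3: swap the staging table with the matching sub-partition.
-- Only the data dictionary is updated; no rows physically move.
ALTER TABLE revenue_fact
  EXCHANGE SUBPARTITION FOR (DATE '2014-06-01', 'ERP')
  WITH TABLE revenue_stage_erp
  WITHOUT VALIDATION;
```

WITHOUT VALIDATION skips Oracle's row-by-row check that every staged row belongs in the target segment, which keeps the exchange instantaneous; it is safe only when the load process itself guarantees the data is for the right month and source system.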
In summary, the scripts:
- Create partitions automatically
- Exchange partitions for the specified period and source system
- Make data loading extremely fast
- Make data retrieval extremely fast
Since only a data dictionary update occurs, the data is swapped extremely quickly. The business benefit this led to was improved application and report performance, owing to:
- Reduced data load time at huge volumes: one of the DWH source systems (approx. 20 million records per month) that earlier took 3 hours to load now loads in 1 hour
- Reduced data retrieval time: another pain point that vanished was a report with a retrieval time of over an hour, which now takes less than 5 minutes
- Reduced TCO: faster loads and report generation led to savings of approximately 70,000 USD p.a. (extrapolated over a year at 3 hours saved per day)
- Optimum utilization of the technology stack: putting a standard script to use rules out any customization of the Informatica or OBIEE data model for better performance
Hope this helps fellow DWH professionals. Happy reading until the next installment.
Note: I would like to thank my colleague Vinod Kumar, Senior Consultant at Infosys, for his contribution to this blog.