Accelerating Business Intelligence through Performance Testing of EDW Systems
Enterprise data warehouse (EDW) systems help businesses make better decisions by collating massive volumes of operational data from disparate systems and converting it into a format that is current, actionable and easy to comprehend. Gartner has identified performance as one of the key differentiators for data warehouses. As data warehouses grow in size, meeting performance and scalability requirements has become a persistent challenge for enterprises. EDW performance testing uncovers bottlenecks and scalability issues before go-live, thereby reducing performance-related risk. It covers the following key aspects:
- Performance of jobs to Extract, Transform and Load (ETL) data into EDW
- Performance during report generation and analytics
- Scalability of the data warehouse
- Stability and resource utilization
However, EDW performance testing poses a few challenges of its own:
- Complexity due to the huge volume of data
- Load simulation challenges due to self-service and ad-hoc reporting and analytics
- Interdependency on interfaces and external systems
To address these challenges, one must start with a clear performance testing strategy, that is, with defined objectives such as benchmarking, troubleshooting, scalability assessment and capacity assessment of the EDW. Once the objectives are set, the next step is to model the EDW workload. The workload can be a mix of six types of transactions: Batch Load, Reporting, Online Analytical Processing, Real-time Load, Data Mining and Operational BI. After the workload is determined, EDW performance is characterized by simulating the workload and analyzing the results.
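As an illustration of workload modeling, the sketch below draws a reproducible sequence of transactions from a weighted mix. The six transaction types come from the text; the weights, the `WORKLOAD_MIX` name and the `build_schedule` helper are hypothetical and would need to be replaced with figures observed in a real EDW.

```python
import random

# The six transaction types are taken from the text above.
# The weights are illustrative assumptions, not measured values.
WORKLOAD_MIX = {
    "batch_load": 0.30,
    "reporting": 0.30,
    "olap": 0.15,
    "real_time_load": 0.10,
    "data_mining": 0.05,
    "operational_bi": 0.10,
}


def build_schedule(mix, total_transactions, seed=0):
    """Return a list of transaction types drawn according to the weighted mix.

    A fixed seed makes the schedule repeatable, so two test runs
    can be compared against the same simulated workload.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("workload weights must sum to 1")
    rng = random.Random(seed)
    types = list(mix)
    weights = [mix[t] for t in types]
    return rng.choices(types, weights=weights, k=total_transactions)


if __name__ == "__main__":
    schedule = build_schedule(WORKLOAD_MIX, total_transactions=1000)
    for name in WORKLOAD_MIX:
        print(name, schedule.count(name))
```

A load driver could then walk this schedule and dispatch each entry to the matching ETL job, report or query.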
For ETL, performance has to be measured at each stage of the process and verified against the SLAs. The slowest part of an ETL process is usually the database load phase. Metrics such as total load time, number of records updated per hour, number of feeds consumed per run and server resource utilization have to be collected and analyzed. When planning the tests, an incremental approach should be adopted in terms of data volume, load and coverage.
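The ETL metrics described above can be computed from simple per-stage timings. The following is a minimal sketch; the `EtlRun` record, the stage names and the SLA thresholds are hypothetical and stand in for whatever the ETL tool actually logs.

```python
from dataclasses import dataclass


@dataclass
class EtlRun:
    """Timing for one ETL stage (hypothetical record layout)."""
    stage: str
    elapsed_seconds: float
    records_loaded: int


def records_per_hour(run):
    """Throughput of a stage, expressed in records per hour."""
    if run.elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return run.records_loaded * 3600 / run.elapsed_seconds


def check_sla(runs, sla_seconds):
    """Map each stage to True if it finished within its SLA, else False."""
    return {r.stage: r.elapsed_seconds <= sla_seconds[r.stage] for r in runs}


if __name__ == "__main__":
    # Illustrative numbers only.
    runs = [
        EtlRun("extract", 600.0, 900_000),
        EtlRun("transform", 900.0, 900_000),
        EtlRun("load", 1800.0, 900_000),
    ]
    sla = {"extract": 900.0, "transform": 900.0, "load": 1500.0}
    for run in runs:
        print(run.stage, round(records_per_hour(run)))
    print(check_sla(runs, sla))
```

Reporting throughput per stage, rather than for the run as a whole, makes it easier to confirm whether the load phase really is the bottleneck.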
Load, scalability and endurance tests are all important for validating EDW performance. Load testing measures and validates the performance of business-critical reports and dashboards. Scalability testing validates whether the underlying hardware has sufficient capacity to handle growth, and detects bottlenecks when the EDW is subjected to large data volumes and multiple users. Endurance testing helps detect stability problems, memory leaks and resource consumption issues.
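One simple way to look for a memory leak during an endurance test is to fit a straight line to memory samples taken over the run and inspect its slope. The sketch below assumes samples of the form `(hours_elapsed, memory_mb)` gathered by some external monitor; the function names and the 1 MB/hour threshold are illustrative assumptions, and a steady upward slope is only a hint that warrants further investigation.

```python
def memory_slope_per_hour(samples):
    """Least-squares slope of memory usage (MB) against elapsed time (hours).

    samples: list of (hours_elapsed, memory_mb) tuples.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


def looks_like_leak(samples, threshold_mb_per_hour=1.0):
    """Flag a run whose memory grows faster than the (assumed) threshold."""
    return memory_slope_per_hour(samples) > threshold_mb_per_hour


if __name__ == "__main__":
    # Illustrative samples only: memory grows by 10 MB every hour.
    samples = [(0, 100), (1, 110), (2, 120), (3, 130)]
    print(memory_slope_per_hour(samples))
    print(looks_like_leak(samples))
```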
Superior EDW performance for decision support depends on the response time of data fetch queries and analytics. Performance testing validates whether the hardware and software are tuned optimally for the expected response time. It also confirms whether the EDW can withstand the anticipated future load, which helps ensure business continuity.
The EDW aggregates data from various operational sub-systems such as sales, marketing and customer support, and these sub-systems can affect EDW performance. Testing uncovers any performance issues they cause.
Vendors who supply the software and hardware for building the EDW make several performance claims about their products. Since IT infrastructure varies from enterprise to enterprise, actual performance can differ from those claims. Testing helps benchmark EDW performance and take corrective action before go-live.