Extreme Automation in ETL testing
Author: Sudaresa Subramanian Gomathi Vallabhan, Group Project Manager
End-to-end data testing can be time consuming, given the various stages, technologies and huge volume of data involved. Each stage of ETL testing requires a different testing strategy: one-to-one comparison, validation of migrated data, validation of transformation rules, reconciliation, data quality checks and front-end testing of BI reports.
Automating these tests brings its own challenges:
· The effort spent developing and maintaining an automation utility is high, given the vast technology landscape of ETL.
· Working with huge volumes of data causes delays: smaller utilities can handle limited data sets, but struggle as the volume grows.
· Integrating with a test management tool to provide end-to-end traceability is difficult.
Figure 1: Various stages in ETL and testing involved
Table 1: Data Testing and Automation need
Can Extreme Automation be achieved in data warehouse testing?
An integrated automation platform that combines all stages of data warehouse testing is the ideal way to achieve extreme automation. The following diagram illustrates a few of the components of such an automation framework:
Figure 2: Integrated Automation Platform for ETL Testing
Robust Data Handling: Handling data movement and the associated validation is the backbone of extreme automation. The platform should therefore:
- Have its own database space for temporary execution, so that tables can be built and dropped quickly.
- Handle huge volumes of data.
- Test all aspects of the data and its movement.
- Maintain traceability of data across stages.
- Let the user select the validation strategy.
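To make the points above concrete, here is a minimal sketch in Python of a scratch database with a selectable validation strategy. It is an illustration only: the in-memory SQLite database stands in for the platform's temporary database space, and the table names (src, tgt), the columns (id, amount) and the function names are all hypothetical.

```python
import sqlite3

def load_table(conn, name, rows):
    """Build a scratch table (hypothetical columns: id, amount) and load rows."""
    # Table names are fixed internal identifiers here, never user input.
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} (id INTEGER, amount REAL)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)

def reconcile(conn, source, target):
    """Reconciliation: compare the row count and the sum of amount."""
    query = "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {}"
    src = conn.execute(query.format(source)).fetchone()
    tgt = conn.execute(query.format(target)).fetchone()
    return {"source": src, "target": tgt, "match": src == tgt}

def one_to_one(conn, source, target):
    """One-to-one comparison: rows that exist on one side only."""
    diff = "SELECT * FROM {} EXCEPT SELECT * FROM {} ORDER BY 1"
    missing = conn.execute(diff.format(source, target)).fetchall()
    extra = conn.execute(diff.format(target, source)).fetchall()
    return {
        "missing_in_target": missing,
        "unexpected_in_target": extra,
        "match": not missing and not extra,
    }

# The user picks a strategy by name.
STRATEGIES = {"reconcile": reconcile, "one_to_one": one_to_one}

def validate(strategy, source_rows, target_rows):
    """Build scratch tables, run the chosen strategy, then discard everything."""
    conn = sqlite3.connect(":memory:")  # temporary space, gone on close
    try:
        load_table(conn, "src", source_rows)
        load_table(conn, "tgt", target_rows)
        return STRATEGIES[strategy](conn, "src", "tgt")
    finally:
        conn.close()

if __name__ == "__main__":
    source = [(1, 10.0), (2, 20.5)]
    target = [(1, 10.0), (2, 21.5)]
    print(validate("reconcile", source, target))
    print(validate("one_to_one", source, target))
```

A cheap reconciliation check can run first on large tables, with the row-level comparison reserved for tables where the totals disagree. A real platform would run against its own database engine and load data in bulk rather than from Python lists.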
End-to-end Automation: Since ETL testing spans multiple stages, an extreme automation solution should integrate data testing and report validation through the following components:
- A data testing platform.
- An open-source framework, such as Selenium with the Eclipse IDE, for validating reports and reconciling report data with the backend.
- A wrapper script that communicates between the data workbench and the reports.
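The wrapper's reconciliation step could look roughly like the sketch below. It assumes the report values have already been read from the BI front end (for example by a Selenium script) and arrive as displayed strings. The metric names, the sales table and the function names are hypothetical, and an in-memory SQLite database stands in for the real backend.

```python
import sqlite3

def parse_report_number(text):
    """Turn a displayed value such as '1,250.50' into a float."""
    return float(text.replace(",", "").strip())

def reconcile_report(report_values, conn, metric_queries, tolerance=0.01):
    """Compare each displayed metric with the result of its backend query.

    Returns a list of (metric, shown, backend) tuples for every mismatch.
    Each query is assumed to return a single non-NULL number.
    """
    mismatches = []
    for metric, sql in metric_queries.items():
        backend = conn.execute(sql).fetchone()[0]
        shown = parse_report_number(report_values[metric])
        if abs(shown - backend) > tolerance:
            mismatches.append((metric, shown, backend))
    return mismatches

def build_demo_backend():
    """Hypothetical backend: a small sales table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(1, 500.0), (2, 700.5), (3, 50.0)],
    )
    return conn

if __name__ == "__main__":
    queries = {
        "Total Sales": "SELECT SUM(amount) FROM sales",
        "Order Count": "SELECT COUNT(*) FROM sales",
    }
    # Values as they might appear on the report page.
    shown_on_report = {"Total Sales": "1,250.50", "Order Count": "4"}
    print(reconcile_report(shown_on_report, build_demo_backend(), queries))
```

Keeping the scraping and the comparison separate means the same reconciliation logic can be reused regardless of which tool reads the report.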
Unattended execution: The platform should be able to run tests unattended whenever data is loaded to a specific environment. Unattended execution can reduce overall execution effort by more than 40% if the platform detects a code drop or build and starts automatically. This can be implemented with Jenkins, which monitors for a code drop or build and triggers the unattended run.
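As an illustration, the script that a Jenkins job invokes might be structured like the sketch below. Everything in it is an assumption: the state file, the batch identifier and the function names are hypothetical, and the only Jenkins behaviour relied on is that a build step which exits with a non-zero status is treated as failed.

```python
import json
from pathlib import Path

def has_new_load(state_file, latest_batch_id):
    """Return True when latest_batch_id has not yet been tested successfully."""
    path = Path(state_file)
    if not path.exists():
        return True
    return json.loads(path.read_text())["last_tested"] != latest_batch_id

def run_unattended(state_file, latest_batch_id, checks):
    """Run every check for a new load and return a process exit code.

    checks maps a check name to a zero-argument callable returning True/False.
    """
    if not has_new_load(state_file, latest_batch_id):
        return 0  # this load already passed, nothing to do
    failures = [name for name, check in checks.items() if not check()]
    if failures:
        print("Failed checks:", ", ".join(failures))
        return 1  # non-zero exit status signals failure to the scheduler
    # Record the load only when it passes, so a failed load is retested
    # on the next trigger.
    Path(state_file).write_text(json.dumps({"last_tested": latest_batch_id}))
    return 0

if __name__ == "__main__":
    # Hypothetical checks; real ones would call the data testing platform.
    demo_checks = {"row_counts_match": lambda: True}
    raise SystemExit(run_unattended("etl_state.json", "batch-001", demo_checks))
```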
Robust Test Reports: Test reports should be sent directly to the user's mailbox after execution, and should let the user drill down to finer levels of detail on data defects or comparison results.
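A report of this kind can be sketched with Python's standard email package. The structure of the results, the addresses and the function name below are hypothetical; the sketch only builds the message and leaves the sending to whatever mail setup is in place.

```python
from email.message import EmailMessage

def build_report_email(results, sender, recipient):
    """Build a summary email with a drill-down of defects per check.

    results maps a check name to a list of defect descriptions;
    an empty list means the check passed.
    """
    failed = {name: defects for name, defects in results.items() if defects}
    lines = [f"Checks run: {len(results)}, failed: {len(failed)}", ""]
    for name, defects in results.items():
        status = "FAIL" if defects else "PASS"
        lines.append(f"[{status}] {name}")
        # Drill-down: list each defect under the check that found it.
        lines.extend(f"    - {defect}" for defect in defects)
    msg = EmailMessage()
    msg["Subject"] = f"ETL test report: {'FAILED' if failed else 'PASSED'}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines))
    return msg

if __name__ == "__main__":
    demo_results = {
        "reconcile": [],
        "one_to_one": ["id=2: source amount 20.5, target amount 21.5"],
    }
    message = build_report_email(
        demo_results, "etl-tests@example.com", "tester@example.com"
    )
    print(message["Subject"])
    print(message.get_content())
```

The resulting message can be handed to smtplib's send_message once an SMTP server is configured. A richer report might attach the full comparison output as a file instead of listing defects inline.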
Achieving Extreme Automation in ETL testing is critical if testers are to free up their bandwidth and upskill in emerging areas such as Big Data and Analytics testing. Fortunately, ETL is a great candidate for end-to-end automation across stages, with tangible business benefits and effort savings:
- Up to 50% of effort saved in the individual stages of execution.
- High quality and reliability of migrated data.
- Fully automated data processing and anomaly reporting.