Testing Services provides a platform for QA professionals to discuss and gain insights into the business value delivered by testing, the best practices and processes that drive it, and the emergence of new technologies that will shape the future of the profession.


Extreme Automation in ETL testing

Author: Sudaresa Subramanian Gomathi Vallabhan, Group Project Manager

End-to-end data testing can be time-consuming, given the various stages, technologies, and huge volumes of data involved. Each stage of ETL testing requires a different strategy or type of testing: one-to-one comparison, validation of migrated data, validation of transformation rules, reconciliation, data quality checks, and front-end testing of BI reports.
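As a concrete illustration, a one-to-one comparison of migrated data can be reduced to comparing per-row checksums between source and target. The sketch below is a minimal, hypothetical example: it uses an in-memory SQLite database with `src` and `tgt` tables standing in for real source and warehouse systems.

```python
import hashlib
import sqlite3

def table_checksums(conn, table):
    """Return {first-column key: md5-of-whole-row} for every row in the table."""
    cur = conn.execute(f"SELECT * FROM {table}")
    return {row[0]: hashlib.md5("|".join(map(str, row)).encode()).hexdigest()
            for row in cur.fetchall()}

def one_to_one_compare(conn, source, target):
    """One-to-one comparison: same keys and same row content in both tables."""
    src, tgt = table_checksums(conn, source), table_checksums(conn, target)
    missing = set(src) - set(tgt)      # rows dropped during migration
    extra = set(tgt) - set(src)        # rows that appeared from nowhere
    mismatched = {k for k in src.keys() & tgt.keys() if src[k] != tgt[k]}
    return {"missing": missing, "extra": extra, "mismatched": mismatched}

# Demo on an in-memory database standing in for the source and target systems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE tgt (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO tgt VALUES (?, ?)", [(1, "a"), (2, "B")])  # row 3 lost, row 2 altered
result = one_to_one_compare(conn, "src", "tgt")
print(result)  # {'missing': {3}, 'extra': set(), 'mismatched': {2}}
```

In practice the two tables would sit in different systems, and the checksums would be computed close to the data; the comparison logic itself stays the same.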

With the advent of Big Data across organizations, there is an increased need to automate ETL testing as well as the validation of reports and business intelligence tools. Testing teams use various accelerators, Excel macros, and open source automation utilities to accelerate testing at various stages. While these get the job done during individual phases of testing, they do not deliver end-to-end automation, for the following reasons:
·  Integrating all the tool sets and executing tests end to end is a challenge due to technical and infrastructure limitations.
·  The effort spent developing and maintaining automation utilities is high, given the vast technology landscape of ETL.
·  Delays in working with huge volumes of data: while smaller utilities can work with limited data sets, handling huge volumes is challenging.
·  Difficulty integrating with test management tools to provide end-to-end traceability.


Figure 1: Various stages in ETL and testing involved


Data Source

Challenges in automation:
· Heterogeneous data sources: a typical ETL warehouse has at least 5 different combinations of source systems feeding data.
· Limitations of utilities such as Excel-based macros: they occupy memory, need maintenance, and have sample-size limits when comparing beyond a million rows.

Parameters for achieving extreme automation:
· Automated comparison of huge volumes of data.
· Ability to compare data across heterogeneous systems: file to file, database to file, database to database.
· Ability to create a temporary table to store intermediate execution results and delete it once execution is completed.

ETL (Extract, Transform, Load)

Challenges in automation:
· Determining the validation strategy: exhaustive versus sampling validation changes the automation framework, and one framework does not fit all needs.
· Automating transformation rules is complex: one form of automation may fail if the database changes (Teradata, SQL Server, Netezza, Oracle, etc.).

Parameters for achieving extreme automation:
· Ability to validate the movement of bulk data.
· Automated validation of transformation rules.
· A tool-agnostic transformation rule builder, for deskilling users.
· Automated determination of the validation strategy (exhaustive, sampling, etc.), suggested to the user based on execution type.

Datawarehouse and DataMart

Challenges in automation:
· Understanding and automating Data Quality rules is complex; compliance Datawarehouse programs can have as many as 300+ Data Quality rule sets.
· Automated verification of referential integrity has multiple layers and is complex to implement.
· Validating end-to-end business rules requires a tool that maintains traceability from requirements through downstream systems.

Parameters for achieving extreme automation:
· Automated metadata and referential integrity verification.
· A Data Quality analysis engine to validate the correctness, completeness, and appropriateness of the data.
· Ability of the automation tool to work from the database schema and metadata to maintain referential integrity checks.
· Ability to maintain end-to-end traceability, from requirements to reporting.

BI Reports Verification

Challenges in automation:
· Reconciling report data with the backend is difficult to automate because of the complex data transformations involved.

Parameters for achieving extreme automation:
· Automated report layout verification.
· Automated validation of report functionality and format.
· Automated reconciliation with backend data.

Table 1: Data Testing and Automation Needs
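The "database to file" comparison above can be reduced to plain SQL by staging the file into a temporary table: once the extract file sits next to the database table, set difference finds every discrepancy, and the staging table vanishes after execution. The sketch below is an assumption-laden miniature, with SQLite standing in for the warehouse and an inline CSV standing in for the extract file.

```python
import csv
import io
import sqlite3

def load_file_to_temp(conn, csv_text, temp_name):
    """Stage a delimited file into a temporary table so that file-vs-database
    comparison becomes plain SQL; the table is dropped when the session ends."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    conn.execute(f"CREATE TEMP TABLE {temp_name} ({', '.join(header)})")
    conn.executemany(
        f"INSERT INTO {temp_name} VALUES ({', '.join('?' * len(header))})",
        list(reader),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id, name)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [("1", "alice"), ("2", "bob")])

# The extract file that supposedly fed the warehouse (hypothetical sample).
load_file_to_temp(conn, "id,name\n1,alice\n2,bob\n3,carol\n", "stage_customers")

# Rows present in the file but missing from the database.
only_in_file = conn.execute(
    "SELECT * FROM stage_customers EXCEPT SELECT * FROM customers").fetchall()
print(only_in_file)  # [('3', 'carol')]
```

Running the same `EXCEPT` in the other direction finds rows present in the database but absent from the file, covering both sides of the reconciliation.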

Can Extreme Automation be achieved in Datawarehouse testing?

An integrated automation platform that combines all stages of Datawarehouse testing would be the ideal solution for achieving extreme automation. The following diagram illustrates a few of the components of this automation framework:


Figure 2: Integrated Automation Platform for ETL Testing

Robust Data Handling: Handling data movement and the associated validation is the backbone of extreme automation. The platform should:

  • Have its own database space for temporary execution, so that tables can be built and collapsed quickly.
  • Be able to handle huge volumes of data.
  • Focus on testing all aspects of the data and its movement.
  • Maintain traceability of data across stages.
  • Provide options for the user to select a validation strategy.
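The last point, selecting a validation strategy, can itself be automated: suggest exhaustive comparison for small loads and sampling for large ones. The sketch below is a simplified illustration; the row-count threshold and sample size are arbitrary assumptions, not values from the platform described here.

```python
import random

def choose_strategy(row_count, exhaustive_limit=100_000):
    """Suggest a validation strategy from data volume: exhaustive comparison
    below the limit, random sampling above it (thresholds are assumptions)."""
    return "exhaustive" if row_count <= exhaustive_limit else "sampling"

def sample_keys(keys, strategy, sample_size=1000, seed=42):
    """Pick which primary keys to validate under the chosen strategy.
    A fixed seed keeps unattended reruns reproducible."""
    if strategy == "exhaustive" or len(keys) <= sample_size:
        return list(keys)
    return random.Random(seed).sample(list(keys), sample_size)

# 500,000 loaded rows: too many for exhaustive checking under our limit.
keys = range(500_000)
strategy = choose_strategy(len(keys))
to_check = sample_keys(keys, strategy)
print(strategy, len(to_check))  # sampling 1000
```

The suggestion would then be surfaced to the user, who can still override it, matching the "provide options" requirement above.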

End-to-end Automation: Since ETL testing traverses multiple stages, an extreme automation solution should integrate data testing and reports validation as follows:

  • A data testing platform.
  • An open source framework, such as Selenium with the Eclipse IDE, for reports validation and reconciliation of report data with the backend.
  • A wrapper script that communicates between the data workbench and the reports.
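The wrapper script's core job is the reconciliation step: compare each figure shown on the rendered report against the same aggregate recomputed from the backend. In this hypothetical sketch, a plain dict stands in for the figures a Selenium scrape would return, and SQLite stands in for the warehouse; the metric names and tolerance are assumptions.

```python
import sqlite3

def reconcile_report(scraped, conn, queries, tolerance=0.01):
    """Compare each scraped report figure against the value recomputed from
    the backend; return the list of (metric, report value, backend value)
    tuples that disagree beyond the tolerance."""
    failures = []
    for metric, sql in queries.items():
        expected = conn.execute(sql).fetchone()[0]
        if abs(scraped[metric] - expected) > tolerance:
            failures.append((metric, scraped[metric], expected))
    return failures

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("NA", 120.0), ("EU", 80.0)])

# Figures as they appear on the rendered report (stand-in for a Selenium scrape).
scraped = {"total_sales": 200.0, "na_sales": 125.0}
queries = {
    "total_sales": "SELECT SUM(amount) FROM sales",
    "na_sales": "SELECT SUM(amount) FROM sales WHERE region = 'NA'",
}
failures = reconcile_report(scraped, conn, queries)
print(failures)  # [('na_sales', 125.0, 120.0)]
```

Keeping the reconciliation logic separate from the scraping layer is what lets the same wrapper serve different BI front ends.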

Unattended execution: The ability to execute on an unattended basis, based on data loaded to a specific environment. Unattended execution can save more than 40% of overall execution effort if the platform can detect a code drop or build and start automatically. This can be implemented using Jenkins, which monitors for any code drop or build and triggers the unattended execution.
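The detection logic behind such a trigger is simple: fire once whenever the load-complete marker is newer than the last run. The sketch below is a stand-alone illustration of that polling check, not Jenkins itself; the marker-file convention is an assumption.

```python
import os
import tempfile

def check_for_drop(marker_path, last_seen):
    """One poll cycle: return (triggered, new_last_seen), firing only when the
    load-complete marker file is newer than the last run we triggered.
    (In the setup described above, Jenkins plays this watching role.)"""
    if os.path.exists(marker_path):
        mtime = os.path.getmtime(marker_path)
        if mtime > last_seen:
            return True, mtime
    return False, last_seen

# Simulate an ETL job dropping its completion marker.
marker = os.path.join(tempfile.mkdtemp(), "load_complete.flag")
open(marker, "w").close()

fired, seen = check_for_drop(marker, last_seen=0.0)
print(fired)   # True  -> kick off the unattended regression run
fired, seen = check_for_drop(marker, seen)
print(fired)   # False -> no new drop, nothing to do
```

A Jenkins job would wrap the same check in a scheduled poll (or react to an SCM/webhook event) and invoke the test suite as its build step.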

Robust Test Reports: Test reports configured to be sent directly to the user's mailbox after execution, with the ability to automatically drill down to finer levels of detail on data defects or comparison results.
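A minimal sketch of such a report mail, assembled with Python's standard email library: the summary goes in the subject and the drill-down (the offending keys per comparison) goes in the body. The recipient address and result structure are illustrative assumptions, and actual SMTP delivery is omitted.

```python
from email.message import EmailMessage

def build_report_mail(results, recipient):
    """Assemble the post-execution summary mail; failed comparisons carry a
    drill-down line listing their mismatched keys (SMTP send not shown)."""
    failed = {name: r for name, r in results.items() if r["mismatched"]}
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = (f"ETL run: {len(results) - len(failed)} passed, "
                      f"{len(failed)} failed")
    lines = [f"{name}: mismatched keys {sorted(r['mismatched'])}"
             for name, r in failed.items()]
    msg.set_content("\n".join(lines) or "All comparisons passed.")
    return msg

# Hypothetical results from two automated comparisons.
results = {
    "customers_one_to_one": {"mismatched": {42, 7}},
    "orders_reconciliation": {"mismatched": set()},
}
msg = build_report_mail(results, "qa-team@example.com")
print(msg["Subject"])  # ETL run: 1 passed, 1 failed
```

From here, `smtplib.SMTP(...).send_message(msg)` would deliver it to the user's mailbox once execution completes.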


Achieving extreme automation in ETL testing is critical for freeing up testers' bandwidth so they can upskill on emerging technologies such as Big Data and analytics testing. Thankfully, ETL is a great candidate for end-to-end automation across stages, with tangible business benefits and effort savings:

  • As high as 50% effort saved in the individual stages of execution.
  • High quality and reliability of migrated data.
  • Fully automated data processing and anomaly reporting.

