Testing Services provides a platform for QA professionals to discuss and gain insights into the business value delivered by testing, the best practices and processes that drive it, and the emerging technologies that will shape the future of this profession.


Are We Prepared to Manage Tomorrow's Test Data Challenges?

Author: Sunil Dattatray Shidore, Senior Project Manager

As tomorrow's enterprises embrace the latest technology trends, including SMAC (Social, Mobile, Analytics, Cloud), and adopt continuous integration and agility, we need more advanced, scalable and innovative ways to manage test data in non-production environments, whether for development, testing, training, proof of concept (POC) or pre-production purposes. The real question is: have we anticipated the upcoming challenges and complexity of managing test data, and are we prepared and equipped with the right strategies, methodologies, tools, processes and skilled people in this area?

The following are some of the critical areas we need to focus on:

  • Test data for Big Data testing and IoT
  • Data management for cloud-based NPEs (Non-Production Environments)
  • Data on demand and extreme self-service
  • Agility in data provisioning
  • Synthetic data creation through virtual services
  • Data virtualization

Big Data and IoT: The key challenges in Big Data and IoT (Internet of Things) testing are:

  • The complexity of building a scalable test environment
  • Obtaining realistic, meaningful sample data subsets to test and simulate real-life scenarios

It is practically not feasible for an enterprise to build its own test environment lane capable of testing Big Data transformations and supporting diverse, complex data sources such as social media, sensors and peripheral devices. Getting 'real-world' data brings its own challenges around data privacy and the security of PII (Personally Identifiable Information) and confidential data.

To address these TDM challenges, we should build reusable component libraries of 'real-like' data. These data repositories would hold seed lists for the data entities, across domains, that are commonly required for testing. The component libraries would cover both the structured and the unstructured data sets used in Big Data testing. Synthetic data creation for unstructured data, and tools that support this capability, will play an important role.
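As a minimal sketch of the idea, assuming Python and a few hypothetical seed lists (`SEED_LISTS`, `SOCIAL_TEMPLATES` and `generate_records` are illustrative names, not part of any TDM tool), a component library can sample 'real-like' structured fields and fill templates for unstructured text without touching production PII:

```python
import random
from dataclasses import dataclass

# Hypothetical seed lists; a real component library would hold curated,
# domain-specific values maintained by the TDM team.
SEED_LISTS = {
    "first_name": ["Asha", "Ben", "Carla", "Dev", "Elena"],
    "city": ["Pune", "Austin", "Leeds", "Lyon"],
    "device_type": ["thermostat", "smart_meter", "wearable"],
}

# Hypothetical templates for unstructured, social-media-style text.
SOCIAL_TEMPLATES = [
    "Loving my new {device} here in {city}!",
    "{name}'s {device} stopped syncing again in {city}.",
]


@dataclass(frozen=True)
class SyntheticRecord:
    owner: str        # structured: sampled from a seed list
    city: str         # structured: sampled from a seed list
    device_type: str  # structured: sampled from a seed list
    reading: float    # structured: random sensor value in [0, 100]
    post: str         # unstructured: filled-in text template


def generate_records(count: int, seed: int = 42) -> list[SyntheticRecord]:
    """Build synthetic, PII-free records by sampling from the seed lists."""
    rng = random.Random(seed)  # fixed seed keeps the data set reproducible
    records = []
    for _ in range(count):
        name = rng.choice(SEED_LISTS["first_name"])
        city = rng.choice(SEED_LISTS["city"])
        device = rng.choice(SEED_LISTS["device_type"])
        # str.format ignores keyword arguments a template does not use.
        post = rng.choice(SOCIAL_TEMPLATES).format(name=name, device=device, city=city)
        reading = round(rng.uniform(0.0, 100.0), 2)
        records.append(SyntheticRecord(name, city, device, reading, post))
    return records


if __name__ == "__main__":
    for record in generate_records(3):
        print(record)
```

Fixing the random seed means a defect found with a generated data set can be reproduced simply by regenerating the same records.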

Cloud-based Environments: Data-related risks are more serious in cloud environments than in in-house enterprise environments. Some of the critical risks are:

  • Vulnerability to, and business risk from, exposure of sensitive data
  • Leakage of intellectual property to competitors
  • Loss of data due to external threats
  • Heavy penalties and loss of reputation for non-compliance with regulatory standards, and data tampering, theft or loss that damages brand image and customer confidence

The key considerations for provisioning data in cloud-based environments are:

  • Requirements and controls - regulatory requirements, data security controls on the cloud
  • Data transformation - data migration (data in motion) and data storage (data at rest), periodic data refresh, data masking approach, synthetic data creation
  • Tooling, process and infrastructure - processes, policies and standards, tooling and licenses, infrastructure investment
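One possible data masking approach, sketched in Python under stated assumptions: sensitive fields are replaced with keyed-hash (HMAC-SHA256) tokens before data is provisioned to the cloud. Because the same input always yields the same token, joins across masked tables still line up. The key handling, field names and token format (`MASKING_KEY`, `PII_FIELDS`, `mask_record`) are hypothetical; a real approach would follow the organization's own masking policy and regulatory requirements.

```python
import hashlib
import hmac

# Assumption: the masking key lives outside the cloud environment (for example
# in an on-premises secret store); it is hard-coded here only for illustration.
MASKING_KEY = b"replace-with-a-managed-secret"

# Hypothetical PII columns and the token prefix used for each.
PII_FIELDS = {"name": "NAME", "email": "EMAIL", "ssn": "SSN"}


def mask_value(value: str, prefix: str) -> str:
    """Deterministically pseudonymize a value with a keyed hash."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"  # truncated hex keeps tokens short


def mask_record(record: dict, pii_fields: dict) -> dict:
    """Return a copy of the record with each present PII field tokenized."""
    masked = dict(record)  # leave the source record untouched
    for field, prefix in pii_fields.items():
        if masked.get(field) is not None:
            masked[field] = mask_value(str(masked[field]), prefix)
    return masked


if __name__ == "__main__":
    customer = {"id": 1, "name": "Jane Doe", "email": "jane@example.com", "plan": "gold"}
    print(mask_record(customer, PII_FIELDS))
```

Keeping the key out of the cloud matters: deterministic tokens of low-variety values such as national ID numbers could be reversed by brute force if the key leaked.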

Self-service and Enablement: As QA functions mature in their test data management journey, a paradigm shift from tactical test data management by siloed testing teams to a centralized shared service or function is inevitable. A centralized function enables self-service data provisioning, which further improves testing efficiency: QA teams become self-sufficient and provision the data they need using the service utilities. Test data banks, self-service portals, data-on-demand features, reusable data mining templates and reusable data creation algorithms play a critical role here. Extreme self-service can be implemented through various techniques, including a persona-based, configurable service catalog (pick and choose), team collaboration, and self-service, context-aware test data management.
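A persona-based, pick-and-choose service catalog could look roughly like the minimal Python sketch below. The personas, catalog entries and function names are hypothetical, and a real portal would put approvals, quotas and actual data fulfilment behind `request`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogItem:
    name: str
    description: str
    personas: frozenset  # personas allowed to see and request this item


# Hypothetical catalog; in practice the TDM team would configure these entries.
CATALOG = [
    CatalogItem("new_customer", "Fabricate a new retail customer",
                frozenset({"tester", "developer"})),
    CatalogItem("masked_subset", "Masked production subset for one product line",
                frozenset({"tester"})),
    CatalogItem("perf_bulk_load", "Bulk synthetic orders for load tests",
                frozenset({"performance_engineer"})),
]


def items_for(persona: str) -> list[CatalogItem]:
    """Show each persona only the catalog items configured for it."""
    return [item for item in CATALOG if persona in item.personas]


def request(persona: str, picks: list[str]) -> list[str]:
    """Pick and choose: validate the chosen items against the persona's view."""
    allowed = {item.name for item in items_for(persona)}
    rejected = [p for p in picks if p not in allowed]
    if rejected:
        raise PermissionError(f"{persona} cannot request: {', '.join(rejected)}")
    return [f"queued:{p}" for p in picks]  # placeholder for real fulfilment


if __name__ == "__main__":
    print([item.name for item in items_for("tester")])
    print(request("developer", ["new_customer"]))
```

Filtering by persona keeps each team's view short and relevant, which is what makes self-service practical without a TDM specialist in the loop.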

Continuous Integration and Agility: It has been observed in studies, and in the practical experience of QA teams, that Agile projects need 80% of test data to be fabricated, augmented or tweaked to test sprint stories, compared to 20% in waterfall or BAU release projects. The key challenge in agile projects is minimizing the wait time for test data. Sprint teams cannot wait for the TDM service team to deliver data within its SLA, so organizations are looking for agility in data provisioning, and we have to leverage the TDoD (Test Data on Demand) capabilities of TDM tools. The following are examples of effective tools and harnesses that make data provisioning more agile:

  • Reusable gold copies of data sets
  • In-house automation tools for data creation/augmentation
  • Self-service utilities
  • File export/import tools
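A minimal Python sketch of provisioning from a reusable gold copy, assuming a hypothetical `GOLD_COPY` baseline and a dotted-path override syntax (both illustrative, not any TDM tool's API): each sprint story gets its own deep copy, tweaked as needed, while the vetted baseline stays unchanged.

```python
import copy

# Hypothetical gold copy: a vetted, reusable baseline data set.
GOLD_COPY = {
    "customer": {"id": "C-0001", "segment": "retail", "country": "US"},
    "accounts": [{"number": "A-100", "type": "savings", "balance": 2500.00}],
}


def _step(node, key: str):
    """Descend one level; numeric keys index into lists."""
    return node[int(key)] if isinstance(node, list) else node[key]


def provision(overrides: dict) -> dict:
    """Clone the gold copy and apply sprint-specific tweaks to the clone only."""
    data = copy.deepcopy(GOLD_COPY)  # deep copy so nested lists/dicts are independent
    for path, value in overrides.items():
        *parents, leaf = path.split(".")
        target = data
        for key in parents:
            target = _step(target, key)
        if isinstance(target, list):
            target[int(leaf)] = value
        else:
            target[leaf] = value
    return data


if __name__ == "__main__":
    # e.g. a story that needs a premium customer with an empty savings account
    print(provision({"customer.segment": "premium", "accounts.0.balance": 0.0}))
```

The deep copy is the important design choice: a shallow copy would share the nested account list, so one sprint's tweak would silently corrupt the gold copy for everyone else.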

Synthetic Data: One of the key challenges in synthetic data creation, even with industry TDM tools, is configuring complex business logic and transformation rules on specific data attributes. Understanding data models, preserving data integrity and replicating complex rules can become some of the toughest problems to resolve. Here we can leverage virtual services that create test data according to business logic. TDM teams are trying to adopt a hybrid approach: TDM tools fabricate customer demographic data, while virtual services synthetically generate the customer keys and identifiers whose realistic creation requires complex business logic. Reusable component libraries of domain-specific data entities, and seed lists for common data entities, are also very helpful tools to leverage.
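To illustrate the hybrid approach, here is a minimal Python sketch of a stand-in "virtual service" that issues customer keys, while demographic fields would come from a TDM tool or seed lists. The key format (two-digit branch code, seven-digit sequence, Luhn check digit) is an assumed business rule chosen only because it is easy to verify; real identifier rules will differ, and `CustomerKeyService` and `fabricate_customer` are hypothetical names.

```python
import itertools


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn (mod 10) check digit for a string of digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the
    # rightmost payload digit (the check digit will sit to its right).
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def is_valid(key: str) -> bool:
    """Check that the last digit is the correct Luhn digit for the rest."""
    return luhn_check_digit(key[:-1]) == int(key[-1])


class CustomerKeyService:
    """Stand-in virtual service; assumed rule: branch + sequence + check digit."""

    def __init__(self, branch_code: str, start: int = 1):
        if len(branch_code) != 2 or not branch_code.isdigit():
            raise ValueError("branch_code must be two digits")
        self.branch_code = branch_code
        self._seq = itertools.count(start)  # unique, monotonically increasing

    def next_key(self) -> str:
        payload = f"{self.branch_code}{next(self._seq):07d}"
        return payload + str(luhn_check_digit(payload))


def fabricate_customer(service: CustomerKeyService, name: str, city: str) -> dict:
    """Hybrid record: demographics supplied by a TDM tool, key by the service."""
    return {"customer_key": service.next_key(), "name": name, "city": city}


if __name__ == "__main__":
    svc = CustomerKeyService("42")
    print(fabricate_customer(svc, "Asha", "Pune"))
```

Keeping the rule inside the service means TDM tool configurations never need to reproduce it, which is the point of splitting the work this way.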

Database Virtualization: Physical copies or subsets of production data consume a huge amount of space when loaded into test environments, and multiple test environments need these data sets for various types of testing, such as system testing, integration testing, performance testing, user acceptance testing and sociability testing. Instead of keeping a physical copy of the data in each test environment, database virtualization creates multiple virtual copies of databases or schemas from one physical base copy. For example, a 2 TB DB schema in the UAT environment can be used to create multiple virtual copies for the ST/SIT/PT environment lanes, which would otherwise have consumed several times that DB space.
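Database virtualization products commonly rely on copy-on-write storage: every virtual copy shares the base data and stores only its own changes. The following Python sketch is only an in-memory analogy of that idea using `collections.ChainMap` (the table contents and lane names are hypothetical), not how any particular product is implemented:

```python
from collections import ChainMap

# One physical base copy of a table, keyed by primary key
# (stands in for the shared UAT schema).
base_copy = {
    1: {"name": "Acme Corp", "tier": "gold"},
    2: {"name": "Globex", "tier": "silver"},
}


def virtual_copy(base: dict) -> ChainMap:
    """Reads fall through to the shared base; writes land in a private overlay."""
    return ChainMap({}, base)


def private_rows(vcopy: ChainMap) -> int:
    """Rows this virtual copy stores itself, i.e. its extra storage cost."""
    return len(vcopy.maps[0])


if __name__ == "__main__":
    st = virtual_copy(base_copy)   # system testing lane
    sit = virtual_copy(base_copy)  # system integration testing lane
    # Replace whole rows rather than mutating nested dicts in place: an in-place
    # edit such as st[2]["tier"] = "gold" would change the shared base row.
    st[2] = {"name": "Globex", "tier": "gold"}
    print(st[2]["tier"], sit[2]["tier"], private_rows(st), private_rows(sit))
```

Storage grows only with each lane's changes (here a single row in the ST lane) instead of one full copy per lane.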


IT development and testing are being fundamentally reshaped by the adoption of continuous integration (DevOps) and Agile methodology. It has become the need of the hour to assess our preparedness to manage test data for tomorrow's enterprises as they adopt advanced technologies. Big Data and IoT are already hitting the ground; cloud-based environments are being adopted to leverage IaaS and PaaS services; data privacy and regulatory norms are mandated, increasing the demand for synthetic and virtual data; and the Opex of managing environments and data must be optimized.

We must be prepared with the right set of strategies, tools and practices to enable tomorrow's enterprises to manage test data for projects operating in continuous integration and agile environments.


Very informative article, Sunil. It gives insight into test data needs and challenges.

Superb... gave us a brief idea about the upcoming technologies, and will be useful for those who are trying to study an upcoming technology.

Certain terms used in the article, like "TDoD" and "Test Data on Demand", are trademarks of other tool vendors like Grid Tools. Please validate and include a citation. Be respectful and give credit to the respective IP of others.

Very good article, very informative and nicely written.

Brief overview, nicely written. However, I was looking forward to a little more detail on continuous test data provisioning.
