Infosys experts share their views on how digital is significantly impacting enterprises and consumers by redefining experiences, simplifying processes and pushing collaborative innovation to new levels

Contextual Data Generation for Secure Quality Assurance

Why do you need Contextual Data Generation in your testing process?


In today's world, quality assurance is an integral part of the IT delivery process that ensures the final product is ready to be shipped to the customer. Testing in production-like test environments is an essential part of quality assurance.

While production data is the best data for testing an application, many organizations are not allowed to use it for testing because of privacy concerns and key global regulations such as GDPR and CCPA. The alternatives are to use anonymized data or synthetically generated data.

In today's post-pandemic world, the key to a successful testing exercise is contextual test data. It enables the organization to simulate production-like use cases without any PII (Personally Identifiable Information) or SI (Sensitive Information), so that no data privacy or regulatory breaches occur.


Contextual Test Data as a Pivot of Data Privacy in Application Development and Testing


Test data can be generated using one of the following methods:

  1. Manual generation of test data,
  2. Mass copy of data from the production environment to the testing environment,
  3. Mass copy of test data from legacy client systems, and
  4. Automated test data generation using tools.


Synthetic data generation falls under the last method. Here we can leverage recent technologies such as Machine Learning to train models that identify the kinds of fields present (for example names, email addresses, dates or amounts) by reading the schema details of the requirement. Once a field is categorized, a set of algorithms designed specifically for generating data of that category can produce production-like data for it. The same procedure is repeated for every field and table in the schema to generate the mock data required for testing.
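The classify-then-generate flow described above can be sketched as follows. This is a minimal illustration, not how iEDPS or any other product works internally: where the post describes trained ML models, this sketch substitutes simple regular expressions on column names, and the category names, rules, generators and the `classify_field`/`generate_rows` functions are all hypothetical.

```python
import random
import re
import string

# Hypothetical category rules. The post describes ML models that learn field
# categories from schema details; plain regexes on column names stand in here.
CATEGORY_RULES = [
    ("email", re.compile(r"e-?mail", re.IGNORECASE)),
    ("phone", re.compile(r"phone|mobile", re.IGNORECASE)),
    ("date", re.compile(r"(^|_)(date|dob)(_|$)|_at$", re.IGNORECASE)),
    ("amount", re.compile(r"amount|price|balance", re.IGNORECASE)),
    ("name", re.compile(r"(^|_)name(_|$)", re.IGNORECASE)),
]

def classify_field(column, sql_type):
    """Pick a category from the column name, falling back to the SQL type."""
    for category, pattern in CATEGORY_RULES:
        if pattern.search(column):
            return category
    if sql_type.upper().startswith(("INT", "BIGINT", "SMALLINT")):
        return "integer"
    return "text"

# One generator per category; each takes a seeded random.Random instance.
GENERATORS = {
    "name": lambda rng: rng.choice(["Alex", "Sam", "Priya", "Chen", "Maria"]) + " Test",
    "email": lambda rng: "".join(rng.choices(string.ascii_lowercase, k=8)) + "@example.com",
    "phone": lambda rng: f"555-01{rng.randint(0, 99):02d}",
    "date": lambda rng: f"20{rng.randint(10, 23)}-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
    "amount": lambda rng: round(rng.uniform(1, 10_000), 2),
    "integer": lambda rng: rng.randint(1, 1_000_000),
    "text": lambda rng: "".join(rng.choices(string.ascii_letters, k=10)),
}

def generate_rows(schema, n_rows, seed=42):
    """Classify every column once, then fill n_rows with category-specific values."""
    rng = random.Random(seed)
    categories = {col: classify_field(col, sql_type) for col, sql_type in schema.items()}
    return [{col: GENERATORS[cat](rng) for col, cat in categories.items()} for _ in range(n_rows)]

# Hypothetical schema: only column names, types and a row count are supplied.
schema = {
    "customer_name": "VARCHAR(100)",
    "email": "VARCHAR(255)",
    "phone_no": "VARCHAR(20)",
    "signup_date": "DATE",
    "balance": "DECIMAL(10,2)",
    "loyalty_points": "INT",
}
for row in generate_rows(schema, 3):
    print(row)
```

Seeding the generator makes each run reproducible, which helps when a failing test has to be replayed against exactly the same data. A learned classifier would replace `classify_field` while the per-category generators stay the same.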


We can also train the model to follow references: data already generated for a parent field is reused when generating the field that refers to it, which avoids errors related to the referential integrity of tables in the schema. When generating PII/SI fields, we can follow notation conventions that produce clearly dummy mock data, which helps comply with the regulations in place.
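A minimal sketch of reference-aware generation, assuming a hypothetical two-table schema where `customers` is the parent and `orders` carries a `customer_id` foreign key. Parent keys are generated first and every child key is drawn from them, so no order can point at a missing customer. The dummy-PII conventions are illustrative: the `TEST_CUSTOMER_` prefix is a hypothetical naming rule, while `example.com` is reserved for documentation by RFC 2606 and the 555-0100 to 555-0199 range is reserved for fictional telephone numbers in North America.

```python
import random

def generate_customers(n, rng):
    """Parent table: primary keys plus clearly dummy PII values."""
    customers = []
    for i in range(1, n + 1):
        customers.append({
            "customer_id": i,
            # Hypothetical naming convention that marks the value as test data.
            "full_name": f"TEST_CUSTOMER_{i:04d}",
            # example.com is reserved (RFC 2606), so no real mailbox is targeted.
            "email": f"test.customer{i:04d}@example.com",
            # 555-0100..555-0199 is reserved for fictional use.
            "phone": f"555-01{rng.randint(0, 99):02d}",
        })
    return customers

def generate_orders(n, customers, rng):
    """Child table: each foreign key is drawn from the already generated parent keys."""
    parent_ids = [c["customer_id"] for c in customers]
    return [
        {"order_id": j, "customer_id": rng.choice(parent_ids), "amount": round(rng.uniform(5, 500), 2)}
        for j in range(1, n + 1)
    ]

def check_referential_integrity(customers, orders):
    """True when every order references an existing customer."""
    parent_ids = {c["customer_id"] for c in customers}
    return all(o["customer_id"] in parent_ids for o in orders)

rng = random.Random(7)
customers = generate_customers(10, rng)
orders = generate_orders(50, customers, rng)
print(check_referential_integrity(customers, orders))  # True
```

Generating tables in parent-to-child order is what makes the integrity check pass by construction; for deeper schemas the same idea applies in the topological order of the foreign-key graph.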


How do we use contextual data generation for our testing activities?


There is a gamut of products available in the market for data generation: Mockaroo (which supports generation in SQL, delimited files, JSON and Excel), SQL Data Generator by Redgate (for Microsoft SQL Server), Test Database Generator by IBM (for DB2) and Generate Data (MySQL 4 or higher). One of the key products for contextual data generation is the Infosys Enterprise Data Privacy Suite, or iEDPS. iEDPS is an intelligent data generation product that caters to a wide range of requirements, making life easier for users who need to generate test data. At a minimum, the only inputs it needs are the schema details of the requirement and the number of records required. It contains more than 35 algorithms designed specifically for data generation.


iEDPS is an easy-to-use, high-performance, scalable and cost-effective data privacy and protection solution that automates data protection and privacy across an enterprise. Along with the data generation tool, it offers deterministic, selective, dynamic and static masking tools. The best part about iEDPS is that it can be deployed on any platform, both on-premises and in the cloud, for organization-wide use across the enterprise, and it supports all major databases and file systems.

Before choosing a data generation tool, we should consider factors such as the data generation methods provided and the support for different data types, databases and operating systems, among many others. iEDPS checks most, if not all, of these boxes. More details about iEDPS and its product suite are available at the iEDPS microsite.
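To make one of the masking techniques mentioned above concrete: deterministic masking replaces a value with the same substitute every time it appears, so masked tables still join correctly. The sketch below is a generic illustration, not iEDPS's implementation. It uses a keyed hash (HMAC-SHA256 from Python's standard library) so that someone without the key cannot rebuild the mapping by hashing guessed values; `MASKING_KEY`, the `CUST` prefix and `mask_deterministic` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets store, never source code.
MASKING_KEY = b"replace-with-a-secret-key"

def mask_deterministic(value: str, prefix: str = "CUST") -> str:
    """Map the same input to the same token every time, without a lookup table."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncating to 12 hex characters keeps tokens short at some cost in collision resistance.
    return f"{prefix}_{digest[:12]}"

# The same customer reference masked in two different tables yields the same token,
# so joins between the masked tables still line up.
crm_row = {"customer_ref": "C-1029", "city": "Pune"}
billing_row = {"customer_ref": "C-1029", "amount": 250.0}
masked_crm = mask_deterministic(crm_row["customer_ref"])
masked_billing = mask_deterministic(billing_row["customer_ref"])
print(masked_crm == masked_billing)  # True
```

Rotating the key changes every token, which is useful between test cycles but breaks consistency with data masked under the old key.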

Author: Pranay Sharma R
