Testing Services provides a platform for QA professionals to discuss and gain insights into the business value delivered by testing, the best practices and processes that drive it, and the emergence of new technologies that will shape the future of this profession.


August 21, 2015

Are We Prepared to Manage Tomorrow's Test Data Challenges?

Author: Sunil Dattatray Shidore, Senior Project Manager

As tomorrow's enterprises embrace the latest technology trends, including SMAC (Social, Mobile, Analytics, Cloud), and adopt continuous integration and agile delivery, it is imperative to think of more advanced, scalable, and innovative ways to manage test data in non-production environments, whether for development, testing, training, proofs of concept, or pre-production. The real question is: have we envisioned the upcoming challenges and complexity in managing test data, and are we prepared and empowered with the right strategies, methodologies, tools, processes, and skilled people in this area?

Following are some of the critical areas we need to focus on:

  • Test data for big data and IoT testing
  • Data management for cloud-based NPEs (Non-Production Environments)
  • Data on demand and extreme self-service
  • Agility in data provisioning
  • Synthetic data creation through virtual services
  • Data virtualization

Big Data and IoT: The key challenges in big data and IoT (Internet of Things) testing are:

  • Complexity of building a scalable test environment
  • Realistic, meaningful sample data subsets to test and simulate real-life scenarios

It is practically infeasible for any enterprise to build its own test environment lane capable of testing big data transformations and supporting diverse, complex data sources such as social media feeds, sensors, and peripheral devices. Getting 'real world' data has its own challenges in terms of data privacy and the security of PII (Personally Identifiable Information) and confidential data.

To address these TDM challenges, one should build reusable component libraries of 'real-like' data. These repositories would hold seed lists for the data entities, across domains, that are commonly required for testing. The component libraries would cover both structured and unstructured data sets for use in big data testing. Synthetic data creation for unstructured data, and tools supporting this capability, will play an important role.
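As a sketch of the idea, the fragment below fabricates 'real-like' customer records from small seed lists. The entity names, fields, and seed values are illustrative; a production component library would hold far richer seed lists per domain.

```python
import random

# Hypothetical seed lists: a reusable component library would hold many
# such entries per domain (names, cities, product codes, ...).
FIRST_NAMES = ["Asha", "Rahul", "Meera", "John", "Li"]
CITIES = ["Pune", "Mumbai", "London", "Singapore"]

def make_customer(rng):
    """Fabricate one 'real-like' customer record from the seed lists."""
    return {
        "id": rng.randrange(10_000, 99_999),
        "name": rng.choice(FIRST_NAMES),
        "city": rng.choice(CITIES),
    }

def make_customers(n, seed=42):
    """Generate n synthetic customers; a fixed seed keeps runs repeatable,
    which matters when the same data set must be reproduced across lanes."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        rec = make_customer(rng)
        # Derive the email from other fields so the record stays internally
        # consistent, one hallmark of 'real-like' data.
        rec["email"] = f"{rec['name'].lower()}.{rec['id']}@example.com"
        records.append(rec)
    return records
```

Because generation is seeded, the same seed list version always yields the same data set, making defects reproducible across test runs.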

Cloud-based Environments: Data-related risks are more serious in cloud environments than in on-premise enterprise environments. Some of the critical risks are:

  • Vulnerability and business risk of exposure of sensitive data
  • Leak of intellectual property to competitors
  • Loss of data due to external threats
  • Heavy penalties and loss of reputation for non-compliance with regulatory standards; data tampering, theft, or loss damages brand image and customer confidence

Following are the key considerations for provisioning data in cloud-based environments:

  • Requirements and Controls - Regulatory Requirements, Data security controls on Cloud
  • Data Transformation - Data Migration (data in motion) and data storage (data at rest), Data Refresh (Periodic), Data Masking approach, Synthetic data creation
  • Tooling, Process, Infra - Process, Policies and standards, Tooling and Licenses, Infrastructure investment
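To illustrate the data masking consideration above, here is a minimal sketch of deterministic pseudonymization applied before data leaves for a cloud NPE. The salt value, field names, and masked domain are assumptions for illustration only.

```python
import hashlib

def mask_email(email, salt="nonprod"):
    """Deterministically pseudonymize an email address.
    The same input always maps to the same masked value, so referential
    integrity across tables is preserved, while the real PII never
    reaches the non-production environment."""
    digest = hashlib.sha256((salt + email.lower()).encode()).hexdigest()[:12]
    return f"user_{digest}@masked.example"

def mask_records(rows, fields=("email",)):
    """Return copies of rows with the named PII fields masked;
    the originals are left untouched."""
    masked = []
    for row in rows:
        row = dict(row)  # shallow copy so the source rows stay intact
        for f in fields:
            if f in row:
                row[f] = mask_email(row[f])
        masked.append(row)
    return masked
```

Determinism is the key design choice here: two tables masked independently still join on the masked value, which ad-hoc scrambling would break.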

Self-service and Enablement: As QA functions mature in their test data management journey, the paradigm shift from tactical test data management by siloed testing teams to a centralized shared service is inevitable. A shared service enables self-service data provisioning and further improves testing efficiency: QA teams become self-sufficient and provision the data they need using service utilities. Test data banks, self-service portals, data-on-demand features, reusable data mining templates, and reusable data creation algorithms play a critical role here. Extreme self-service can be implemented through various techniques, including a persona-based configurable service catalog (pick and choose), team collaboration, and context-aware test data management.

Continuous Integration and Agility: Studies and the practical experience of QA teams suggest that agile projects need roughly 80% of their test data to be fabricated, augmented, or tweaked to test sprint stories, compared to around 20% in waterfall or BAU release projects. The key challenge in agile projects is minimizing the wait time for test data: sprint teams cannot wait for a TDM service team to deliver data per its SLA. Organizations are therefore looking for agility in data provisioning, and we have to leverage the TDoD (Test Data on Demand) capabilities of TDM tools. Examples of effective tools and harnesses that enhance the agility of data provisioning include reusable gold copies of data sets, in-house automation tools for data creation and augmentation, self-service utilities, and file export/import tools.
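A rough sketch of the gold-copy technique mentioned above: a sprint team clones a pre-validated data set and tweaks only the attributes its story needs, instead of waiting on the TDM service SLA. The record layout and the dotted override syntax are invented for illustration.

```python
import copy

# Hypothetical gold copy: a reusable, pre-validated data set maintained
# by the TDM service.
GOLD_COPY = {
    "account": {"id": 1001, "status": "ACTIVE", "balance": 2500.0},
    "card": {"card_id": "C-1", "type": "DEBIT"},
}

def provision(overrides=None):
    """Self-service provisioning: deep-copy the gold copy, then apply
    sprint-specific tweaks, e.g. a negative balance for an overdraft
    story. Overrides use a 'section.field' path for brevity."""
    data = copy.deepcopy(GOLD_COPY)
    for path, value in (overrides or {}).items():
        section, field = path.split(".")
        data[section][field] = value
    return data
```

The deep copy is what makes the gold copy reusable: every team gets an isolated clone, so no sprint's tweaks corrupt the shared baseline.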

Synthetic Data: One of the key challenges in synthetic data creation, even with industry TDM tools, is configuring complex business logic and transformation rules on specific data attributes. Understanding data models, preserving data integrity, and replicating complex rules can be among the toughest problems to resolve. Here we can leverage virtual services that create test data according to the business logic. TDM teams are adopting a hybrid approach: TDM tools for fabricating customer demographic data, and virtual services for synthetic generation of customer keys and identifiers that require complex business logic to resemble real data. Reusable component libraries of domain-specific data entities, and seed lists for common data entities, are also very helpful tools to leverage.

Database Virtualization: Physical copies or subsets of production data consume huge amounts of space when loaded into test environments, and multiple environments need these data sets to cater to various types of testing: system testing, integration testing, performance testing, user acceptance testing, sociability testing, and so on. Instead of keeping a physical copy of the data in each test environment, database virtualization enables the creation of multiple virtual copies of databases or schemas from one base physical copy. For example, a 2 TB database schema in the UAT environment can be used to create multiple virtual copies for the ST, SIT, and PT environment lanes, which would otherwise have consumed many times that amount of database space.
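The space saving comes from copy-on-write: reads fall through to a shared base image, and only changes are stored per copy. The toy sketch below mimics that mechanism at the level of a Python dictionary; real database virtualization tools apply the same idea at the storage-block level.

```python
class VirtualCopy:
    """Toy copy-on-write view over a shared base data set.
    Reads fall through to the shared base; writes land in a small
    private overlay. This is why N virtual copies cost far less space
    than N physical clones."""

    def __init__(self, base):
        self._base = base      # shared, never modified
        self._overlay = {}     # this copy's changes only

    def get(self, key):
        # Overlay wins if this copy has written the key; otherwise
        # fall through to the shared base image.
        return self._overlay.get(key, self._base.get(key))

    def set(self, key, value):
        self._overlay[key] = value

    def overlay_size(self):
        """Storage actually consumed by this virtual copy."""
        return len(self._overlay)
```

Several lanes (ST, SIT, PT) can each wrap the same base: each sees its own writes, and each pays only for its own deltas.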


IT development and testing are fundamentally being reshaped by the adoption of continuous integration (DevOps) and agile methodology. It has become the need of the hour to assess our preparedness for managing test data in tomorrow's enterprises, which are adopting advanced technologies: big data and IoT are already hitting the ground, cloud-based environments are being adopted to leverage IaaS and PaaS services, data privacy and regulatory norms are mandated, demand for synthetic and virtual data is rising, and there is pressure to optimize the opex of managing environments and data.

We must be prepared with the right set of strategies, tools, and practices to enable tomorrow's enterprises to manage test data for projects operating in continuous integration and agile environments.

August 17, 2015

Extreme Automation - The Need for Today and Tomorrow

Author: Vasudeva Muralidhar Naidu, Senior Delivery Manager

We have all read about the success of the New Horizons spacecraft and its incredible journey to Pluto. That is extreme engineering, pushing human limits to the edge. Similarly, when we hear that the automobile industry assembles an additional car every six minutes, we are amazed at the level of automation that has been achieved.

In recent times, we have realized the need for similar automation in software testing as well. Software engineering is pushing its limits in every walk of life, resulting in more interfacing applications and devices, the ability to store and retrieve huge volumes of data, the ability to provision hardware to host software 100x faster, and much more. This fast-paced advancement challenges software engineers to build an ecosystem that enables rapid prototyping and design, agile development and testing, and 100% automated deployment. For the testing community, this means maximizing automation not only in regression testing, functional test execution, virtualization, or test data, but across all life cycle stages of software engineering, and, more importantly, in an integrated fashion.

The challenges that have prevented tightly integrated life cycle automation, also termed "Extreme Automation," from becoming a reality are software technology incompatibilities, the lack of adapters for tools to communicate with each other, and the inability to simulate the software under test in a cost-effective way.

When approaching test automation in the past, the testing community was always challenged on "return on investment," and the biggest barrier was funding: automation tool sets were expensive and skills were not easily available. In today's drive towards "Extreme Automation," test automation is no longer optional. Adopting new-age technologies and using them for real-life application deployment is possible only with Extreme Automation. The testing community must therefore move beyond ROI debates, because the business case for automation has become imperative.

Now the challenge that lies ahead of the testing community is to answer two questions: how do we achieve Extreme Automation, and whom do we collaborate with? I have a few recommendations to share.

Test Phase: Functional Automation (our traditional sweet spot: both regression and system test)

Extreme Automation needs:

  • A single automation framework that works across disparate technologies.
  • The ability to run automated scripts with minimal lead time after code drops.
  • The ability to automate web-based applications, middleware applications, and backend tests.

Open source tools like Gherkin, Cucumber, and Selenium have made this possible. There are also several commercial tools that allow you to achieve seamless automation, integrating with tools like Jenkins and offering script-less automation that reduces both the skills needed and the extensive maintenance effort.
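The "single framework across disparate technologies" need can be pictured as a thin dispatcher over pluggable adapters. Real suites would plug in Selenium, API, and database adapters; the adapters below are stubs invented purely for illustration.

```python
# Minimal sketch of one framework driving disparate technologies
# through pluggable adapters. The adapter classes here are stand-ins
# for real UI/API/DB drivers.
class UIAdapter:
    def run(self, step):
        # A real adapter would drive a browser here.
        return f"UI: {step}"

class APIAdapter:
    def run(self, step):
        # A real adapter would call a middleware endpoint here.
        return f"API: {step}"

ADAPTERS = {"ui": UIAdapter(), "api": APIAdapter()}

def run_scenario(steps):
    """Execute ('channel', 'action') steps via the matching adapter,
    so one scenario can span web, middleware, and backend layers."""
    return [ADAPTERS[channel].run(action) for channel, action in steps]
```

The point of the pattern is that adding a new technology means adding one adapter, not a new framework.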

Test Phase: Static Testing / Effective Automated Unit Tests

Extreme Automation needs:

  • Checklist- and tool-driven technical requirement and design reviews.
  • Identification of requirement and design defects before coding.
  • Maximized automation of unit tests.

Use standard static testing checklists and open source tools like JMeter, with build harnesses customized to your application's unit test needs.

Test Phase: Test Data

Extreme Automation needs:

  • Test data subsetting and provisioning infrastructure that is readily available.
  • Established synthetic data generation rules.

Implement tools that make this automation possible and eliminate the wait time for test data.
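As a minimal sketch of subsetting with referential integrity intact (the schema is illustrative), the helper below keeps a chosen set of parent rows and only the child rows that reference them, so no orphan records reach the test environment.

```python
def subset(customers, orders, keep_ids):
    """Take a referentially consistent subset of two related tables:
    keep the chosen customers, and keep only the orders that point at
    a kept customer."""
    kept_customers = [c for c in customers if c["id"] in keep_ids]
    kept_orders = [o for o in orders if o["customer_id"] in keep_ids]
    return kept_customers, kept_orders
```

Real subsetting tools walk the whole foreign-key graph, but the invariant is the same: every child row in the subset must resolve to a parent in the subset.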


Test Phase: Virtualization

Extreme Automation needs:

  • Virtualize all possible applications, databases, and hardware to make test automation work seamlessly.

ROI on virtualization is no longer a challenge. Several initial hurdles do remain, and they need senior management sponsorship to make virtualization a reality.

Test Phase: Build and Deployment Automation

Extreme Automation needs:

  • Adoption of DevOps.
  • Automation of the release process.
  • Continuous integration for deployment automation.



While the recommendations in the table above look straightforward and familiar, it is the implementation that will make all the difference and help achieve extreme automation. The first objective is to understand the needs, believe in their feasibility, and push towards implementation. After implementation, the key is to execute tests in an automated fashion across all integration layers and technologies; building the infrastructure that enables this is even more critical. Future technology adoption is possible only with extreme automation, and it is important to get ready for it today.

August 3, 2015

Validation cannot be wished away till Magic takes over Big Data Analytics

Author: Naju D. Mohan, Delivery Manager, Data Services, Independent Validation Solutions

It has dawned on us that big data is indeed a big deal. Lately, its growth has accelerated due to the digitization of almost everything around us, whether an organization, an industry, or an economy. It seems evident that the future will be governed by big data. But don't be too eager to jump onto the big data bandwagon and start collecting data for the sake of it; that is sure to end in a huge digital graveyard.

All of us are emitting digital data every second, from birth to death, and this data can be stored, analyzed and processed for value extraction. It has a big impact across all industries and influences each one of us in more ways than we can ever imagine. Big data helps to drive productivity across acres of farmland used in agriculture, uses patient data diligently in healthcare to improve quality of life, improves personalized experience through customer analytics, leads to smarter law enforcement and extends support to nobler causes like prevention of human trafficking.

Every organization must strive to derive value from data to run their businesses effectively. This requires a clear understanding of an organization's objectives, a clear goal and a well thought-out plan about utilizing this data to achieve its objectives. Unlike wine, data becomes stale with age. This makes it imperative for organizations to have tools, processes and appropriately skilled people to turn this data into actionable insights before it turns obsolete. Even though it is common knowledge that machine learning algorithms and artificial intelligence programs are automating more and more steps in big data analytics, humans are required at every stage to take decisions about how the data has to be stored, what has to be transformed, and how it has to be visualized.

To err is human, so please don't forget to validate 

In my opinion, validation adds the famous fourth V, 'veracity,' to big data. Big data would have remained the troublesome three Vs of volume, velocity, and variety, had there been no holistic validation strategy, automated validation approach, and end-to-end test data governance.


Figure 1: Importance of Validation

Big data validation isn't a crystal stair

Big data validation isn't a smooth and easy phase in the big data life cycle; rather, it comes with its own unique challenges. The big data technology landscape is complex and continuously evolving, and it has to support various types of data: online transaction processing data, packaged application data, analytical data, sensor data, and others. Testers need to stay on top of technology changes and find ways to access data and convert it into a format that can be verified and reported. Replicating a QA environment for a big data implementation, which supports diverse data integration across systems, is a challenge that needs to be overcome by adopting newer technologies like virtualization and cloud. No single testing tool on the market covers the entire landscape of big data implementations, which increases the burden on the tester to ensure proper test coverage, reduce test cycles through automation, and prevent defect leakage.

You needn't be afraid of big data chaos, if testers have mastered the right skills

Bad analysis and poor decisions caused by data quality issues have serious consequences for the business. Testers need to become adept at scripting, writing Pig Latin, shell scripts, and Java utilities to tame big data and extract valuable insights. They need analytical skills to validate the patterns used in data science use cases and the data manipulations in the big data system. Selecting the right test scenarios, and knowing when to stop testing, requires deep domain understanding and a clear appreciation of the client's business objectives. In short, testers have to believe in themselves and acquire these new skills to deliver the quality big data that drives the modern world.


Most organizations accept that data quality is a major barrier to going ahead with their big data programs. They lack confidence that the data mined from their huge stores of raw data will carry the right meaning for their business growth. Inaccurate data is certain to hurt the business, so now is the right time to equip our testing organizations to tide over the risks associated with big data implementations. Since technology alone will not help us realize our investments in big data, we need people with specialized skills who can validate the information derived from the ocean of raw data and act as our companions on the journey towards richer, real-time, and respectful insights.

"In God we trust. All others must bring data." - W. Edwards Deming

Three Stages of Functional Testing the 3Vs of Big Data

Author: Surya Prakash G, Group Project Manager

By now, everyone has heard of big data. These two words are widely heard in every IT organization and across industry verticals. What is needed, however, is a clear understanding of what big data means and how it can be applied in day-to-day business. Big data refers to huge amounts of data: petabytes of data, mountains of data. With ongoing technology changes, data forms an important input for making meaningful decisions.

When data is present in such large volumes, it poses a number of challenges in testing. It includes less structured formats (website links, emails, Twitter responses, pictures and images, written text on various platforms), which make analysis more difficult.

The three Vs of big data need to be kept in mind when you are validating any big data application:

  1. Volume: A huge amount of data flows through systems and must be tested and validated for quality.
  2. Velocity: The speed at which new data is generated; generally, the faster data can be analyzed, the bigger the profit for the organization.
  3. Variety: Big data comprises large data sets that may be structured, semi-structured, or unstructured.

Three Stages of Functional Validation of the 3Vs

Many organizations are finding it difficult to define a robust testing strategy and set up an optimal test environment for big data. Big data involves processing a huge volume of structured and unstructured data across different nodes, using frameworks like MapReduce and languages like Hive and Pig. Traditional testing approaches on Hadoop are based on sample data record sets, which is fine for unit testing. The challenge, however, is determining how to validate an entire data set consisting of millions, or even billions, of records.

Three stages of functional testing of big data: 


Figure 1: Three stages of big data testing

To successfully test a big data analytics application, the test strategy should include the following testing considerations.

1. Data Extraction Testing

Data from various external sources, such as social media and web logs (unstructured) and sourcing systems such as an RDBMS (structured), should be validated to ensure that the proper data is pulled into the big data store (e.g., a Hadoop system). The data should be compared from the source (unstructured) to the big data store (structured). This can be achieved with the two test approaches described below:

  • Comparing source data with the data landed in the big data store to ensure they match.
  • Validating the business rules that transform the data (MapReduce validation): this is similar to data warehouse testing, in which a tester verifies that business rules are applied to the data. In this case, however, the test approach differs slightly because the big data store must also be tested for volume, variety, and velocity.
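A hedged sketch of the first approach above: compare a source extract with the landed data using record counts plus order-insensitive per-record checksums, so missing and unexpected records can be reported separately. Field names are illustrative.

```python
import hashlib

def row_checksum(row):
    """Order-insensitive checksum of one record's fields."""
    canon = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.md5(canon.encode()).hexdigest()

def reconcile(source_rows, landed_rows):
    """Compare a source extract with the data landed in the big data
    store: counts first, then per-record checksums, reporting what is
    missing or unexpected on either side."""
    src = {row_checksum(r) for r in source_rows}
    tgt = {row_checksum(r) for r in landed_rows}
    return {
        "count_match": len(source_rows) == len(landed_rows),
        "missing_in_target": len(src - tgt),
        "extra_in_target": len(tgt - src),
    }
```

At real volumes the same comparison would run inside the cluster (e.g. as a MapReduce or Hive job) rather than in memory, but the reconciliation logic is the same.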

2. Data Quality Analysis

Data quality analysis is the second test step, following data extraction testing. It is performed in the big data store once the data has been moved from the source systems. The data is measured for:

  • Referential integrity checks
  • Constraints check
  • Metadata analysis
  • Statistical analysis
  • Data duplication check
  • Data correctness / consistency check

As part of the test approach to verify data quality, sample tables and small amounts of data are copied to temporary tables, and validations are performed on this minimal set. The following tests are applied to the sample data:

  • Deletion of parent record to check if child records are getting deleted, to verify referential integrity checks
  • Validation of all foreign and primary key constraints of the tables
  • Metadata analysis check to find the metadata variables by checking all the connections between metadata and actual records
  • Data duplication check by inserting similar records in the table, when there are unique key constraints
  • Data correctness or data integrity checks by inserting alphabetic characters into a field that accepts only numbers
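The duplicate and referential integrity checks above can be sketched on a sample like this; the table shapes and key names are assumptions for illustration.

```python
def quality_report(parents, children, unique_key="id"):
    """Run two of the basic data quality checks on a sample:
    duplicates on the unique key of the parent table, and orphan child
    rows that break referential integrity."""
    keys = [p[unique_key] for p in parents]
    duplicates = len(keys) - len(set(keys))       # duplicate key count
    parent_ids = set(keys)
    orphans = sum(1 for c in children if c["parent_id"] not in parent_ids)
    return {"duplicates": duplicates, "orphans": orphans}
```

A passing sample (both counts zero) does not prove the full data set is clean, which is why these checks are complemented by full-volume reconciliation in the extraction stage.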

3. Reports and Visualization Testing

Reports and visualization testing forms the end-user part of testing, where output is validated against actual business requirements and design. Reports are the basis for many decisions, and are also critical components of the organization's control framework.

The reports, dashboards and mobile outputs are validated using two approaches:

I.  Visualization Approach

In this approach, the output is visually compared or validated against a predefined format or templates, designed by users or data architects.

II.  Attributes validation

In this approach, the attributes and metrics, which are part of the reports, are validated and checked for correctness. 
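A minimal sketch of attribute validation: recompute a report metric (here a simple sum) directly from the source rows and compare it with the value shown on the report. The field name and tolerance parameter are illustrative.

```python
def validate_report_metric(report_value, source_rows, field, tolerance=0.0):
    """Recompute a report metric straight from the source rows and
    check it against the figure displayed on the report or dashboard."""
    expected = sum(r[field] for r in source_rows)
    return abs(expected - report_value) <= tolerance
```

The same pattern extends to averages, counts, and grouped metrics: each report attribute gets an independent recomputation against the data behind it.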


Testing big data is a challenge, and a clear test strategy is needed to validate the 3Vs of big data. The test stages above can be used as a starting point to understand the different validation stages and ensure data is tested as early as possible in the data workflow.

As more and more organizations move into big data implementations, testers need to start thinking about strategies to test these complex implementations.

Balancing the Risk and Cost of Testing

Author: Gaurav Singla, Technical Test Lead

A lot of things about banking software hinge on how and when it might fail and what impact that will create.

This drives banks to invest heavily in testing projects. Traditionally, banks have tested software modules end-to-end and in their totality, which calls for large resources. Even then, testing programs are not foolproof: they often detect minor issues while overlooking critical ones that can dent the bank's image among its customers.

So, deciding what to test, and how far to test it, is tricky. However, experience shows that a risk-based testing approach is well suited to banks' testing needs, as well as their pockets.

While the technicalities of banking software testing, including automation testing, performance testing, load testing etc. appear complex, a relatively simple risk-based testing approach may be employed to identify and prioritize the areas of testing focus.

Risk-based testing follows a simple principle: It determines the impact of software failure on the bank's business, and recommends that project managers carefully attend to those modules where there is high business risk.

The next step is to identify the bank's most critical business areas. It seems logical to pinpoint the business areas earning the highest revenue, but that might not actually be the case. Consider these situations:

  • A module or sub-module furnishing wrong information to the central banking agencies of the geographies concerned
  • Damage to the bank's image when retail customers receive wrong information over Internet banking (or any other channel)
  • Long queuing times at ATMs caused by poorly performing software
  • High downtime of banking software across branches

In such situations, risk-based testing gives the best results by according the highest priority to projects involving regulatory reporting and compliance. Similarly, high priority goes to those modules and sub-modules of the bank's software that affect customer communication across unassisted channels like Internet and mobile banking.

On the other hand, lower priority may be given to software modules which fetch considerable revenues, but are low on risk. 
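One common way to operationalize this prioritization is a simple score of business impact times failure likelihood, each rated on a 1-to-5 scale. The module names and ratings below are invented for illustration.

```python
def risk_score(impact, likelihood):
    """Classic risk-based-testing score: business impact of failure
    times likelihood of failure, each rated 1 to 5."""
    return impact * likelihood

def prioritize(modules):
    """Order modules so the riskiest are tested first, regardless of
    how much revenue they bring in."""
    return sorted(
        modules,
        key=lambda m: risk_score(m["impact"], m["likelihood"]),
        reverse=True,
    )
```

Note how a high-revenue but low-risk module naturally falls to the bottom of the list, matching the principle described above.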

Testing is generally executed by a dedicated testing team; however, risk-based testing calls for brainstorming and discussions with business / marketing to decide what is to be tested and on what priority.

To sum up, risk-based testing is a tool which, when used effectively and with the involvement of the concerned people in marketing, auditing, reporting, and so on, is certain to bring down a bank's testing costs.