

Role of Validation in Data Virtualization

Author: Kuriakose KK, Senior Project Manager

How can I see the big picture and still make an insightful decision, with attention to detail, right now?

Jack, the CEO of a retail organization with stores across the world, is meeting his leadership team to discuss the disturbing results of the Black Friday sale. When he asks why they were unable to meet their targets, his leaders promptly list the reasons: missed sales, delayed shipping, shipping errors, overproduction, sales teams not selling where market demand exists, higher inventory, and so on. Jack is disturbed by these answers, and on further probing understands that most of these are judgment errors.

A judgment error is not something he can go back and explain to his shareholders. Returning the retail brand to growth has been his top priority. Jack has the best team in the market and his product line is superb; so what is going wrong? He gets into further deep-dive sessions with his leaders and understands that everyone has a different view of things. Even though his organization has consolidated customer information in its data warehouse, it holds only key attributes for customers visiting its stores. Information regarding online customers, however, is spread across different systems, segregated by brand, campaign, and so on. The way information is stored and leveraged differs between stores and online sales. There is no cross-selling happening today, which in itself could increase the company's sales by at least 8-10%. Similar issues exist for core functions like products, sales, and inventory. Furthermore, some lines of business have stale or outdated data.

Jack sees the need for a common view of all enterprise information, across all functions throughout the organization. He is also aware that they have been consistently investing in BI projects to integrate information from multiple applications across different functions, which is a very time-consuming process, and that the existing BI system incurs huge monthly maintenance expenses. He asks Tim to look into the matter and explain why their BI reports are unable to provide a common view of the business, when that is exactly what they were designed for.

Tim, the chief architect, is well-known in the organization for solving complex problems with simple and economical solutions. After a week, Tim comes up with the following reasons:

  • No integration with social media data and insights
  • Even though customer feedback and surveys are recorded in the system, they are not integrated in a way that the business can leverage when needed
  • No centralized view to pull all relevant data corresponding to a core data entity on demand
  • No way of changing the source data on demand, without staging and post-processing
  • No just-in-time data availability for the business's real-time or near-real-time needs

Tim summarizes by saying that the current challenges are with data complexity, disparate data structures, multiple locations, latency, and completeness.

How to get a consolidated view?

Jack requires a solution that can seamlessly abstract data out of the complex data architecture and expose a common data model layer that can then adapt to his needs. He also needs:

  • A business representation of data in this data model, enabling the business to become partially independent
  • The ability to carry out certain data integrations independently
  • Quick availability of data, when needed, and in the required format

Jack's needs can be addressed with the help of data virtualization, which employs a layered architecture, using a combination of physical and virtual data stores, depending on parameters like performance, storage availability, etc. Most leading data virtualization solution providers in the industry, such as Denodo, Cisco, SAS, IBM, and Informatica, use data integration techniques to ensure consistent data access, supporting complex disparate data sources and structures across various locations.
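
To make the idea concrete, here is a minimal sketch in Python of what the virtual layer does conceptually: it exposes one common model over two hypothetical sources (an in-store sales table in a relational database and an online-sales CSV extract) without copying either into a warehouse. The source names and fields are assumptions made up for this illustration, not part of any vendor's product.

    import csv
    import sqlite3
    from typing import Dict, Iterator

    # Hypothetical source 1: in-store sales held in a relational database.
    def store_sales(conn: sqlite3.Connection) -> Iterator[Dict]:
        for customer_id, amount in conn.execute(
            "SELECT customer_id, amount FROM store_sales"
        ):
            yield {"customer_id": customer_id, "channel": "store", "amount": float(amount)}

    # Hypothetical source 2: online sales exported as a CSV file.
    def online_sales(path: str) -> Iterator[Dict]:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                yield {"customer_id": row["customer_id"], "channel": "online",
                       "amount": float(row["amount"])}

    # The "virtual" unified view: one common model, resolved on demand,
    # with no physical consolidation of the two sources.
    def unified_sales(conn: sqlite3.Connection, csv_path: str) -> Iterator[Dict]:
        yield from store_sales(conn)
        yield from online_sales(csv_path)

A commercial data virtualization platform does this declaratively, with optimization, caching, and security built in, but the principle is the same: consumers query one model, and the layer works out where the data actually lives.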

In today's data-centric world, where having the right data at the right time is key to successful decision-making, data virtualization addresses four key challenges:

Speed: Traditional methods of receiving data in a specific format have long cycle times: raising a CR with the IT team, followed by requirement gathering, impact analysis, integration, unit testing, system testing, production deployment, and so on. Data virtualization, however, can integrate data from disparate data sources and formats into a single data layer, thus providing a unified view with limited or no data replication.

Quality: Data virtualization provides users access to high-quality data through functions like data standardization, cleansing, transformation, enrichment, and validation.

Control: With data virtualization, data doesn't have to be replicated across instances and can instead be maintained in a single repository, which helps users maintain better control over it.

Cost: Data virtualization also helps organizations move away from the practice of maintaining multiple copies of the same data, thus enabling businesses to become more independent, reduce cycle time for report generation, and thereby bring down costs.

Conquering the challenges of data virtualization testing

Tim knows very well that no one will accept his solution without validation from the QA team. He reaches out to Mike, who heads the QA transformation and consulting team. To Tim's surprise, Mike already has a solution in place and informs him that it is very much an extension of a strategy he is already using in the BI world.

According to Mike, data virtualization is great for business users, as it hides the complexity involved in generating complex reports. However, validating it is complex and can be expensive. Thus, he suggests that the following parameters be taken into consideration, listing the requirement and his recommendations for each:

Test strategy

Requirement: A detailed test strategy, based on requirements, covering:

  • Data migration testing
  • Integration testing
  • Web services
  • Data virtualization testing
  • Report testing
  • Security testing

Recommendations:

Test-driven development: This is very useful for the development of complex reports with multiple integration checkpoints, as the user can check every component and integration independently against the expected business outcome, both final and broken down.

Runtime monitoring of data: This can provide valuable insights to testers, helping them tune their test cases to match the real needs of the system.

Reliability testing: Based on the fault-model policies defined for individual and composite data services, set up tests to validate scenarios like invalid or unexpected data, and timing conditions like deadlock and concurrency.

Regression testing: Efficient regression test cases and testing can help reduce the cost of retesting.

Risk-based testing approach: This helps decide which critical components should be validated extensively, with 100% record validation, reducing the probability of failure and ensuring that business-critical reports are accurate. Higher weightage is given to components that are business-critical, heavily used, or have a high failure rate (a minimal scoring sketch follows).
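
As an illustration of the weightage idea, here is a minimal sketch in Python; the component names, scores, weights, and threshold are assumptions made up for the example, not part of Mike's strategy.

    # Illustrative risk scoring: weightage for business criticality,
    # usage, and historical failure rate (each scored 1-5 here).
    WEIGHTS = {"criticality": 0.5, "usage": 0.3, "failure_rate": 0.2}

    components = {
        "sales_summary_report": {"criticality": 5, "usage": 5, "failure_rate": 3},
        "inventory_feed":       {"criticality": 4, "usage": 3, "failure_rate": 4},
        "campaign_dashboard":   {"criticality": 2, "usage": 2, "failure_rate": 1},
    }

    def risk_score(scores: dict) -> float:
        return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

    # Components above the threshold get 100% record validation;
    # the rest get sampled regression checks.
    for name, scores in sorted(components.items(), key=lambda kv: -risk_score(kv[1])):
        level = "100% record validation" if risk_score(scores) >= 3.5 else "sampled checks"
        print(f"{name}: risk={risk_score(scores):.1f} -> {level}")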

Test planning

Requirement: A detailed requirement-capturing and validation procedure as part of test planning

Recommendations: Develop a tracker-based validation system to check for:

  • Migration of data
  • Integration of data sources
  • Report validations
  • Metric validations
  • Key SLA validations

Skill set

Requirement: A specialized team of testers with skills in:

  • Data testing, with hands-on experience in disparate data sources
  • Web service testing
  • Performance testing
  • Detailed understanding of business and data flow

Recommendations: Cross-enable the team on various skills during the test planning phase, based on the recommendations received from the test strategy phase.

Staffing model

Requirement: Initial heavy loading of resources to test individual components, followed by a small team at the end specializing in end-to-end testing

Recommendations: A core-flex staffing model to support the heavy loading at the start.

Test process

Requirement: Given the disparate data sources and multiple integration checkpoints, a new set of test process assets is required at every test life cycle stage

Recommendations: Customize the test process, with key focus on:

  • Integration checkpoints
  • Test data availability
  • Report visualization

A data analyst, in collaboration with a test process engineer, will help in the development of a strong testing process.

Tooling

Requirement: A framework and toolset that can support the various testing needs of data virtualization

Recommendations: Identify tools that support the various needs of data virtualization testing, preferring open-source tools that suit your test suites. Also build automated regression suites that can validate the core business entities (a minimal sketch follows).

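As an illustration of the regression-suite recommendation above, here is a minimal pytest-style sketch; the query_view() helper, the view names, and the rules are hypothetical placeholders, not a real client API.

    # Minimal regression checks on core business entities, assuming a
    # hypothetical query_view() helper that reads from the virtual layer.
    from dv_client import query_view  # placeholder module for illustration only

    def test_customer_ids_are_unique():
        ids = [row["customer_id"] for row in query_view("unified_customer")]
        assert len(ids) == len(set(ids)), "duplicate customers after integration"

    def test_every_sale_has_a_known_customer():
        customers = {row["customer_id"] for row in query_view("unified_customer")}
        orphans = [s for s in query_view("unified_sales") if s["customer_id"] not in customers]
        assert not orphans, f"{len(orphans)} sales reference unknown customers"

    def test_sale_amounts_are_non_negative():
        assert all(row["amount"] >= 0 for row in query_view("unified_sales"))
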

Mike also recommends various types of testing, with details for each:

Data acquisition testing

This involves validating the acquisition of data from multiple data sources in different formats. It is a complex validation procedure, as it can also involve semi-structured or unstructured data, along with structured data. As part of testing, various validation checks like data extraction, filtering, completeness, and consistency have to be carried out.

Data migration testing

Migration of data from multiple data sources is covered here. The level of complexity goes up in scenarios involving large data volumes or data transformations.

Data virtualization testing

A separate test sub-network, replicating the actual implementation, will help in true validation of the system. A check needs to be carried out to validate support for all required configurations and operating systems at both the server and client ends. We also need to emulate a global network and validate scenarios like delays in data availability.

Data quality testing

It is important to validate the quality of the stored data, as it is the basis for all business decision-making. Along with traditional data quality checks covering schema, metadata, lookups, format, data structure, patterns, statistics, etc., we also need to check the quality of data in terms of the following (a minimal rule-check sketch follows the list):

  • Its business usage, i.e., whether the data satisfies all the key business rules
  • Whether the data is transformed and organized in a format that can provide quality information to the business, not just for today's needs but also for future needs
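
A minimal sketch of such rule-based quality checks in Python; the field names and rules are assumptions made up for the example, and in practice the rules come from the business.

    # Illustrative business-rule checks on records served by the virtual layer.
    import re
    from datetime import date

    EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def quality_issues(row: dict) -> list:
        """Return a list of business-rule violations for one record."""
        issues = []
        if not EMAIL.match(row.get("email", "")):
            issues.append("invalid email format")
        if row.get("order_total", 0) < 0:
            issues.append("negative order total")
        if row.get("order_date") and row["order_date"] > date.today():
            issues.append("order dated in the future")
        return issues

    # Example records (stand-ins for rows served by the virtual layer).
    rows = [
        {"email": "a@example.com", "order_total": 120.0, "order_date": date(2024, 11, 29)},
        {"email": "not-an-email",  "order_total": -5.0,  "order_date": date(2024, 11, 29)},
    ]
    flagged = [(r, quality_issues(r)) for r in rows if quality_issues(r)]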

Data integration testing

Integration and end-to-end testing to validate that, post the data virtualization implementation, disparate data sources and systems act as one. Validations should cover data completeness, in terms of record count checks between source and target, removal of duplicate records after integration between systems, and the ability to correctly identify matching records across data sources. Data integrity checks should validate data consistency between source and target, as well as lookups, aggregates, and expression transformations.
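
A minimal sketch of the completeness checks described above: record counts, missing keys, and duplicates between a source extract and the integrated target. The key name and sample rows are assumptions for the example.

    from collections import Counter

    def reconcile(source_rows, target_rows, key="order_id"):
        """Compare a source extract with the integrated target view."""
        src = Counter(r[key] for r in source_rows)
        tgt = Counter(r[key] for r in target_rows)
        return {
            "source_count": sum(src.values()),
            "target_count": sum(tgt.values()),
            "missing_in_target": sorted(set(src) - set(tgt)),
            "duplicates_in_target": sorted(k for k, n in tgt.items() if n > 1),
        }

    # Example usage with tiny in-memory extracts.
    source = [{"order_id": 1}, {"order_id": 2}, {"order_id": 3}]
    target = [{"order_id": 1}, {"order_id": 1}, {"order_id": 2}]
    print(reconcile(source, target))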

Report testing

Validate the reporting system with the future needs of the business in mind, covering UI navigation, filters, prompts, and data correctness in the reports. Reports also need to be validated for browser compatibility.

Security testing

We also need to ascertain that information is accessible only to the people authorized to view it. Unauthorized access to data can lead to issues like privacy breaches, non-compliance with regulations, financial irregularities, litigation, etc.

Performance testing

A core benefit of data virtualization is quicker access to data when it is needed. Hence, all SLAs need to be validated in detail against the actual production load of data. Key metrics like throughput, latency, etc. need to be tracked closely.
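
A minimal sketch of an SLA check, assuming a hypothetical run_report() callable that executes a report against the virtual layer and an illustrative 95th-percentile latency target.

    import statistics
    import time

    SLA_P95_SECONDS = 5.0  # illustrative target, not a real SLA

    def check_latency_sla(run_report, iterations: int = 20) -> dict:
        """Time repeated report executions and compare against the SLA."""
        samples = []
        for _ in range(iterations):
            start = time.perf_counter()
            run_report()  # hypothetical report execution against the virtual layer
            samples.append(time.perf_counter() - start)
        p95 = statistics.quantiles(samples, n=20)[-1]  # approximate 95th percentile
        return {"p95_seconds": round(p95, 3), "within_sla": p95 <= SLA_P95_SECONDS}

    # Example usage with a stand-in report function.
    print(check_latency_sla(lambda: sum(range(100_000))))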

Business usability testing

Business entity validation: Validate the accuracy of business entities, with data validation checks that satisfy the business rules. This involves checks for duplicates, record format, consistency, accuracy, and referential integrity.

Operational accuracy: Check the accuracy of reports in terms of parent report data tying out with the drill-down data and the ability to reconcile with key business metrics.

Take regular feedback from the business during SIT, rather than waiting until UAT. This will help you develop systems and reports that not only meet the technical requirements but are also more business-friendly.


Business is happy and growing

Jack and his leaders are happy with the solution, as the data virtualization implementation and the QA validations carried out to ensure accuracy of the reports have helped them address data-related challenges in making the right decisions. Jack's business can now:

  • Help new business lines integrate data with the existing data warehouse, at limited cost and with a short cycle time
  • Integrate structured and unstructured data
  • Integrate real-time data with an application and a data warehouse
  • Have a 360-degree view of customers based on data from various systems

Conclusion

Many organizations fail to reap the benefits of their diverse data sources due to their reluctance to accept new trends in the data space. With data virtualization, business users can economically access data from disparate data sources on a need basis. At the same time, a validation procedure is also required, with the right set of strategies, tools, and practices, to meet the needs of tomorrow's integration and reporting.
