Allure of Cloud may fade away without proper data validation
Naju D. Mohan, Delivery Manager, Data Services, Infosys Validation Solutions
The need to validate the integrity, correctness and completeness of data residing in the cloud is increasing every second with the penetration of mobile devices and the interconnection of computing devices through the internet. Cloud seems to be establishing itself as the best possible alternative to meet these data storage and processing demands. The data management capabilities of traditional data stores are being revamped to meet the demand for huge volume and variety of data in cloud storage. This calls for new testing techniques and tools for ensuring 100% data validation.
The mad rush to capture data generated by customers, the opportunity to improve business decisions through data-driven insights and the spurt in data storage costs are some of the factors driving companies towards the cloud. Companies have now started thinking about when and how to migrate to the cloud rather than whether to move at all. The adoption of cloud by companies would be based on multiple factors. The presence of a proper QA strategy for data validation during cloud migration would be a primary deciding factor, one that helps companies retain a sustainable competitive edge.
Triggers for cloud adoption and big data validation needs
· Legacy modernization
Digital transformation is pushing companies to move away from legacy applications, most of which lack the agility to support modern-day consumer demands. Many of these legacy applications require daily firefighting just to keep the business functioning. Once companies decide to make a cloud transition, they might go for a hybrid strategy, retaining some of the existing functionality in the legacy application and migrating some to the cloud. They may migrate only the functionality that requires interoperability with other cloud applications or that requires a total overhaul. This makes testing very tricky for legacy-to-cloud migration, and the primary focus of data validation should be on data integrity testing.
o Data integrity testing should verify the compatibility of existing data with the new hardware, operating system and interfaces that are implemented for cloud integration
o Data integrity should be protected by validating that unauthorized data access is blocked
o All data and data files should be tested for integrity as a subsystem within the old application functionality
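One way to picture the data integrity checks above is a record-by-record comparison between the legacy store and its cloud copy. The following is only a minimal sketch under stated assumptions: the function names `row_fingerprint` and `compare_datasets`, the column names and the sample rows are all hypothetical, and both data sets are assumed to fit in memory as lists of dictionaries. A real migration would read from the actual source and target systems.

```python
import hashlib

def row_fingerprint(row, columns):
    """Hash the selected columns of one record in a fixed order."""
    joined = "\x1f".join(str(row.get(col, "")) for col in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def compare_datasets(legacy_rows, cloud_rows, key, columns):
    """Return keys that are missing, unexpected, or changed after migration."""
    legacy = {r[key]: row_fingerprint(r, columns) for r in legacy_rows}
    cloud = {r[key]: row_fingerprint(r, columns) for r in cloud_rows}
    shared = legacy.keys() & cloud.keys()
    return {
        "missing_in_cloud": sorted(legacy.keys() - cloud.keys()),
        "unexpected_in_cloud": sorted(cloud.keys() - legacy.keys()),
        "mismatched": sorted(k for k in shared if legacy[k] != cloud[k]),
    }

# Hypothetical sample data standing in for the legacy and cloud stores.
legacy_rows = [
    {"id": 1, "name": "Asha", "balance": "100.00"},
    {"id": 2, "name": "Ben", "balance": "250.50"},
    {"id": 3, "name": "Chen", "balance": "75.25"},
]
cloud_rows = [
    {"id": 1, "name": "Asha", "balance": "100.00"},
    {"id": 2, "name": "Ben", "balance": "250.05"},  # value altered in transit
]

report = compare_datasets(legacy_rows, cloud_rows, key="id",
                          columns=["name", "balance"])
print(report)
```

Comparing fingerprints rather than raw values keeps the check cheap when rows are wide, at the cost of not showing which column changed. A follow-up column-level comparison on the mismatched keys would be needed for that.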
· Data Warehouse
It requires a total shift in mindset for companies to move data stored within the walls of their organization in a traditional data warehouse to a data warehouse on the cloud, owing to security and data migration concerns. Today, most of the leading data management companies provide options for a data warehouse on the cloud, such as Amazon's Redshift, Microsoft's Azure SQL Data Warehouse, Teradata's Teradata Cloud and IBM's dashDB. A data warehouse hosted on the cloud helps companies reduce setup and maintenance effort compared to an on-premise data warehouse. The ease of distributing data to geographically widespread departments within the organization and the ability to derive quick analytical insights also prompt companies to migrate their on-premise data warehouse to the cloud. All enterprises that have adopted a cloud infrastructure for hosting their data warehouse would definitely require a testing strategy that validates data movements to and from the cloud.
o Data migration testing during the move to a cloud data warehouse should identify the business logic implemented in stored procedures used in the legacy data stores. This logic should be converted into business rules in test cases and used to validate the complex data transformations.
o Data analytics and visualization testing on the cloud needs to consider the finer details of data integration between on-premise and cloud data stores
o Data ingestion testing becomes critical, as it has to validate the merging of unstructured incoming data with structured data for deriving valuable insights
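The first point above, turning stored procedure logic into business rules inside test cases, can be sketched as follows. This is an illustration under assumptions rather than a prescribed method: the tiering rule in `expected_tier`, its thresholds, the field names and the sample rows are all invented for the example, standing in for whatever logic a real legacy stored procedure contains.

```python
from decimal import Decimal

def expected_tier(total_spend):
    """Hypothetical business rule recovered from a legacy stored procedure."""
    if total_spend >= Decimal("10000"):
        return "GOLD"
    if total_spend >= Decimal("1000"):
        return "SILVER"
    return "BRONZE"

def validate_transformation(source_rows, target_rows):
    """Re-derive each target value from the source and list disagreements.

    Each failure is a tuple of (customer_id, expected tier, actual tier),
    where the actual tier is None if the row never reached the target.
    """
    target_by_id = {row["customer_id"]: row for row in target_rows}
    failures = []
    for src in source_rows:
        want = expected_tier(Decimal(src["total_spend"]))
        tgt = target_by_id.get(src["customer_id"])
        if tgt is None:
            failures.append((src["customer_id"], want, None))
        elif tgt["tier"] != want:
            failures.append((src["customer_id"], want, tgt["tier"]))
    return failures

# Hypothetical extracts from the legacy store and the cloud data warehouse.
source_rows = [
    {"customer_id": "C1", "total_spend": "12000.00"},
    {"customer_id": "C2", "total_spend": "1000.00"},
    {"customer_id": "C3", "total_spend": "999.99"},
]
target_rows = [
    {"customer_id": "C1", "tier": "GOLD"},
    {"customer_id": "C2", "tier": "BRONZE"},  # boundary value handled wrongly
    {"customer_id": "C3", "tier": "BRONZE"},
]

failures = validate_transformation(source_rows, target_rows)
print(failures)
```

Keeping the rule in the test suite, independent of the migrated code, means the transformation is checked against the intended behaviour rather than against itself. `Decimal` is used so that boundary amounts are compared exactly.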
· Machine Learning and Analytics
The use of Machine Learning to analyze data, find patterns and make predictions is fueling the race to store data that is generated every millisecond. This increases the demand for data stores that can hold this huge variety and volume of data. Enterprises move transactional data to the cloud to overcome the challenges associated with collecting, analyzing and storing big data. The big cloud players like Google, AWS and Microsoft provide cloud-based Machine Learning solutions. The need for data integration between cloud and on-premise systems becomes acute in order to get a complete picture of data patterns and make use of the machine learning solutions. Innovative test strategies have to be devised to meet the needs of machine learning and analytics.
o Proper data quality testing has to be done to ensure the completeness and correctness of data before determining patterns
o Artificial Intelligence based validation techniques have to be deployed to validate predictive models
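As a rough illustration of the data quality point above, completeness and correctness can each be expressed as a ratio per field before the data is handed to any pattern-finding step. The sketch below makes several assumptions: `quality_report`, the field names, the validation rules and the sample records are hypothetical, completeness is taken to mean "value present and non-empty", and correctness is measured only over the values that are present.

```python
def quality_report(records, required, validators):
    """Compute completeness and correctness ratios for each required field."""
    total = len(records)
    report = {}
    for field in required:
        # A value counts as present if it is neither None nor an empty string.
        present = [r[field] for r in records if r.get(field) not in (None, "")]
        check = validators.get(field, lambda value: True)
        valid = sum(1 for value in present if check(value))
        report[field] = {
            "completeness": len(present) / total if total else 0.0,
            "correctness": valid / len(present) if present else 0.0,
        }
    return report

# Hypothetical records and rules, for illustration only.
records = [
    {"age": 34, "country": "IN"},
    {"age": -5, "country": "US"},    # present but out of range
    {"age": None, "country": "usa"},  # age missing, country malformed
    {"age": 51, "country": ""},       # country missing
]
validators = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "country": lambda v: isinstance(v, str) and len(v) == 2 and v.isupper(),
}

report = quality_report(records, ["age", "country"], validators)
for field, scores in report.items():
    print(field, scores)
```

A team would typically set a minimum threshold for each ratio and stop the analytics pipeline when a field falls below it. The thresholds themselves are a business decision and are not shown here.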
Conclusion
Cloud adoption is more than just a technology upgrade. It requires detailed planning and a phased approach to prioritize the business use cases and the associated functionality for migration to the cloud. A few important points that would come in handy while handling cloud data validation are listed below:
· Applications with higher regulatory and data privacy needs would need multiple iterations of testing, and this additional testing time needs to be factored in during the planning phase.
· Automated testing utilities with appropriate connectors are needed to handle validation of special data file formats specific to the cloud.
· An appropriate validation strategy is needed for integrated business processes spread across on-premise and cloud.