Four approaches to big data testing for banks
Author: Surya Prakash G, Delivery Manager
Today's banks bear little resemblance to those of a few years ago, and technological innovation means tomorrow's banks will operate under newer paradigms still. With each passing day, these financial institutions face new customer expectations and increased interaction through social media and mobile channels. As a result, banks are prioritizing changes to their IT landscape, which entails implementing big data technologies to process customer data and open new revenue opportunities. Examples of such trending technology solutions include fraud and sanctions management, enhanced customer reporting, new payment gateways, customized stock portfolios based on searches, and so on.
While the implementation of big data solutions is a complex task, testing those implementations poses additional challenges ― mainly due to the volume of test data, infrastructure requirements, newer and more complex technologies, and the different formats of data arriving from multiple sources. Banks are also moving from traditional warehouse systems to big data technologies such as Hadoop for cost savings and ease of storage.
As of today, four different kinds of big data implementation are possible, and each requires a specific type of testing. However, banks must evaluate the following '3 Vs' of big data before testing applications:
• Volume: A huge amount of data flows through systems and requires validation for quality
• Velocity: This is the speed at which new data gets generated. Generally, the greater the velocity with which data can be analyzed, the greater the profit for the organization
• Variety: Big data comprises large data sets which may be structured, semi-structured, or unstructured
The following infographic elaborates on the four different big data implementations and their corresponding testing categories:
The four testing approaches to validate big data:
I. Migration testing: A number of banks are shifting data from traditional data warehouses to big data stores to save on license costs and benefit from big data technology (distributed storage and ease of retrieval). In the case of migration testing, data from the source is compared with the target to ensure that all records are moved. Data quality and visualization testing are also performed on the target database.
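The source-to-target comparison described above can be sketched in a few lines. This is a minimal illustration, not any specific tool's API: plain lists of dictionaries stand in for the warehouse and the big data store, and the field names are hypothetical.

```python
# Hypothetical sketch of source-to-target migration validation.
# In practice the "rows" would come from a warehouse query and a
# big data store query; lists of dicts stand in for both here.

def validate_migration(source_rows, target_rows, key="account_id"):
    """Compare record counts and per-key field values between source and target."""
    issues = []
    if len(source_rows) != len(target_rows):
        issues.append(f"count mismatch: source={len(source_rows)} target={len(target_rows)}")
    target_by_key = {row[key]: row for row in target_rows}
    for row in source_rows:
        moved = target_by_key.get(row[key])
        if moved is None:
            issues.append(f"missing record: {row[key]}")
        elif moved != row:
            issues.append(f"field mismatch for key {row[key]}")
    return issues

source = [{"account_id": 1, "balance": 100.0}, {"account_id": 2, "balance": 250.5}]
target = [{"account_id": 1, "balance": 100.0}, {"account_id": 2, "balance": 250.5}]
print(validate_migration(source, target))  # [] means all records moved intact
```

An empty issue list indicates every source record reached the target unchanged; any other result points the tester at the specific records to investigate.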
II. End-to-end testing: The trends in big data implementations focus on creating new data lakes / data hubs, with big data replacing existing data warehouse systems to store data in different zones for easy retrieval. Such implementations require different testing methods.
Comprehensive data lake implementations, in fact, require the following four testing approaches:
- Data ingestion testing: Data from various external sources, like social media, web logs (unstructured), and sourcing systems like RDBMS (structured), are validated for transformation, format changes, masking, etc., to ensure that the right data is getting ingested into the data lake. As a result, data will be validated at every stage of data ingestion
- Data processing testing: Data quality analysis is the second test step to be followed to ensure data integrity and to validate business rules used for transformation. This needs to be performed in the big data store once the data is moved from the source systems
- Data mining testing: Data available in data lakes will be retrieved on the basis of specific business logic to ensure that the right data is filtered and made available for other data stores or relational databases. This takes care of validating the transformation / retrieval query logic
- Data visualization testing: Reports and visualization testing relates to end users, where output is validated against actual business requirements and design. Reports are the basis for many decisions and are also critical components of the organization's control framework. As a result, reports, dashboards, and mobile outputs are validated
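The ingestion-stage checks described above (format changes, masking, and so on) can be illustrated with a small record validator. The field names, date format, and card-masking rule below are assumptions for the sketch, not drawn from any particular bank's pipeline.

```python
import re

# Illustrative ingestion-stage checks: required fields present, dates in
# the expected format, and card numbers masked before landing in the lake.
# All rules here are assumed for the example.
EXPECTED_FIELDS = {"customer_id", "card_number", "txn_date"}
DATE_FORMAT = re.compile(r"^\d{4}-\d{2}-\d{2}$")
MASKED_CARD = re.compile(r"^\*{12}\d{4}$")  # only the last four digits visible

def validate_ingested_record(record):
    """Return a list of validation errors for one ingested record."""
    errors = []
    missing = EXPECTED_FIELDS - record.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "txn_date" in record and not DATE_FORMAT.match(record["txn_date"]):
        errors.append("bad date format")
    if "card_number" in record and not MASKED_CARD.match(record["card_number"]):
        errors.append("card number not masked")
    return errors

good = {"customer_id": "C1", "card_number": "************1234", "txn_date": "2024-01-31"}
bad = {"customer_id": "C2", "card_number": "4111111111111234", "txn_date": "31/01/2024"}
print(validate_ingested_record(good))  # []
print(validate_ingested_record(bad))   # two errors: date format and unmasked card
```

In a real pipeline, such checks would run at every ingestion stage, with failing records routed to a quarantine zone for analysis rather than into the lake.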
III. Reports testing: In this kind of testing, reports that previously drew data from data warehouses are modified to draw data from big data stores. Two validations are performed: the data displayed in reports is compared with the data available in the big data stores, and reports are visually compared or validated against a predefined format or template designed by users or data architects
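Both report validations can be sketched side by side. The store rows, report structure, and column names below are purely illustrative; in practice the expected figures would be recomputed directly against the big data store.

```python
# Sketch of the two report validations: (1) figures shown in a report vs
# figures recomputed from the big data store, (2) report layout vs an
# agreed template. All data and names here are assumed for the example.

store_rows = [
    {"branch": "North", "deposits": 120.0},
    {"branch": "North", "deposits": 80.0},
    {"branch": "South", "deposits": 200.0},
]

report = {"columns": ["branch", "total_deposits"],
          "rows": {"North": 200.0, "South": 200.0}}

template_columns = ["branch", "total_deposits"]  # predefined by data architects

# Validation 1: recompute totals from the store and compare with the report
expected = {}
for row in store_rows:
    expected[row["branch"]] = expected.get(row["branch"], 0.0) + row["deposits"]
data_ok = expected == report["rows"]

# Validation 2: report structure matches the predefined template
layout_ok = report["columns"] == template_columns

print(data_ok, layout_ok)  # True True
```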
IV. Data archival testing: This kind of testing is rare, and is predominantly seen in big data stores used for storing data for audit and compliance purposes. The data is not processed and is stored 'as is' so that it can be retrieved easily. The validation approach involves source-to-target comparison, where data from the source databases is validated against the target big data stores.
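Because archived data is stored 'as is', the source-to-target comparison can be done with record-level checksums rather than field-by-field matching. The sketch below uses a canonical JSON digest per record; the record shape is hypothetical.

```python
import hashlib
import json

# Hypothetical source-to-target comparison for archived ('as is') data:
# hash each record in a canonical form on both sides and compare the
# sorted digests, so record order in the archive does not matter.

def record_digest(record):
    """Stable SHA-256 digest of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def archive_matches(source_records, archived_records):
    """True when both sides hold exactly the same set of records."""
    return (sorted(map(record_digest, source_records))
            == sorted(map(record_digest, archived_records)))

src = [{"txn": 1, "amount": 10}, {"txn": 2, "amount": 20}]
arch = [{"txn": 2, "amount": 20}, {"txn": 1, "amount": 10}]  # order may differ
print(archive_matches(src, arch))  # True
```

Comparing digests rather than raw records keeps the approach workable at audit-archive volumes, since only fixed-size hashes need to be moved and sorted.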
The testing of big data is gaining momentum in banks, which are adopting big data technologies because they are cheaper, handle larger volumes, and process data much faster. This demands different ways of testing, where testers must have expertise in varied technologies, data formats (structured and unstructured), data conversions, and so on. The four testing approaches discussed above can be used as a starting point to understand the different validation stages for different kinds of implementations.
Automation is the key to success in big data testing, as manual testing is impractical given the data volume and variety. The automation strategy needs to be planned alongside test planning to ensure that automated tools are available to testers during execution.
As more and more financial institutions implement big data solutions, testers must sharpen their skills to test these complex implementations.