Is your testing organization ready for the big data challenge?
Author: Vasudeva Muralidhar Naidu, Senior Delivery Manager
Big data is gaining popularity across industry segments. From being limited to lab research in niche technology companies, it has grown into a technology widely used for commercial purposes. Many mainstream organizations, including global banks and insurers, have already started using open source big data technologies to store historical data. While this is only the first step toward value realization, we will soon see these platforms used for processing unstructured data as well.
Are testing organizations ready to test these implementations? Many might ask the obvious: "What is the big deal?" Many technologies have evolved in the last few years, and testing organizations have built robust testing strategies for them, including mobility, cloud, digital, and telematics, to name a few. So why is big data different? The three tables below offer a deeper understanding of whether big data testing is a big deal or not.
Figure 1: What is new in Big Data?
If you look closely, every single point in all three tables is new: input data, data formats, input data types, storage mechanisms, processing mechanisms, and the software used to extract, process, store, and report. This is a completely new experience that calls for a new kind of tester, with new technology skills, testing process skills, and the ability to build new tool sets and instrumentation.
Considering the above scenario, let us look at the big data process chain and identify the type of testing to be conducted at each stage. This should help us understand realistically what is needed to get ready for big data testing programs.
Source data: The first obvious point is the input source format. Unlike RDBMS, the formats can be anything: Twitter feeds, Facebook posts, Google search data, and data generated from RFID, digital sensors, mobile phones, cameras, GPS, and so on. This calls for specific knowledge of data acquisition. The tester should be able to validate data acquisition methods and their accuracy.
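As a minimal sketch of what acquisition-accuracy validation can look like, the check below validates a single acquired GPS record against structural and range rules. The record shape (a dict with `timestamp`, `lat`, and `lon` fields) is a hypothetical example, not a real feed format:

```python
from datetime import datetime

def validate_gps_record(record):
    """Validate one acquired GPS record (hypothetical shape).

    Expects a dict with 'timestamp' (ISO 8601 string), 'lat', and 'lon'.
    Returns a list of validation errors; an empty list means the record
    passed the acquisition-accuracy checks.
    """
    errors = []
    # Structural check: all required fields must be present.
    for field in ("timestamp", "lat", "lon"):
        if field not in record:
            errors.append(f"missing field: {field}")
    if errors:
        return errors
    # Accuracy checks: timestamp must parse, coordinates must be in range.
    try:
        datetime.fromisoformat(record["timestamp"])
    except (TypeError, ValueError):
        errors.append("timestamp not ISO 8601")
    if not -90.0 <= record["lat"] <= 90.0:
        errors.append("latitude out of range")
    if not -180.0 <= record["lon"] <= 180.0:
        errors.append("longitude out of range")
    return errors
```

In practice such checks would run over samples drawn from each source feed, so that acquisition defects are caught before the data enters aggregation.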
Aggregate and analyze: Once the data is acquired and aggregated, special algorithms analyze and mine it based on pattern matching. The tester has to validate the aggregation rules and pattern matching algorithms, and confirm that the extracted patterns fulfill reporting needs.
Consume: Once the data is mined and stored, the speed of report generation and the accuracy of the jobs have to be put to the test.
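Speed and accuracy can be checked together by wrapping a report job in a timer and comparing the elapsed time against a service-level threshold. This is a minimal sketch; `report_fn` and the SLA value are assumptions standing in for a real report job:

```python
import time

def run_report_with_sla(report_fn, sla_seconds):
    """Run a report job and check it against a timing SLA.

    Returns (result, met_sla) so the caller can assert on both the
    accuracy of the report output and the speed of its generation.
    """
    start = time.perf_counter()
    result = report_fn()
    elapsed = time.perf_counter() - start
    return result, elapsed <= sla_seconds
```

A usage example: `run_report_with_sla(lambda: sum(range(1000)), 5.0)` returns the computed total plus a flag saying whether generation finished within five seconds.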
Based on the above, the diagram below describes the various types of testing to be carried out.
Figure 2: Big data testing needs
What do testing organizations need to execute these various types of tests?
Testing organizations need significant preparation to face big data challenges. As the entire base is built from scratch, dedicated focus is necessary. The following guidelines can help:
- Testers should possess a variety of new technical and process skills
- Requirements for big data should focus on end-user reports and MapReduce logic
- Testing of extreme information management calls for extreme automation
- Simple Excel macros may not suffice; validation requires scripting or dedicated tools
- Test data management should also be determined based on MapReduce logic
- Test environment scaling should be driven by the number of sources and network bandwidth
- The key to a testing team's success is defining how much to test and when to stop
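Validating MapReduce logic, as the guidelines suggest, does not have to wait for a cluster. The sketch below is a plain-Python word count that mirrors the map, shuffle/sort, and reduce phases, so the logic and its test data can be verified locally first; the function names are illustrative, not part of any framework:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    """Map: emit a (word, 1) pair for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    """Reduce: sum counts per word; sorting mimics the shuffle/sort step."""
    pairs = sorted(pairs, key=itemgetter(0))
    return {key: sum(count for _, count in group)
            for key, group in groupby(pairs, key=itemgetter(0))}

def word_count(lines):
    """Run the full map-shuffle-reduce pipeline over a list of lines."""
    all_pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(all_pairs)
```

Test data built this way directly exercises the MapReduce logic: a fixture of a few lines with known word counts pins down the expected reducer output before the same logic runs at scale.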
Big data programs are expected to grow, and testing will play a major role in ensuring their success. Testing will help fine-tune the pattern matching algorithms, which in turn will increase the usefulness of unstructured data. The more prepared you are as a testing organization, the more successful your big data programs will be.