Testing Services provides a platform for QA professionals to discuss and gain insights into the business value delivered by testing, the best practices and processes that drive it, and the emergence of new technologies that will shape the future of this profession.


Big data validation for a memorable digital customer experience

Naju D. Mohan, Delivery Manager, Data Services, Infosys Validation Solutions

I have often been asked by colleagues, sales team members and sometimes even clients: what exactly do you validate in big data and data insights, and how? This does not surprise me, since big data, and occasionally even the insights derived from it, is a black box for most of them. For a whole lot of people it is a mysterious world: lots and lots of data that traverses systems, gets churned by algorithms most of us have forgotten since our school years, and is finally displayed using fancy visualizations. Let me make a modest attempt to take you through the journey of big data for a common use case that we have all experienced in our daily lives. I shall pause at each step of the data flow and explain what has to be tested and how we confirm data quality.

Testing to the rescue of the big data problem in personalized marketing

At a time when customer demands change often and the sky is the limit for customer expectations, companies that capitalize on delivering a personalized experience gain a competitive edge. Companies generally agree that the success of personalization depends on the quality of the data being used.


Test it before you ingest obsolete data into your big data system

The primary data for personalized marketing comes from the customer's social media activity, online product reviews, online navigation patterns, post-purchase history etc. This data arrives in huge volumes, at a very high rate, and is most often unstructured or semi-structured. Companies receive it from diverse sources and very often struggle to make sense of it. The following activities are carried out as part of testing to ensure the sanctity of ingested data.

·         Convert the ingested data, which is unstructured or semi-structured, into a comprehensible format

·         Validate the converted data to ensure data completeness during ingestion

·         Validate the data for data truncation to ensure data integrity is preserved

·         Identify missing customer files and dropped customer records through statistical validation

·         Confirm the validity of customer data received from diverse source systems
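
The completeness and truncation checks listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular tool: the function names, the per-file record counts and the `id` and `review` fields are all hypothetical, and a real pipeline would read the counts from a source manifest and from the big data platform itself.

```python
# Hypothetical sketch: reconcile record counts and detect truncated fields
# after ingestion. All names and field layouts are assumptions for illustration.

def reconcile_counts(source_counts, ingested_counts):
    """Compare per-file record counts from the source with the ingested counts.

    Returns (missing_files, dropped), where missing_files lists files that
    never arrived and dropped maps a file name to the number of lost records.
    """
    missing_files = sorted(set(source_counts) - set(ingested_counts))
    dropped = {
        name: source_counts[name] - ingested_counts[name]
        for name in source_counts
        if name in ingested_counts and ingested_counts[name] < source_counts[name]
    }
    return missing_files, dropped


def find_truncated(source_rows, ingested_rows, key, field):
    """Return keys whose text field is shorter after ingestion than at source."""
    ingested_by_key = {row[key]: row for row in ingested_rows}
    truncated = []
    for row in source_rows:
        target = ingested_by_key.get(row[key])
        if target is not None and len(target[field]) < len(row[field]):
            truncated.append(row[key])
    return truncated
```

A file that is present at the source but absent after ingestion shows up in the first list, a file with fewer records than expected shows up with its shortfall, and a review whose text became shorter is reported by its key.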

Improve your big data processing through appropriate testing techniques

Customer data ingested from various sources has to be processed before it is analyzed. This includes cleansing the data to get rid of unwanted details, enriching it from other systems like master data management, adding finer details from transaction history etc. When this processing goes wrong, companies sometimes end up sending personalized messages to unintended customers, choosing the wrong channel for personal campaigns etc. These mistakes can be avoided by following the testing approach below.

·         Use combinatorial testing methods for optimized test coverage for handling huge data volume

·         Use match and merge validation techniques for data enrichment

·         Validate big data systems to avoid duplicate records caused by integration from various sources

·         Ensure the validity of data in the big data environment by comparing it against the source of truth
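
As a rough illustration of the last two checks, the sketch below groups customer records by a normalized e-mail address to surface duplicates and then compares selected fields against a master record. It assumes, purely for the example, that e-mail is the matching key and that the source of truth is available as a dictionary keyed by customer id; the function and field names are hypothetical.

```python
# Hypothetical sketch: duplicate detection and comparison against a source
# of truth. The matching key (e-mail) and record layout are assumptions.

def normalize_email(email):
    """Trim whitespace and lower-case so trivially different spellings match."""
    return email.strip().lower()


def find_duplicates(records):
    """Group record ids by normalized e-mail; keep groups with more than one id."""
    groups = {}
    for rec in records:
        groups.setdefault(normalize_email(rec["email"]), []).append(rec["id"])
    return {email: ids for email, ids in groups.items() if len(ids) > 1}


def mismatches_against_master(records, master, fields):
    """Return (id, field, actual, expected) for values that differ from master."""
    differences = []
    for rec in records:
        truth = master.get(rec["id"])
        if truth is None:
            continue  # no master record to compare with
        for field in fields:
            if rec.get(field) != truth.get(field):
                differences.append((rec["id"], field, rec.get(field), truth.get(field)))
    return differences
```

Real match and merge rules are usually fuzzier than an exact key comparison, so this should be read as the simplest possible instance of the idea.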

Manage data relevance through testing while migrating big data

Higher conversion rates and long-term revenue through improved customer retention come with the right insights into customer data and apt slicing and dicing of that data. Data is migrated from big data systems to data marts or data warehouses to drive better results with customer data. The following points should be kept in mind while validating data migration.

·         Validate and remove outlier data to avoid skewed KPIs and incorrect personalized marketing campaigns

·         Validate the correctness of campaign data

·         Ensure the integrity of customer data from across channels while creating aggregates

·         Verify the conformance of available data with privacy and data compliance regulations
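
Two of the migration checks above lend themselves to a short sketch: flagging outliers before they skew a KPI, and confirming that an aggregate in the target matches what the detailed records add up to. The example uses the common interquartile range rule as one possible outlier test; the multiplier `k`, the tolerance and the `customer_id` and `amount` fields are assumptions made for illustration.

```python
# Hypothetical sketch: outlier detection with the IQR rule and
# reconciliation of migrated aggregates. Field names are assumptions.
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def aggregate_mismatches(detail_rows, aggregates, tolerance=0.01):
    """Recompute spend per customer across channels and compare with the target.

    Returns {customer_id: (recomputed, migrated)} for differences above tolerance.
    """
    recomputed = {}
    for row in detail_rows:
        cid = row["customer_id"]
        recomputed[cid] = recomputed.get(cid, 0.0) + row["amount"]
    return {
        cid: (recomputed.get(cid, 0.0), migrated)
        for cid, migrated in aggregates.items()
        if abs(recomputed.get(cid, 0.0) - migrated) > tolerance
    }
```

Whether a flagged value is removed or kept is a business decision; the check only makes the candidate visible.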


Test data insights and analytical models for retaining the business upper hand

It is evident that price point is no longer a major factor affecting a customer's purchase decision or buying pattern. Personalization is the key to success, and it becomes all the more attractive when done across channels. The final power of big data collection, processing and analysis lies in its ability to predict and communicate true and meaningful insights. A focus on the following areas during testing will take the quality of data insights to a new level.

·         Validate the visualization report with respect to all required dimensions

·         The format of data on reports and dashboards has to be validated to the minutest detail. A minor error in a decimal can lead to a loss of millions of dollars

·         Validate that the analytical models provide optimum predictions for the required scenarios
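
The decimal-format check on reports can be illustrated with Python's `decimal` module. The sketch assumes, for the sake of the example, that the report is expected to show two decimal places with half-up rounding and that both the source values and the displayed values are available as strings keyed by metric name; those conventions and the function names are hypothetical.

```python
# Hypothetical sketch: verify that values shown on a report match the
# source values rounded to the expected precision. The rounding rule and
# metric names are assumptions for illustration.
from decimal import Decimal, ROUND_HALF_UP


def expected_display(value, places=2):
    """Format a source value as it should appear on the report."""
    quantum = Decimal(10) ** -places  # e.g. Decimal('0.01') for two places
    return str(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


def report_mismatches(source_values, displayed_values, places=2):
    """Return {metric: (expected, displayed)} where the report disagrees."""
    differences = {}
    for metric, value in source_values.items():
        expected = expected_display(value, places)
        shown = displayed_values.get(metric)
        if shown != expected:
            differences[metric] = (expected, shown)
    return differences
```

Using `Decimal` rather than binary floats keeps the comparison exact, which matters when the defect being hunted is a single misplaced digit.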



Most personalized marketing may not be 100% effective because of huge data management challenges. So it becomes all the more important to test data during its entire journey, from inception to insights, to help companies retain their competitive edge.


Intricacies of Why's and What's of Big data validations explained in a very comprehensible way !!
