Testing Services provides a platform for QA professionals to discuss and gain insights into the business value delivered by testing, the best practices and processes that drive it, and the emergence of new technologies that will shape the future of this profession.

December 13, 2017

An approach for model validation from Functional and Technical perspective

 Author: Kuriakose K. K., Senior Project Manager, Infosys Validation Solutions

Once a model is built and trained on the training data, you need measures of how much confidence you can place in it. You gain that confidence by validating the model before, and while, it is used in real production scenarios.

To validate a model, you need to test it from the perspective of its intended usage, in terms of functionality, and also validate the model's statistical significance from a technical perspective.
Functional Relevance Validation:

As part of model validation, the standard approach is to train the model using training data and test it using test data. On its own, this approach only shows that the model fits the sample used. We also need quantitative and objective measures of how well the model will predict on an independent sample of similar data, in order to determine its degree of generalizability.

We need to clearly define and validate how the various input variables are linked to the predicted outcome, and how those links help us determine the stability of the model's parameters. We also need to assess which variables and scenarios can affect the variables that drive the prediction. Understanding these elements is key, because it reveals the limitations and boundaries of the model.

We also need regular checks after the model is deployed, tracking variances in its predictive power, which help us determine the model's significance. Based on a drop or improvement in predictive power, we can determine whether the model in use is still relevant. This helps us build and maintain generalized models for our problem statement that are not specific to one sample or time period.
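One way to picture such a post-deployment check is the minimal Python sketch below. It assumes predictive power is measured as plain accuracy, and the function names, period labels, sample results and the 0.05 tolerance are all hypothetical values chosen for illustration, not part of any specific tool.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual outcomes."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


def flag_drift(baseline_accuracy, period_results, tolerance=0.05):
    """Return (period, accuracy) pairs that fell more than `tolerance` below baseline.

    period_results maps a period label to a pair of (actual, predicted) lists.
    The default tolerance is an arbitrary illustrative value.
    """
    flagged = []
    for period, (actual, predicted) in period_results.items():
        score = accuracy(actual, predicted)
        if baseline_accuracy - score > tolerance:
            flagged.append((period, score))
    return flagged


# Hypothetical post-deployment results, one entry per review period.
results = {
    "2017-Q3": ([1, 0, 1, 1, 0], [1, 0, 1, 1, 0]),  # 5 of 5 correct
    "2017-Q4": ([1, 0, 1, 1, 0], [0, 0, 1, 0, 0]),  # 3 of 5 correct
}
drifted = flag_drift(baseline_accuracy=0.9, period_results=results)
print(drifted)
```

In practice the baseline would come from the validation sample used when the model was accepted, and the metric could be swapped for whichever measure the model was originally judged on.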

Technical Relevance Validation:

This part deals with ways to measure the quality of a model's fit and to evaluate its performance, so that we arrive at an acceptably fitting model.

There are various approaches to splitting the data between fitting and validation. Based on the available data, one needs to split it into a training (fitting) sample and a validation sample. Once we have used the training sample to fit the model, we can evaluate the fitted model's performance against the validation sample.
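The simplest of these approaches, a random holdout split, can be sketched in Python as follows. This is only an illustration: the function name, the 30% validation fraction and the fixed seed are assumptions made for the example, and other schemes (such as k-fold cross-validation or time-based splits) may suit a given data set better.

```python
import random


def split_fit_validation(records, validation_fraction=0.3, seed=42):
    """Shuffle records and split them into (fitting, validation) samples."""
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Ten placeholder records standing in for real observations.
records = list(range(10))
fit_sample, validation_sample = split_fit_validation(records)
print(len(fit_sample), len(validation_sample))
```

The model would then be fitted on `fit_sample` only and scored on `validation_sample`, so that the reported performance reflects data the model has not seen.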

We can use discrimination measures to see how clearly the model distinguishes between positive and negative outcomes. A model's sensitivity (the percentage of positive subjects that are classified as positive) and specificity (the percentage of negative subjects that are classified as negative) help us determine the tradeoff. For a given model, the two typically move in opposite directions as the classification threshold changes: an increase in sensitivity tends to decrease specificity, and vice versa. We can also plot an ROC curve with the true positive rate on the Y axis and the false positive rate on the X axis. This depicts the relative tradeoff between true positives and false positives.
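The tradeoff can be made concrete with a small Python sketch. The labels, scores and thresholds below are made-up values for illustration, and the function names are hypothetical. The sketch assumes a binary outcome coded as 1 (positive) and 0 (negative), with at least one subject of each class.

```python
def sensitivity_specificity(actual, scores, threshold):
    """Classify score >= threshold as positive; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for label, score in zip(actual, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    # Assumes both classes are present, otherwise a division by zero occurs.
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(actual, scores, thresholds):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    points = []
    for threshold in thresholds:
        sens, spec = sensitivity_specificity(actual, scores, threshold)
        points.append((1 - spec, sens))  # X axis = 1 - specificity, Y axis = sensitivity
    return points


# Made-up labels (1 = positive) and model scores for illustration.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

for threshold in (0.75, 0.5, 0.25):
    sens, spec = sensitivity_specificity(actual, scores, threshold)
    print(threshold, sens, spec)
```

On this toy data, lowering the threshold from 0.75 to 0.25 raises sensitivity from 0.5 to 1.0 while specificity falls from 1.0 to 0.5. Plotting the pairs returned by `roc_points` for a range of thresholds gives the ROC curve.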

Conclusion:

It is important to validate the credibility of the model and to evaluate its output against actual data using the measures discussed. At the same time, it is important to validate all input parameters in terms of their significance to the model. Most important of all is to validate the significance of the model against the problem we are trying to solve. Always remember that you are designing and validating your system based not on somebody's opinion but on FACTS!!!

Please reach out to me to share your thoughts.