When and How to Use "Best Fit Model" in Your Statistical Forecasting Suite?
Statistical forecasting is a complex feature that is not easy to understand, and the more algorithms a suite offers, the more confused its end users become. To tackle this problem, almost every demand planning suite offers a "Best Fit Algorithm". This algorithm is meant to be a "superset" of all the other algorithms. When a planner applies it, every possible combination of every algorithm is run in the background, the one that gives the "best fit" to the history data is chosen, and the forecast is generated with that algorithm. Many call it the "algorithm for dummies". It is generally observed that planners who do not understand the statistics behind the algorithms use this option most, to save themselves the trouble. However, in my experience, in 9 out of 10 cases this algorithm generates a forecast that is wrong and cannot be used for planning. Why does this happen?
To answer this question, let us break down what the Best Fit Algorithm does in the background. First, it generates every possible parameter combination for each algorithm on offer. For example, if the Holt-Winters algorithm is on offer, it creates every possible combination of alpha, beta and gamma values. For those who do not know, alpha, beta and gamma are the smoothing parameters of the Holt-Winters algorithm (for the level, trend and seasonal components respectively), and their values must lie between 0 and 1. SAP APO offers this algorithm with an additional restriction: alpha, beta and gamma can only take values in multiples of 0.05. Each of them therefore has only 20 possible values (0.05, 0.10, 0.15 and so on up to 1.00), which gives 20 × 20 × 20 = 8,000 possible combinations of alpha, beta and gamma. In some other suites the number of combinations can reach one million. Best Fit generates a forecast for each of these combinations and selects the one with the lowest "forecasting error". So what exactly is the definition of "forecasting error"?
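The size of that search space is easy to check. The sketch below assumes, as described above, that each of alpha, beta and gamma is restricted to multiples of 0.05 from 0.05 to 1.00. The names `STEP`, `grid_values` and `combinations` are illustrative only and are not part of SAP APO.

```python
from itertools import product

# Assumed grid, mirroring the restriction described above:
# each smoothing parameter is a multiple of 0.05 in the range 0.05..1.00.
STEP = 0.05
grid_values = [round(STEP * k, 2) for k in range(1, 21)]  # 0.05, 0.10, ..., 1.00

# Every (alpha, beta, gamma) triple a brute-force "best fit" search would try.
combinations = list(product(grid_values, repeat=3))

print(len(grid_values))   # 20
print(len(combinations))  # 8000
```

A suite that allows a finer step, say 0.01, would face 100 × 100 × 100 = 1,000,000 triples, which is where the "one million" figure comes from.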
All of these forecasting suites offer the standard forecast error measures, such as MAPE (mean absolute percentage error), MAD (mean absolute deviation) and RMSE (root mean squared error). The user can choose one of them as the criterion for selecting the Best Fit Algorithm. For example, if the user chooses MAPE, the Best Fit Algorithm goes through all the combinations and selects the one with the lowest MAPE. This looks great, but as the saying goes, the devil is in the details. How is this MAPE calculated?
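The three error measures, and the "lowest error wins" selection rule, can be sketched in a few lines of Python. This is a minimal sketch using the textbook definitions. A given suite may compute them slightly differently, and the candidate names and forecast values below are made up for illustration.

```python
import math
from typing import Dict, List, Sequence

def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.
    Raises ZeroDivisionError if any actual (history) value is 0."""
    terms = [100 * abs(f - a) / abs(a) for a, f in zip(actual, forecast)]
    return sum(terms) / len(terms)

def mad(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute deviation of the forecast errors (same units as the data)."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error; squaring gives large errors extra weight."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Hypothetical history and two hypothetical candidate forecasts for it.
actual = [100, 120, 80]
candidates: Dict[str, List[float]] = {
    "A": [110, 110, 90],   # off by 10 in every period
    "B": [100, 120, 110],  # exact twice, off by 30 once
}

for name, fc in candidates.items():
    print(name, round(mape(actual, fc), 2), mad(actual, fc), round(rmse(actual, fc), 2))

# "Best fit" = the candidate with the lowest value of the chosen error measure.
best_by_mape = min(candidates, key=lambda name: mape(actual, candidates[name]))
print("best by MAPE:", best_by_mape)
```

With these numbers, MAPE and RMSE both prefer A, while MAD cannot tell the two apart (both score 10). The choice of error measure alone can change which combination "wins".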
At the very least, calculating MAPE requires a forecast value and a history (actual) value for the same period. For each period, it takes the difference between the forecast and the history value, takes the absolute value of that difference, divides it by the history value and multiplies by 100. MAPE is the average of these percentages over all the periods. The Best Fit Algorithm works as follows. Take the Holt-Winters model: it requires a minimum of 27 history data points to generate its first forecast. Hence the first 27 data points in history are used to generate the forecast for the 28th data point in history, and this forecast and the history value for the 28th data point are used in calculating MAPE. Similarly, the first 28 data points are used to generate the forecast for the 29th data point, the first 29 data points for the 30th, and so on. So if you have 72 data points in history, you effectively have 45 (72 - 27) data points for which you have both a forecast and a history value, and MAPE is calculated from these 45 data points. As pointed out previously, Best Fit then selects the combination with the lowest MAPE. I see the following problems with this selection mechanism:
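The rolling evaluation described above (fit on the first t points, forecast point t + 1, repeat) can be sketched as follows. To keep it short, the sketch swaps in simple exponential smoothing as a stand-in for Holt-Winters and uses a made-up seasonal history. The 27-point minimum is taken from the text and passed in as a parameter. The helper names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

def ses_forecast(window: Sequence[float], alpha: float = 0.3) -> float:
    """Stand-in model (NOT Holt-Winters): simple exponential smoothing,
    returning the one-step-ahead forecast after the last point in `window`."""
    level = window[0]
    for y in window[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def rolling_one_step(
    history: Sequence[float],
    min_points: int,
    model: Callable[[Sequence[float]], float],
) -> List[Tuple[float, float]]:
    """Return (actual, forecast) pairs: the first t points predict point t + 1."""
    pairs = []
    for t in range(min_points, len(history)):
        forecast = model(history[:t])         # uses the first t points only
        pairs.append((history[t], forecast))  # compared with the (t + 1)-th point
    return pairs

def mape(pairs: Sequence[Tuple[float, float]]) -> float:
    """Average absolute percentage error over the (actual, forecast) pairs."""
    return sum(100 * abs(f - a) / abs(a) for a, f in pairs) / len(pairs)

# Made-up monthly history: 72 points with a 12-month pattern, all non-zero.
history = [100 + 5 * (t % 12) for t in range(72)]

pairs = rolling_one_step(history, min_points=27, model=ses_forecast)
print(len(pairs))             # 45 = 72 - 27
print(round(mape(pairs), 2))  # MAPE over those 45 periods only
```

Repeating this for every parameter value in a grid and keeping the lowest MAPE is, in miniature, the Best Fit search. Note that the earliest pairs come from models fitted on only 27 points.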
- The quality of the forecast in the initial periods will be very poor, because it is based on the bare minimum of history. As pointed out above, only the first 27 points of history are used to generate the forecast for the 28th point. Since this forecast rests on the absolute minimum number of data points, its quality will be very poor, and consequently the MAPE calculated from it will be misleading.
- Every error measure has its own weakness. For example, MAPE cannot be calculated if the history data contains zero values, and it becomes unstable when values are very close to zero, because it divides by the history value. RMSE is distorted by even one outlier in the history data, because the errors are squared. MAD fails if the standard deviation of the time series is high. So if the history data has any of these problems, the chosen error measure will fail, and consequently the Best Fit model selection based on that measure will fail too.
- Sectional fit of the model is another problem. For example, if the history data has 72 data points, the model selected by Best Fit may fit best in the early part of the history but fail in the recent history. The overall MAPE for this combination can still be low, so Best Fit will still choose it.
- The stability of the Best Fit model is very low. Every month a new history data point is added, and the model that was the best fit last month may be discarded and a new one chosen. This happens because of the problems in error measure calculation mentioned in the second point.
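Three of the failure modes above are easy to reproduce with toy numbers. The sketch below uses textbook MAPE and RMSE and entirely made-up series:

- a near-zero history value that dominates MAPE,
- a single outlier that dominates RMSE,
- a "sectional" candidate that wins on overall MAPE while being the worse model over the most recent periods.

It is an illustration, not a reproduction of any suite's internals.

```python
import math
from typing import Sequence

def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    terms = [100 * abs(f - a) / abs(a) for a, f in zip(actual, forecast)]
    return sum(terms) / len(terms)

def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# 1) A history value close to zero: an error of 4.5 units becomes a 900% term.
near_zero_mape = mape([100, 100, 0.5], [90, 110, 5])

# 2) One outlier in the history: RMSE jumps from 5 to about 87.
forecast = [105, 95, 105, 95, 105]
clean_rmse = rmse([100, 100, 100, 100, 100], forecast)
outlier_rmse = rmse([100, 100, 100, 100, 300], forecast)

# 3) Sectional fit: X is perfect early but 30% off in the last two periods;
#    Y is a steady 8% off everywhere.
actual = [100.0] * 10
x = [100.0] * 8 + [130.0] * 2
y = [108.0] * 10
overall_x, overall_y = mape(actual, x), mape(actual, y)
recent_x, recent_y = mape(actual[-2:], x[-2:]), mape(actual[-2:], y[-2:])

print(round(near_zero_mape, 1))
print(clean_rmse, round(outlier_rmse, 1))
print("overall:", overall_x, overall_y, "recent:", recent_x, recent_y)
```

With overall MAPE as the criterion, X (6%) beats Y (8%), even though over the last two periods X is off by 30% against Y's 8%.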
With all the above problems, using the Best Fit model becomes very complicated and requires a deep understanding of the underlying statistical principles. It is not the algorithm for dummies; on the contrary, it is an algorithm to be used by statistical experts. So the next time you are not happy with the Best Fit algorithm, you know the reason: you have to ramp up your statistical knowledge and better understand the selection mechanism that Best Fit uses.



Comments
Very insightful analysis Nikhil.
Posted by: Ravi | December 27, 2010 9:15 PM
Nikhil - interesting article. For the record, Oracle Demantra is NOT "best fit"; it uses a technique called Bayesian Markov. Thought I should clarify, as the first paragraph seems to lump it in with the other "best fit" products.
Posted by: Timothy Hughes | January 5, 2011 7:41 AM
While Demantra is not a best fit "per se", it does fit models during the first step. It then weights the various models that were fit to get an averaged forecast. The point here is that it is not modeling, but fitting.
Posted by: Tom Reilly | February 14, 2012 3:43 PM