Just imagine this: you've recently completed a project on a new predictive model for your organization, one that fantastically predicts more than 90% of your responders (to promotions) in your huge dataset. You now plan to use it to forecast the customers most likely to respond to your upcoming promotion. On the road to choosing what you would call ‘The Predictive Model’, you rejected many other candidate models proposed by your analytics service provider simply because they did not explain the consumer behavior as well as ‘The Predictive Model’ did. Well, it makes sense.
But now, when you use the chosen model to predict responders to your latest promotion, you find that only 30-40% of the expected responders actually responded. You wonder what went wrong! One thing you probably did not consider: what if many of the potential responders were misclassified as non-responders, so you never bothered to approach them? Do you remember checking something called MDL or BIC before choosing the model?
It is very likely that you’ve fallen for the ‘Goodness of Fit Trap’!
Goodness of fit is an important measure for model selection, but it is not the only one. Managers who choose their predictive models based only on how well they predict the responders in the current dataset end up choosing the (probably) statistically best-fitting model, but not the right, managerially most useful model.
The two major, misleading drivers of higher goodness of fit are the number of parameters used in the model and the functional form of the model. Adding more variables is not always good for capturing the underlying generator of the data. Similarly, a more complicated, difficult-to-understand functional form dressed up with complex mathematical functions may not be the best. Remember ‘Occam’s Razor’ from your science class? Mathematically, one can always improve the fit on the current dataset just by increasing the number of variables and complicating the functional form. But what one is in fact doing is fitting not just the data, but also the noise present in the data!
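To see the trap in action, here is a minimal sketch in Python on made-up synthetic data (the linear generator, the noise level, and the polynomial degrees are all illustrative assumptions). A straight line actually generates the data, yet the higher-degree polynomials fit the existing sample better while predicting a fresh sample from the same generator worse:

```python
# Overfitting in miniature: more parameters always improve the in-sample fit,
# but past a point they fit the noise, and the out-of-sample error gets worse.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = 2 * x + rng.normal(scale=0.3, size=x.size)        # true generator: linear + noise

x_new = np.linspace(0, 1, 30)                         # a fresh sample from the same generator
y_new = 2 * x_new + rng.normal(scale=0.3, size=x_new.size)

for degree in (1, 5, 9):
    coeffs = np.polyfit(x, y, degree)                 # degree + 1 parameters
    fit_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    new_err = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(f"degree {degree}: in-sample MSE {fit_err:.3f}, out-of-sample MSE {new_err:.3f}")
```

The in-sample error falls as the degree rises; the out-of-sample error does not. That gap is the Goodness of Fit Trap in a dozen lines.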
What is more important is that the chosen model should be good at predicting the future, even if it is not the best at explaining the past. It is therefore very important that the manager looks at out-of-sample prediction rather than within-sample explanation.
But, you would wonder, how can I possibly look into the future and know which model will predict well? Fortunately, there are statistical measures that estimate exactly this. AIC, BIC, CV, and MDL are what to ask your analytics service provider for. You don’t have to look at all of them; any one or two will do. AIC, the Akaike Information Criterion, and BIC, the Bayesian Information Criterion (also known as the Schwarz criterion), both penalize the model for each additional parameter, which guards against fitting noise and thereby tends to reduce out-of-sample forecasting error. You should aim at choosing the models with the lowest values of AIC and BIC. Most of the time they move in the same direction, but in case they don’t, I prefer BIC, because it also takes the sample size into consideration. The drawback of this approach is that, although both check the number of parameters, both fail to penalize the model for functional complexity.
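For the curious, the formulas are AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of parameters, n the sample size, and L the maximized likelihood; the first term in each is the complexity penalty. Here is a rough sketch of how the comparison might look in Python with statsmodels, again on made-up synthetic data (the variable names, the extra ‘noise’ predictors, and the data itself are illustrative assumptions):

```python
# Comparing a lean response model against a padded one by AIC/BIC.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
recency = rng.normal(size=n)
frequency = rng.normal(size=n)
noise_vars = rng.normal(size=(n, 5))                  # irrelevant candidate predictors
p = 1 / (1 + np.exp(-(0.8 * recency + 0.5 * frequency)))
responded = rng.binomial(1, p)                        # 1 = responded to the promotion

X_small = sm.add_constant(np.column_stack([recency, frequency]))
X_big = sm.add_constant(np.column_stack([recency, frequency, noise_vars]))

for name, X in [("2-variable model", X_small), ("7-variable model", X_big)]:
    res = sm.Logit(responded, X).fit(disp=0)          # disp=0 silences the solver output
    print(f"{name}: AIC {res.aic:.1f}, BIC {res.bic:.1f}")   # lower is better
```

The padded model will typically show a slightly better in-sample likelihood but worse (higher) AIC and BIC, which is exactly the signal you want to see before rolling a model out.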
CV, or cross-validation, is the easiest to use and understand. It evaluates the competing models on data they were not fitted to, to see which predicts best on the held-out data. Usually modelers split the dataset into two samples, using a ‘training’ set for modeling and a ‘validation/test’ set for cross-validating. This works well, but it is unreliable if your dataset is small, so it should be used in conjunction with other measures.
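Here is a rough sketch of both flavors in Python with scikit-learn: the single training/test split described above, and a 5-fold variant that reuses all of the data, which helps precisely when the dataset is small. The two competing models and the synthetic data are illustrative assumptions:

```python
# Holdout validation vs. 5-fold cross-validation for two competing models.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=400) > 0).astype(int)

# Single split: fit on the training set, score on the held-out test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("logistic regression", LogisticRegression()),
                    ("boosted trees", GradientBoostingClassifier())]:
    holdout = model.fit(X_train, y_train).score(X_test, y_test)
    kfold = cross_val_score(model, X, y, cv=5).mean()   # 5 splits, every row gets used
    print(f"{name}: holdout accuracy {holdout:.3f}, 5-fold CV accuracy {kfold:.3f}")
```

Whichever model wins on the held-out data, not the training data, is the one to shortlist.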
MDL (Minimum Description Length) is the best of all, but it is difficult to calculate. It is best because it accounts for the functional form of the model in addition to the number of parameters used. It is difficult because the exact calculation requires integrating over the model’s parameter space, which becomes very resource-intensive when there are many parameters. Most common statistical packages don’t provide MDL, so your analytics service provider will likely have to calculate it for you.
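If you want a feel for the idea without the heavy integrals, here is a crude two-part sketch in Python: the total description length is the bits needed to encode the data given the model, plus the bits needed to encode the model’s parameters. Be warned that this asymptotic shortcut is an illustrative stand-in, not the exact calculation; it coincides with BIC up to scaling and, unlike true MDL, does not fully capture functional-form complexity:

```python
# A crude two-part MDL approximation for Gaussian regression models:
# description length ≈ bits(residuals | model) + bits(parameters).
import numpy as np

def two_part_mdl(y, y_hat, k):
    """Approximate description length in bits for a model with k parameters."""
    n = len(y)
    sigma2 = np.mean((y - y_hat) ** 2)                # ML estimate of the noise variance
    data_bits = 0.5 * n * np.log2(2 * np.pi * np.e * sigma2)   # encode the residuals
    model_bits = 0.5 * k * np.log2(n)                 # encode the fitted parameters
    return data_bits + model_bits

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 100)
y = 2 * x + rng.normal(scale=0.3, size=x.size)        # linear generator + noise

for degree in (1, 5, 9):
    y_hat = np.polyval(np.polyfit(x, y, degree), x)
    bits = two_part_mdl(y, y_hat, degree + 1)
    print(f"degree {degree}: ~{bits:.0f} bits")       # the shortest description wins
```

The model with the shortest total description is the one MDL favors; the exact version replaces the crude parameter cost with a term that measures how flexible the functional form really is, which is where the difficult integrals come in.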
In a nutshell, it is ‘nice to know’ that your model explains your existing data, but you ‘need to know’ how generalizable your model is in terms of out-of-sample predictions. From a manager’s point of view, actionability is everything!