Model Selection (Part I: Model Selection Criteria)
Mini Paper Appearing in Winter 2011 Newsletter
Abstract: Model selection is the process of choosing the terms of a statistical model so that it adequately describes, or accurately predicts, the system under observation. This article (Part I) and a forthcoming article (Part II) discuss model selection in the context of linear statistical models with a continuous response variable. Both parts are unified by an awareness of the tradeoff between over-fitting and under-fitting the model. With too few terms, the model is under-fit and therefore biased: it misses predictable parts of the data (the signal). With too many terms, the model is over-fit: it models unpredictable noise in the data along with the desired signal.
Keywords: model selection - R-squared - Mallows' Cp - AIC - BIC
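
The tradeoff described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the article: the quadratic signal, the noise level, the sample size, and the helper name fit_rss are all assumptions chosen for illustration. It fits polynomials of increasing degree by ordinary least squares and reports the residual sum of squares (RSS) on the data used for fitting and on a fresh sample from the same system.

```python
import numpy as np


def fit_rss(x_train, y_train, x_test, y_test, degree):
    """Least-squares polynomial fit; returns (training RSS, held-out RSS).

    Hypothetical helper for illustration only.
    """
    # Design matrices with columns 1, x, x^2, ..., x^degree.
    X_train = np.vander(x_train, degree + 1, increasing=True)
    X_test = np.vander(x_test, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_rss = float(np.sum((y_train - X_train @ beta) ** 2))
    test_rss = float(np.sum((y_test - X_test @ beta) ** 2))
    return train_rss, test_rss


def signal(x):
    # Assumed "true" system: a quadratic in x.
    return 1.0 + 2.0 * x - 1.5 * x**2


rng = np.random.default_rng(0)
n = 30  # assumed sample size
x_train = rng.uniform(-1.0, 1.0, n)
x_test = rng.uniform(-1.0, 1.0, n)
y_train = signal(x_train) + rng.normal(0.0, 0.3, n)  # assumed noise sd = 0.3
y_test = signal(x_test) + rng.normal(0.0, 0.3, n)

# Degrees 0 and 1 omit a real term (under-fit); degrees above 2 add
# terms that can only model noise (over-fit).
results = {d: fit_rss(x_train, y_train, x_test, y_test, d) for d in range(9)}
for d, (train_rss, test_rss) in results.items():
    print(f"degree {d}: training RSS = {train_rss:7.3f}, held-out RSS = {test_rss:7.3f}")
```

Because the candidate models are nested, the training RSS cannot increase as terms are added, so goodness of fit on the fitting data alone cannot signal over-fitting. The held-out RSS is not bound by that guarantee, which is why a selection criterion needs to account for model size as well as fit.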