COUNTEREXAMPLES TO PARSIMONY AND BIC

DAVID F. FINDLEY

Statistical Research Division, U.S. Bureau of the Census, Washington, D.C. 20233, U.S.A.
Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan

(Received February 6, 1990; revised April 1, 1991)

Abstract. Suppose that the log-likelihood-ratio sequence of two models with different numbers of estimated parameters is bounded in probability, without necessarily having a chi-square limiting distribution. Then BIC and all other related "consistent" model selection criteria, meaning those that penalize the number of estimated parameters with a weight that tends to infinity with the sample size, will, with asymptotic probability 1, select the model having fewer parameters. This note presents examples of nested and non-nested regression model pairs for which the likelihood-ratio sequence is bounded in probability and for which the model in each pair with more estimated parameters has better predictive properties, for an independent replicate of the observed data, than the model with fewer parameters. Our second example also shows how a one-dimensional regressor can overfit the data used for estimation in comparison to the fit of a two-dimensional regressor.
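
The first claim of the abstract can be sketched in one line of algebra. The notation below is an assumption introduced for illustration and is not taken from the paper: $\hat L_{j,n}$ is the maximized log-likelihood of model $j$ from $n$ observations, $k_1 < k_2$ are the numbers of estimated parameters, and $c_n$ is the penalty weight per parameter (for BIC, $c_n = \log n$; for AIC, $c_n = 2$, which does not diverge, so the argument does not apply to it).

```latex
% Generic penalized criterion (illustrative notation, not the paper's):
\[
  C_{j,n} = -2\,\hat L_{j,n} + k_j\,c_n , \qquad j = 1, 2, \qquad k_1 < k_2 .
\]
% Difference between the larger and the smaller model:
\[
  C_{2,n} - C_{1,n}
    = -2\bigl(\hat L_{2,n} - \hat L_{1,n}\bigr) + (k_2 - k_1)\,c_n .
\]
% If the log-likelihood ratio is bounded in probability and c_n diverges,
% the penalty term dominates:
\[
  \hat L_{2,n} - \hat L_{1,n} = O_p(1), \quad c_n \to \infty
  \;\Longrightarrow\;
  C_{2,n} - C_{1,n} \xrightarrow{\;p\;} +\infty ,
  \quad\text{so}\quad
  P\bigl(C_{1,n} < C_{2,n}\bigr) \to 1 .
\]
```

Under these assumptions the criterion prefers the model with fewer parameters with probability tending to 1, whatever the two models' relative predictive performance. That is the gap the examples in this note exploit.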

Key words and phrases: Model selection, linear regression, misspecified models, AIC, BIC, MDL, Hannan-Quinn criterion, overfitting.
