ON THE ESTIMATION OF PREDICTION ERRORS
IN LINEAR REGRESSION MODELS

PING ZHANG

Department of Statistics, The Wharton School of the University of Pennsylvania,
3000 Steinberg Hall-Dietrich Hall, Philadelphia, PA 19104-6302, U.S.A.

(Received September 9, 1991; revised March 6, 1992)

Abstract.    Estimating the prediction error is a common practice in the statistical literature. Under a linear regression model, let e be the conditional prediction error and ^e be its estimate. We use rho(^e, e), the correlation coefficient between e and ^e, to measure the performance of a particular estimation method. Reasons are given why correlation is chosen over the more popular mean squared error loss. The main results of this paper show that it is generally not possible to obtain good estimates of the prediction error. In particular, we show that rho(^e, e) = O(n^{-1/2}) as n \rightarrow \infty. When the sample size is small, we argue that high values of rho(^e, e) can be achieved only when the residual error distribution has very heavy tails and when no outliers are present in the data. Finally, we show that in order for rho(^e, e) to be bounded away from zero asymptotically, ^e has to be biased.

Key words and phrases:    Conditional prediction error, correlation.
