### ON THE ESTIMATION OF PREDICTION ERRORS IN LINEAR REGRESSION MODELS

### PING ZHANG

*Department of Statistics, The Wharton School of the University of Pennsylvania, 3000 Steinberg Hall-Dietrich Hall, Philadelphia, PA 19104-6302, U.S.A.*

(Received September 9, 1991; revised March 6, 1992)

**Abstract.**
Estimating the prediction error is a common practice in the statistical literature. Under a linear regression model, let $e$ be the conditional prediction error and $\hat{e}$ its estimate. We use $\rho(\hat{e}, e)$, the correlation coefficient between $e$ and $\hat{e}$, to measure the performance of a particular estimation method. Reasons are given why correlation is chosen over the more popular mean squared error loss. The main results of this paper show that it is generally not possible to obtain good estimates of the prediction error. In particular, we show that $\rho(\hat{e}, e) = O(n^{-1/2})$ as $n \rightarrow \infty$. When the sample size is small, we argue that high values of $\rho(\hat{e}, e)$ can be achieved only when the residual error distribution has very heavy tails and when no outliers are present in the data. Finally, we show that in order for $\rho(\hat{e}, e)$ to be bounded away from zero asymptotically, $\hat{e}$ has to be biased.

*Key words and phrases*: Conditional prediction error, correlation.
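
The correlation criterion described in the abstract can be explored numerically. The following is a minimal Monte Carlo sketch, not the paper's own procedure. It assumes a fixed design, Gaussian errors with known variance in the simulation, a conditional prediction error defined as the expected squared error at the same design points given the fitted model, and a Cp-type estimate. These definitions, and the name `simulate_correlation` with its parameters, are illustrative assumptions that do not appear in the abstract.

```python
import numpy as np


def simulate_correlation(n, p=3, reps=4000, sigma=1.0, seed=0):
    """Sample correlation between an estimated and a true conditional
    prediction error over repeated draws of the response vector.

    Assumed definitions (hypothetical, chosen for illustration):
      true error  e     = sigma^2 + mean((fitted - mu)^2)
      estimate    e_hat = RSS/n + 2 * p * sigma_hat^2 / n   (Cp-type)
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))        # design, held fixed across replications
    beta = np.ones(p)                      # arbitrary true coefficients
    mu = X @ beta                          # true mean response
    H = X @ np.linalg.solve(X.T @ X, X.T)  # hat (projection) matrix

    e_true = np.empty(reps)
    e_hat = np.empty(reps)
    for r in range(reps):
        y = mu + sigma * rng.standard_normal(n)  # Gaussian errors (assumption)
        fitted = H @ y                           # least squares fitted values
        rss = np.sum((y - fitted) ** 2)          # residual sum of squares
        sigma2_hat = rss / (n - p)               # unbiased variance estimate
        e_true[r] = sigma**2 + np.mean((fitted - mu) ** 2)
        e_hat[r] = rss / n + 2 * p * sigma2_hat / n

    return np.corrcoef(e_hat, e_true)[0, 1]


if __name__ == "__main__":
    for n in (20, 50, 200):
        print(n, simulate_correlation(n))
```

Under these assumptions the estimate depends on the data only through the residual sum of squares, while the true error depends only on the fitted values. With Gaussian errors those two quantities are independent, so the sample correlation should be close to zero at every sample size. Replacing the error distribution with a heavier-tailed one is a natural variation for probing the small-sample behaviour the abstract mentions.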
