ISM Research Memorandum
No.
953
Title:
A probabilistic upper bound for the degree of over-fitting to noise in neural network regression
Author(s):
Hagiwara, Katsuyuki (Mie University);
Fukumizu, Kenji (Institute of Statistical Mathematics)
Key words:
multi-layer perceptron; statistical regression; training error; over-fitting; log-likelihood ratio statistics
Abstract:
In the training of layered neural networks, such as multi-layer perceptrons and radial basis function networks, over-fitting is a serious problem. This paper investigates over-fitting to Gaussian noise in neural network regression. Using re-parameterization, a network function is represented as a bounded function g multiplied by a coefficient c. A condition is considered under which the squared sum of the outputs of g at the given inputs is bounded below by a positive constant \delta_n, which changes with the sample size n. This condition is induced by a restriction on the sizes of the weights. Under this restriction, a probabilistic upper bound for the degree of over-fitting to Gaussian noise is derived. The derivation reveals that the order of the probabilistic upper bound can change depending on \delta_n, and thus on the sizes of the weights. For example, the bound is O(\log n / n) for \delta_n = (\log n)^{-\lambda} and O(\log\log n / n) for \delta_n = n^{-\lambda}, where \lambda is an arbitrary positive constant. The obtained bound is applied to the analysis of over-fitting behavior for one Gaussian unit, and it is shown that the probability of obtaining an extremely small value for the width parameter in training is close to one when the sample size is large. It is empirically known that the network output under over-fitting has a high curvature. The obtained result provides theoretical evidence for this phenomenon.
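The setting described in the abstract can be explored numerically with a small sketch. The code below is an illustration only, not the authors' derivation or experiment. It assumes equally spaced inputs on [0, 1], a target that is pure standard Gaussian noise, a single Gaussian unit g(x) = exp(-(x - m)^2 / (2 s^2)), and a grid search over the centre m and width s. For each (m, s) the coefficient c is set to its least-squares value, so the reduction in squared training error is (sum_i y_i g(x_i))^2 / sum_i g(x_i)^2, and candidates with sum_i g(x_i)^2 < delta are excluded. The function name, the grids, and the use of this reduction divided by n as the "degree of over-fitting" are all assumptions made for the illustration.

```python
import numpy as np


def gaussian_unit(x, center, width):
    """Output of one Gaussian unit g(x) with the given centre and width."""
    return np.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def overfit_one_gaussian_unit(n, delta, rng, n_centers=200, n_widths=60):
    """Fit c * g(x) to pure Gaussian noise by grid search (hypothetical helper).

    Returns (degree, width, center): the largest reduction in mean squared
    training error among grid points satisfying sum_i g(x_i)^2 >= delta,
    and the width and centre that attain it.
    """
    x = np.linspace(0.0, 1.0, n)          # assumed fixed design
    y = rng.standard_normal(n)            # target is noise only
    centers = np.linspace(0.0, 1.0, n_centers)
    widths = np.logspace(-4, 0, n_widths)  # assumed search range for the width

    best_reduction, best_width, best_center = -1.0, None, None
    for s in widths:
        # G[k, i] = g(x_i) for the k-th candidate centre
        G = gaussian_unit(x[None, :], centers[:, None], s)
        sq = np.sum(G ** 2, axis=1)        # squared sum of outputs of g
        proj = G @ y                       # inner product with the noise
        feasible = sq >= delta             # restriction controlled by delta
        if not feasible.any():
            continue
        # Error reduction with the least-squares coefficient c = proj / sq
        reduction = np.zeros_like(sq)
        np.divide(proj ** 2, sq, out=reduction, where=feasible)
        k = int(np.argmax(reduction))
        if reduction[k] > best_reduction:
            best_reduction, best_width, best_center = reduction[k], s, centers[k]

    return best_reduction / n, best_width, best_center


if __name__ == "__main__":
    n = 500
    for delta in (1.0, 1e-3):
        degree, width, center = overfit_one_gaussian_unit(
            n, delta, np.random.default_rng(0)
        )
        print(f"delta={delta:g}: degree={degree:.5f}, width={width:.5g}")
```

Re-using the same seed for both values of delta keeps the noise sample fixed, so the two runs differ only in the restriction. Because a smaller delta enlarges the set of admissible (m, s) pairs, the reported degree cannot decrease as delta shrinks; the selected width can be inspected to see how the restriction affects it on a given sample.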