Optimal criterion for feature learning of a two-layer linear neural network in the high-dimensional interpolation regime
Abstract:
Deep neural networks with feature learning have shown surprisingly strong generalization performance in high-dimensional settings, but it is not fully understood how and when they benefit from feature learning. In this paper, we theoretically analyze the feature learning ability of a two-layer linear neural network with multiple outputs in a high-dimensional setting. To that end, we propose a new criterion under which this two-layer network can be properly learned in high dimensions. Interestingly, we show that the estimator obtained by minimizing this criterion generalizes even when standard ridge regression cannot. This is due to the feature learning ability of the neural network, and to the fact that the proposed criterion is constructed to behave like an upper bound on the predictive risk. As an important characterization of the estimator, we show that this network can achieve the optimal Bayes risk, which is determined by the distribution of the true signals across the multiple outputs. To our knowledge, this is the first work to clarify when standard ridge regression fails to generalize while the optimized two-layer linear network generalizes in multi-output linear regression settings.