Generalization Ability of Wide Neural Networks on $\mathbb{R}$
Jianfa Lai, Manyun Xu, Rui Chen, Qian Lin

TL;DR
This paper investigates the generalization properties of wide two-layer ReLU neural networks on the real line, analyzing spectral properties of the neural tangent kernel and the effects of training strategies on generalization.
Contribution
It establishes spectral properties of the NTK on $\mathbb{R}$, analyzes the convergence of neural network kernels, and links training strategies to minimax rates and overfitting behavior.
Findings
The NTK is positive definite, and its $i$-th largest eigenvalue is proportional to $i^{-2}$.
The neural network kernel (NNK) converges uniformly to the NTK as the width tends to infinity.
Early stopping achieves the minimax regression rate, whereas a network trained until it overfits the data does not generalize well.
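The first two findings can be probed numerically. The sketch below is illustrative only and is not taken from the paper: it assumes a two-layer ReLU network $f(x) = m^{-1/2} \sum_r a_r \, \mathrm{ReLU}(w_r x + b_r)$ with standard normal initialization, and the function name `nnk_gram`, the grid on $[0, 1]$, and the width `m` are hypothetical choices. It builds the finite-width kernel from parameter gradients and fits the log-log slope of its leading eigenvalues.

```python
import numpy as np

def nnk_gram(x, m=2000, seed=0):
    """Gram matrix of the finite-width kernel <grad f(x), grad f(x')>
    for f(x) = m^{-1/2} * sum_r a_r * relu(w_r * x + b_r) at random init."""
    rng = np.random.default_rng(seed)
    a, w, b = rng.standard_normal((3, m))      # assumed N(0, 1) initialization
    pre = np.outer(x, w) + b                   # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                 # sqrt(m) * df/da_r
    gate = (pre > 0).astype(float) * a         # sqrt(m) * df/db_r
    grad = np.hstack([act, gate * x[:, None], gate]) / np.sqrt(m)
    return grad @ grad.T

x = np.linspace(0.0, 1.0, 200)                 # evaluation grid (an assumption)
K = nnk_gram(x)

# Eigenvalues of K / n approximate those of the kernel integral operator.
eigs = np.linalg.eigvalsh(K / len(x))[::-1]

# Fit a line to log(eigenvalue) against log(index) over the leading indices.
idx = np.arange(1, 21)
slope, _ = np.polyfit(np.log(idx), np.log(eigs[:20]), 1)
print(f"fitted decay exponent over the top 20 eigenvalues: {slope:.2f}")
```

If the finite-width kernel is close to the NTK, the fitted exponent should be in the vicinity of $-2$; the exact value depends on the width, the grid, and the range of indices used in the fit.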
Abstract
We perform a study on the generalization ability of the wide two-layer ReLU neural network on $\mathbb{R}$. We first establish some spectral properties of the neural tangent kernel (NTK): a) $K_d$, the NTK defined on $\mathbb{R}^d$, is positive definite; b) $\lambda_i(K_1)$, the $i$-th largest eigenvalue of $K_1$, is proportional to $i^{-2}$. We then show that: i) when the width $m \to \infty$, the neural network kernel (NNK) uniformly converges to the NTK; ii) the minimax rate of regression over the RKHS associated to $K_1$ is $n^{-2/3}$; iii) if one adopts the early stopping strategy in training a wide neural network, the resulting neural network achieves the minimax rate; iv) if one trains the neural network till it overfits the data, the resulting neural network cannot generalize well. Finally, we provide an explanation to reconcile our theory and the…
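The contrast between early stopping and training until overfitting can be illustrated with a small experiment. This is a minimal sketch under stated assumptions, not the paper's procedure: the target function, noise level, width, learning rate, and the use of a held-out validation set to pick the stopping step are all hypothetical choices made for illustration (the paper's stopping rule may differ).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lr, steps = 60, 1000, 0.3, 3000          # illustrative sizes, not from the paper

def target(x):
    return np.sin(2 * np.pi * x)               # assumed regression function

x_tr = rng.uniform(0, 1, n)
y_tr = target(x_tr) + 0.3 * rng.standard_normal(n)
x_va = rng.uniform(0, 1, n)
y_va = target(x_va) + 0.3 * rng.standard_normal(n)

a, w, b = rng.standard_normal((3, m))          # two-layer ReLU parameters

def forward(x, a, w, b):
    pre = np.outer(x, w) + b                   # (n, m) pre-activations
    return pre, np.maximum(pre, 0.0) @ a / np.sqrt(m)

best_err, best_step, val_errs = np.inf, 0, []
for t in range(steps):
    pre, out = forward(x_tr, a, w, b)
    res = (out - y_tr) / n                     # d/d(out) of (1 / 2n) * sum of squared errors
    gate = (pre > 0) * a                       # ReLU derivative times output weight
    ga = np.maximum(pre, 0.0).T @ res / np.sqrt(m)
    gw = (gate * x_tr[:, None]).T @ res / np.sqrt(m)
    gb = gate.T @ res / np.sqrt(m)
    a, w, b = a - lr * ga, w - lr * gw, b - lr * gb

    # Track the held-out error and remember the best step seen so far.
    err = np.mean((forward(x_va, a, w, b)[1] - y_va) ** 2)
    val_errs.append(err)
    if err < best_err:
        best_err, best_step = err, t + 1

print(f"best validation MSE {best_err:.3f} at step {best_step}")
print(f"validation MSE after {steps} steps: {val_errs[-1]:.3f}")
```

Comparing the validation error at `best_step` with the error after the final step gives a rough sense of how much continued training costs on noisy data; the size of the gap depends on the noise level and on how long training is run.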
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Stochastic Gradient Optimization Techniques
Methods: Early Stopping · Neural Tangent Kernel
