Generalization Ability of Wide Neural Networks on $\mathbb{R}$

Jianfa Lai; Manyun Xu; Rui Chen; Qian Lin

arXiv:2302.05933 · stat.ML · February 14, 2023

TL;DR

This paper investigates the generalization properties of wide two-layer ReLU neural networks on the real line, analyzing spectral properties of the neural tangent kernel and the effects of training strategies on generalization.

Contribution

It establishes spectral properties of the NTK on $\mathbb{R}$, analyzes the convergence of the neural network kernel to the NTK, and links training strategies to the minimax rate and to overfitting behavior.

Findings

01

The NTK $K_{d}$ on $\mathbb{R}^{d}$ is positive definite, and the $i$-th largest eigenvalue of $K_{1}$ is proportional to $i^{-2}$

02

The neural network kernel (NNK) converges uniformly to the NTK as the width $m \to \infty$

03

With early stopping, a wide neural network achieves the minimax regression rate $n^{-2/3}$; a network trained until it overfits the data does not generalize well
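The first finding can be probed numerically. The sketch below is a minimal illustration, not the paper's construction: it assumes the standard arc-cosine form of a two-layer ReLU NTK with inputs lifted to $(x, 1)$, which may differ from the paper's $K_{1}$ in constants or parametrization. The function name `relu_ntk_1d`, the grid on $[0, 1]$, and the fitting range for the slope are all choices made here for illustration.

```python
import numpy as np

def relu_ntk_1d(x, y):
    """Assumed arc-cosine form of a two-layer ReLU NTK on R with a bias term."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    # Inputs are lifted to (x, 1); the angle between two lifted points
    # is |arctan(x) - arctan(y)|, which lies in [0, pi).
    theta = np.abs(np.arctan(x) - np.arctan(y))
    inner = x * y + 1.0                              # <(x,1), (y,1)>
    norms = np.sqrt((1.0 + x**2) * (1.0 + y**2))     # |(x,1)| * |(y,1)|
    kappa0 = (np.pi - theta) / np.pi
    kappa1 = (np.cos(theta) * (np.pi - theta) + np.sin(theta)) / np.pi
    return inner * kappa0 + norms * kappa1

n = 400
grid = np.linspace(0.0, 1.0, n)
K = relu_ntk_1d(grid, grid)

# Eigenvalues of K / n approximate those of the kernel integral operator
# under the uniform measure on [0, 1]; sort them in descending order.
eigvals = np.linalg.eigvalsh(K)[::-1] / n

# Fit a line to log(eigenvalue) against log(index) over a mid range of indices.
idx = np.arange(5, 41)
slope = np.polyfit(np.log(idx), np.log(eigvals[idx - 1]), 1)[0]

print("smallest eigenvalue of K/n:", eigvals.min())
print("fitted log-log slope:", slope)
```

A strictly positive smallest eigenvalue is consistent with positive definiteness on this grid, and a fitted slope near $-2$ is consistent with eigenvalues proportional to $i^{-2}$. A finite grid can only suggest these properties; it does not prove them.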

Abstract

We perform a study on the generalization ability of the wide two-layer ReLU neural network on $\mathbb{R}$. We first establish some spectral properties of the neural tangent kernel (NTK): $a)$ $K_{d}$, the NTK defined on $\mathbb{R}^{d}$, is positive definite; $b)$ $\lambda_{i}(K_{1})$, the $i$-th largest eigenvalue of $K_{1}$, is proportional to $i^{-2}$. We then show that: $i)$ when the width $m \to \infty$, the neural network kernel (NNK) uniformly converges to the NTK; $ii)$ the minimax rate of regression over the RKHS associated to $K_{1}$ is $n^{-2/3}$; $iii)$ if one adopts the early stopping strategy in training a wide neural network, the resulting neural network achieves the minimax rate; $iv)$ if one trains the neural network till it overfits the data, the resulting neural network cannot generalize well. Finally, we provide an explanation to reconcile our theory and the…
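Claims $iii)$ and $iv)$ can be illustrated with kernel gradient flow, used here as a stand-in for training a very wide network. This is a hedged sketch under several assumptions that are not taken from the paper: the kernel is the arc-cosine form of a two-layer ReLU NTK with a bias term, the target `f_star`, the noise level, and the sample size are hypothetical, and the stopping time is picked by an oracle that sees the true function, purely to show that a good finite stopping time exists.

```python
import numpy as np

def relu_ntk_1d(x, y):
    """Assumed arc-cosine form of a two-layer ReLU NTK on R with a bias term."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    theta = np.abs(np.arctan(x) - np.arctan(y))      # angle between (x,1) and (y,1)
    inner = x * y + 1.0
    norms = np.sqrt((1.0 + x**2) * (1.0 + y**2))
    kappa0 = (np.pi - theta) / np.pi
    kappa1 = (np.cos(theta) * (np.pi - theta) + np.sin(theta)) / np.pi
    return inner * kappa0 + norms * kappa1

rng = np.random.default_rng(0)
n, sigma = 100, 0.5                                  # hypothetical sample size and noise level

def f_star(x):
    return np.sin(2 * np.pi * x)                     # hypothetical regression function

x_train = rng.uniform(0.0, 1.0, n)
y_train = f_star(x_train) + sigma * rng.normal(size=n)
x_test = np.linspace(0.0, 1.0, 500)

K = relu_ntk_1d(x_train, x_train)
K_test = relu_ntk_1d(x_test, x_train)
s, V = np.linalg.eigh(K)                             # K = V diag(s) V^T

def predict(t):
    # Gradient flow on the squared loss, started from f_0 = 0:
    # f_t(x) = K(x, X) V diag((1 - exp(-t s / n)) / s) V^T y.
    # t = inf gives the kernel interpolant K(x, X) K^{-1} y.
    filt = 1.0 / s if np.isinf(t) else -np.expm1(-t * s / n) / s
    return K_test @ (V @ (filt * (V.T @ y_train)))

def excess_risk(t):
    return np.mean((predict(t) - f_star(x_test)) ** 2)

times = np.logspace(-1, 5, 40)
risks = np.array([excess_risk(t) for t in times])
best = int(risks.argmin())
early_risk = risks[best]                             # oracle early stopping
overfit_risk = excess_risk(np.inf)                   # trained until interpolation

print("best stopping time:", times[best])
print("excess risk, early stopped:", early_risk)
print("excess risk, interpolating:", overfit_risk)
```

In a practical setting the stopping time would be chosen from held-out data or a theoretical schedule rather than from `f_star`; the oracle choice keeps the sketch short.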

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Stochastic Gradient Optimization Techniques

Methods: Early Stopping · Neural Tangent Kernel