Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression
Yingzhen Yang, Ping Li

TL;DR
This paper shows that an over-parameterized two-layer neural network trained by gradient descent with early stopping achieves the same sharp nonparametric regression rate as kernel regression trained by gradient descent with early stopping, assuming only that the covariates are bounded.
Contribution
It establishes that gradient descent training of over-parameterized neural networks attains minimax optimal rates for nonparametric regression with bounded covariates and no further distributional assumptions, bridging neural networks and kernel methods.
Findings
Achieves an $\mathcal{O}(\varepsilon_n^2)$ regression risk rate.
Requires no distributional assumptions on the covariates beyond boundedness.
Provides guidance on the stopping time, network width, and learning rate used in training.
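Under the definition of the critical rate commonly used in the kernel early-stopping literature, $\varepsilon_n$ depends on the kernel only through its eigenvalues $\lambda_j$. It is the smallest $\varepsilon > 0$ with $R(\varepsilon) \le c\,\varepsilon^2$, where $R(\varepsilon) = \sqrt{n^{-1}\sum_j \min\{\lambda_j, \varepsilon^2\}}$. The Python sketch below computes this quantity by bisection. The constant $c$ (which in that literature depends on the noise level) and the example eigenvalue sequences are assumptions for illustration, and the paper's exact definition and constants for the NTK may differ.

```python
import numpy as np


def local_complexity(eps, eigvals, n):
    """R(eps) = sqrt((1/n) * sum_j min(lambda_j, eps^2)) for kernel eigenvalues lambda_j."""
    return np.sqrt(np.minimum(eigvals, eps ** 2).sum() / n)


def critical_rate(eigvals, n, c=1.0, tol=1e-12):
    """Smallest eps > 0 with R(eps) <= c * eps^2, located by bisection.

    R(eps) / eps is nonincreasing in eps while c * eps is increasing, so on
    eps > 0 the inequality holds exactly on an interval [eps_n, inf).
    The constant c is an assumed placeholder.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    lo, hi = 0.0, 1.0
    # Double the upper end until the inequality holds there.
    while local_complexity(hi, eigvals, n) > c * hi ** 2:
        hi *= 2.0
    # Invariant: the inequality holds at hi and fails at lo (unless lo == 0).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if local_complexity(mid, eigvals, n) <= c * mid ** 2:
            hi = mid
        else:
            lo = mid
    return hi


# Finite-rank kernel with r unit eigenvalues: eps_n = sqrt(r / n) when c = 1 and r <= n.
print(critical_rate(np.ones(5), n=1000))

# Polynomial decay lambda_j = j^(-2), truncated to 10^5 terms.
lam = np.arange(1, 100_001, dtype=float) ** -2.0
for n in (1_000, 10_000, 100_000):
    print(n, critical_rate(lam, n) ** 2)
```

For eigenvalues decaying like $\lambda_j \asymp j^{-2\alpha}$, this definition gives $\varepsilon_n^2 \asymp n^{-2\alpha/(2\alpha+1)}$. The $\lambda_j = j^{-2}$ example ($\alpha = 1$) should therefore show $\varepsilon_n^2$ shrinking roughly like $n^{-2/3}$.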
Abstract
We study nonparametric regression by an over-parameterized two-layer neural network trained by gradient descent (GD) in this paper. We show that, if the neural network is trained by GD with early stopping, then the trained network renders a sharp rate of the nonparametric regression risk of $\mathcal{O}(\varepsilon_n^2)$, which is the same rate as that for the classical kernel regression trained by GD with early stopping, where $\varepsilon_n$ is the critical population rate of the Neural Tangent Kernel (NTK) associated with the network and $n$ is the size of the training data. It is remarked that our result does not require distributional assumptions about the covariate as long as the covariate is bounded, in a strong contrast with many existing results which rely on specific distributions of the covariates such as the spherical uniform data distribution or distributions satisfying certain…
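The training procedure described in the abstract, full-batch GD on an over-parameterized two-layer network that is then stopped early, can be sketched in NumPy. Several choices here are assumptions for illustration rather than the paper's exact setup. These include the $1/\sqrt{m}$ NTK-style scaling with fixed output weights, the paired (symmetric) initialization that makes the initial output zero, the constant feature appended to each covariate, and the toy target. Most importantly, the stopping step is chosen on held-out data. The paper's analysis instead gives guidance on the stopping time, network width, and learning rate, so the validation rule below is only a practical stand-in. All function names are hypothetical.

```python
import numpy as np


def init_network(d, m, rng):
    """Two-layer ReLU net f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x).

    Assumed setup: Gaussian first-layer weights W (trained) and fixed output
    signs a. Neurons come in pairs with equal w_r and opposite a_r, so the
    network output is exactly zero at initialization. m must be even.
    """
    W_half = rng.standard_normal((m // 2, d))
    W = np.vstack([W_half, W_half])
    a = np.concatenate([np.ones(m // 2), -np.ones(m // 2)])
    return W, a


def predict(W, a, X):
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(W.shape[0])


def grad_W(W, a, X, y):
    """Gradient of L(W) = (1/(2n)) * sum_i (f(x_i) - y_i)^2 with respect to W."""
    n, m = X.shape[0], W.shape[0]
    residual = predict(W, a, X) - y                  # shape (n,)
    active = (X @ W.T > 0).astype(float)             # ReLU derivative, shape (n, m)
    # dL/dw_r = (1/n) * sum_i residual_i * (a_r / sqrt(m)) * 1[w_r . x_i > 0] * x_i
    coeff = active * residual[:, None] * a[None, :] / (n * np.sqrt(m))
    return coeff.T @ X                               # shape (m, d)


def train_early_stopped(X, y, X_val, y_val, m=512, lr=1.0, max_steps=1000, seed=0):
    """Full-batch GD on W; return the iterate with the lowest held-out MSE."""
    rng = np.random.default_rng(seed)
    W, a = init_network(X.shape[1], m, rng)
    best = (W.copy(), 0, np.mean((predict(W, a, X_val) - y_val) ** 2))
    for t in range(1, max_steps + 1):
        W = W - lr * grad_W(W, a, X, y)
        val_mse = np.mean((predict(W, a, X_val) - y_val) ** 2)
        if val_mse < best[2]:
            best = (W.copy(), t, val_mse)
    return best[0], a, best[1], best[2]


# Toy data: bounded, non-spherical covariates on [-1, 1]^2 with a constant
# feature appended so the bias-free network can fit non-homogeneous targets.
data_rng = np.random.default_rng(1)


def make_data(n):
    X = data_rng.uniform(-1.0, 1.0, size=(n, 2))
    y = np.sin(np.pi * X[:, 0]) + 0.1 * data_rng.standard_normal(n)
    return np.hstack([X, np.ones((n, 1))]), y


X_tr, y_tr = make_data(200)
X_val, y_val = make_data(200)
W, a, stop_step, val_mse = train_early_stopped(X_tr, y_tr, X_val, y_val)
print(f"stopped at step {stop_step}, held-out MSE {val_mse:.4f}")
```

For large $m$, each GD step on $W$ moves the network's predictions approximately like one step of kernel GD with the NTK of this architecture. For this parameterization the NTK is $K(x, x') = x^\top x' \, (\pi - \theta(x, x'))/(2\pi)$, where $\theta(x, x')$ is the angle between $x$ and $x'$. This is the sense in which the trained network is compared with kernel regression trained by GD.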
Taxonomy
Topics: Neural Networks and Applications
Methods: Neural Tangent Kernel · Early Stopping
