Sharp Generalization for Nonparametric Regression in Interpolation Space by Over-Parameterized Neural Networks Trained with Preconditioned Gradient Descent and Early Stopping
Yingzhen Yang, Ping Li

TL;DR
This paper demonstrates that over-parameterized neural networks trained with preconditioned gradient descent and early stopping can achieve sharp nonparametric regression rates, surpassing standard kernel regression and NTK regime results.
Contribution
The authors introduce a novel analysis framework for neural network training, showing improved generalization rates via a new kernel decomposition and local Rademacher complexity control.
Findings
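Achieves a regression rate of O(n^{-2αs'/(2αs'+1)}) for target functions in the interpolation space, where n is the number of training samples.
Improves on the currently known nearly-optimal rate and on the standard regression rates obtained in the NTK regime.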
Provides theoretical evidence that PGD enables neural networks to escape the NTK regime.
Abstract
We study nonparametric regression using an over-parameterized two-layer neural network trained with algorithmic guarantees. We consider the setting where the training features are drawn uniformly from the unit sphere, and the target function lies in an interpolation space commonly studied in statistical learning theory. We demonstrate that training the neural network with a novel Preconditioned Gradient Descent (PGD) algorithm, equipped with early stopping, achieves a sharp regression rate of O(n^{-2αs'/(2αs'+1)}) when the target function is in the interpolation space. This rate is even sharper than the currently known nearly-optimal rate \citep{Li2024-edr-general-domain}, where n is the size of the training data and …
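The setting described in the abstract can be illustrated with a small numerical sketch. It is not the authors' algorithm: the preconditioner (an inverse regularized input covariance), the validation-based stopping rule, the toy target, and all names and hyperparameters (`train_pgd`, `ridge`, `patience`, and so on) are assumptions made for illustration. The sketch only shows the general shape of the procedure: features on the unit sphere, a two-layer ReLU network, a preconditioned gradient step, and early stopping.

```python
import numpy as np

def sample_unit_sphere(n, d, rng):
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def predict(W, a, X):
    # Two-layer ReLU network with 1/sqrt(m) output scaling.
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(W.shape[0])

def train_pgd(X, y, X_val, y_val, m=256, lr=0.5, ridge=0.1,
              max_steps=300, patience=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((m, d))       # trained first-layer weights
    a = rng.choice([-1.0, 1.0], size=m)   # fixed second-layer signs
    # Hypothetical preconditioner (NOT the paper's): inverse regularized
    # input covariance, a symmetric positive definite d x d matrix.
    P = np.linalg.inv(X.T @ X / n + ridge * np.eye(d))
    best_W, best_val, best_step = W.copy(), np.inf, 0
    for step in range(max_steps):
        val = np.mean((predict(W, a, X_val) - y_val) ** 2)
        if val < best_val:
            best_W, best_val, best_step = W.copy(), val, step
        elif step - best_step >= patience:
            break                         # early stopping (validation-based, an assumption)
        pre = X @ W.T                     # pre-activations, shape (n, m)
        resid = predict(W, a, X) - y      # training residuals, shape (n,)
        # Gradient of (1/(2n)) * sum(resid^2) with respect to W, shape (m, d).
        G = ((resid[:, None] * (pre > 0)) * a).T @ X / (n * np.sqrt(m))
        W = W - lr * G @ P                # preconditioned gradient step
    return best_W, a, best_val, best_step

# Toy usage: the target function and noise level are arbitrary choices.
rng = np.random.default_rng(0)
X = sample_unit_sphere(200, 5, rng)
X_val = sample_unit_sphere(100, 5, rng)
target = lambda Z: Z[:, 0] * Z[:, 1]
y = target(X) + 0.1 * rng.standard_normal(200)
y_val = target(X_val) + 0.1 * rng.standard_normal(100)
W, a, best_val, best_step = train_pgd(X, y, X_val, y_val)
```

Setting `P` to the identity matrix recovers plain gradient descent, which makes it easy to compare the two update rules on the same data. The paper's analysis concerns a specific preconditioner and stopping time that this sketch does not attempt to reproduce.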
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition
Methods: Neural Tangent Kernel · Early Stopping
