Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping
Ilja Kuzborskij, Csaba Szepesvári

TL;DR
This paper demonstrates that shallow overparameterized neural networks trained with gradient descent and early stopping can effectively learn Lipschitz regression functions, achieving optimal rates even with noisy labels, using a simple, finite-width analysis.
Contribution
It introduces a straightforward analysis for overparameterized shallow networks with early stopping, avoiding kernel approximations and handling label noise effectively.
Findings
Early stopping enables optimal learning rates with noisy labels.
Finite-width analysis is possible without kernelization.
Neural networks trained by GD are smooth with respect to inputs.
Abstract
We explore the ability of overparameterized shallow neural networks to learn Lipschitz regression functions with and without label noise when trained by Gradient Descent (GD). To avoid the problem that in the presence of noisy labels, neural networks trained to nearly zero training error are inconsistent on this class, we propose an early stopping rule that allows us to show optimal rates. This provides an alternative to the result of Hu et al. (2021) who studied the performance of ℓ2-regularized GD for training shallow networks in nonparametric regression which fully relied on the infinite-width network (Neural Tangent Kernel (NTK)) approximation. Here we present a simpler analysis which is based on a partitioning argument of the input space (as in the case of 1-nearest-neighbor rule) coupled with the fact that trained neural networks are smooth with respect to their inputs when…
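The setting described in the abstract can be illustrated with a minimal sketch: full-batch GD on a wide one-hidden-layer ReLU network, fit to noisy samples of a Lipschitz target and stopped early. Everything specific here is an assumption made for illustration and not taken from the paper: the function name `train_shallow_relu`, the width, step size, and patience values, the fixed random-sign output layer with 1/sqrt(width) scaling, and above all the stopping rule. The rule used below is a generic hold-out (validation) rule, not the early stopping rule the authors propose.

```python
import numpy as np

def train_shallow_relu(x_tr, y_tr, x_val, y_val, width=512, lr=0.1,
                       max_steps=2000, patience=50, seed=0):
    """Full-batch GD on the hidden layer of a one-hidden-layer ReLU network.

    Stops once the hold-out error has not improved for `patience` steps.
    This is a generic early stopping rule (an assumption of this sketch),
    not the rule proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = x_tr.shape
    W = rng.normal(size=(width, d))            # trainable hidden weights
    b = rng.normal(size=width)                 # trainable hidden biases
    a = rng.choice([-1.0, 1.0], size=width)    # fixed output signs (assumed)
    scale = 1.0 / np.sqrt(width)               # assumed output scaling

    def predict(W, b, x):
        return scale * np.maximum(x @ W.T + b, 0.0) @ a

    # (best hold-out MSE, step at which it occurred, weights, biases)
    best = (np.inf, 0, W.copy(), b.copy())
    for step in range(max_steps):
        val_mse = np.mean((predict(W, b, x_val) - y_val) ** 2)
        if val_mse < best[0]:
            best = (val_mse, step, W.copy(), b.copy())
        elif step - best[1] >= patience:
            break                              # early stop
        pre = x_tr @ W.T + b                   # pre-activations, shape (n, width)
        resid = scale * np.maximum(pre, 0.0) @ a - y_tr
        # Gradient of 0.5 * mean squared error w.r.t. the hidden layer.
        g = resid[:, None] * (pre > 0) * (a * scale)
        W -= lr * (g.T @ x_tr) / n
        b -= lr * g.mean(axis=0)
    val_mse, stop_step, W, b = best
    return (lambda x: predict(W, b, x)), stop_step, val_mse

# Usage: noisy samples of a 1-Lipschitz target on [-1, 1] (synthetic data).
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(300, 1))
y = np.abs(x[:, 0]) + 0.3 * rng.normal(size=300)
f_hat, stop_step, val_mse = train_shallow_relu(x[:200], y[:200], x[200:], y[200:])
print(f"stopped at step {stop_step}, hold-out MSE {val_mse:.3f}")
```

The function returns the predictor at the best hold-out step and not the final iterate, which is one common way to implement validation-based stopping. A threshold on the training error would be another option and could be swapped in at the `elif` branch.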
Taxonomy
Topics: Machine Learning and Data Classification · Stochastic Gradient Optimization Techniques · Statistical Methods and Inference
Methods: Early Stopping
