Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks
Ilja Kuzborskij, Csaba Szepesvári

TL;DR
This paper demonstrates that early-stopped gradient descent training of overparameterized shallow ReLU neural networks can effectively learn Lipschitz functions with optimal rates, leveraging the Neural Tangent Kernel framework.
Contribution
It establishes the connection between early stopping rules and minimax optimal rates for learning Lipschitz functions with shallow ReLU networks, using NTK approximation.
Findings
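Early-stopped GD achieves minimax optimal rates of excess risk for learning Lipschitz functions.
An early stopping rule that guarantees an optimal rate on the Hilbert space of the kernel induced by the ReLU activation carries over to the finite-width network through the NTK approximation.
Shallow ReLU networks can consistently learn bounded, non-differentiable Lipschitz functions observed with additive noise, whereas networks trained to nearly zero training error are inconsistent in this class.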
Abstract
We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, nondifferentiable, bounded functions with additive noise when trained by Gradient Descent (GD). To avoid the problem that in the presence of noise, neural networks trained to nearly zero training error are inconsistent in this class, we focus on the early-stopped GD which allows us to show consistency and optimal rates. In particular, we explore this problem from the viewpoint of the Neural Tangent Kernel (NTK) approximation of a GD-trained finite-width neural network. We show that whenever some early stopping rule is guaranteed to give an optimal rate (of excess risk) on the Hilbert space of the kernel induced by the ReLU activation function, the same rule can be used to achieve minimax optimal rate for learning on the class of considered Lipschitz functions by neural networks. We discuss…
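The setting described in the abstract can be illustrated with a minimal NumPy sketch: a shallow ReLU network with NTK-style 1/sqrt(m) scaling, trained by full-batch GD on noisy samples of a Lipschitz, non-differentiable target, and stopped early. Everything here is an assumption made for illustration rather than the paper's construction: the function names (`init_network`, `predict`, `gd_step`, `train_early_stopped`) are hypothetical, the output layer is fixed at random signs and only the hidden layer is trained, inputs are lifted to features (x, 1), and the stopping time is picked by hold-out validation error as a simple stand-in for the stopping rules the paper analyzes.

```python
import numpy as np

def init_network(width, dim, rng):
    # Gaussian hidden weights; output signs fixed at +/-1 (assumption).
    W = rng.standard_normal((width, dim))
    a = rng.choice([-1.0, 1.0], size=width)
    return W, a

def predict(W, a, X):
    # f(x) = (1/sqrt(m)) * sum_k a_k * relu(<w_k, x>)
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(len(a))

def gd_step(W, a, X, y, lr):
    # One full-batch GD step on the loss (1/(2n)) * sum_i (f(x_i) - y_i)^2,
    # updating only the hidden-layer weights W.
    m, n = len(a), len(y)
    pre = X @ W.T                                  # pre-activations, shape (n, m)
    resid = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y
    active = (pre > 0).astype(float)               # ReLU derivative, shape (n, m)
    grad = ((active * resid[:, None]).T @ X) * a[:, None] / (np.sqrt(m) * n)
    return W - lr * grad

def train_early_stopped(X_tr, y_tr, X_val, y_val,
                        width=256, lr=0.2, max_steps=500, seed=0):
    # Run GD and keep the iterate with the lowest hold-out error.
    # The hold-out criterion is an illustrative stand-in, not the paper's rule.
    rng = np.random.default_rng(seed)
    W, a = init_network(width, X_tr.shape[1], rng)
    best_W, best_val, best_step = W.copy(), np.inf, 0
    for step in range(max_steps + 1):
        val = np.mean((predict(W, a, X_val) - y_val) ** 2)
        if val < best_val:
            best_W, best_val, best_step = W.copy(), val, step
        W = gd_step(W, a, X_tr, y_tr, lr)
    return best_W, a, best_step, best_val

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.uniform(-1.0, 1.0, size=200)
    # Target |x| is Lipschitz, bounded on [-1, 1], and non-differentiable at 0.
    y = np.abs(x) + 0.1 * rng.standard_normal(200)
    X = np.stack([x, np.ones_like(x)], axis=1)     # lift inputs to (x, 1)
    W, a, step, val = train_early_stopped(X[:100], y[:100], X[100:], y[100:])
    print(f"stopped at step {step}, hold-out MSE {val:.4f}")
```

Comparing the hold-out error at the selected step with the error after all `max_steps` iterations gives a rough sense of how much early stopping matters for a given noise level and width in this toy setup.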
Taxonomy
Topics: Statistical Methods and Inference · Model Reduction and Neural Networks
Methods: Early Stopping
