Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks

Ilja Kuzborskij, Csaba Szepesvári

arXiv:2212.13848·cs.LG·April 7, 2023·1 cites

Open Access

TL;DR

This paper shows that overparameterized shallow ReLU neural networks trained by early-stopped gradient descent can learn noisy Lipschitz functions at the minimax optimal rate, using the Neural Tangent Kernel (NTK) approximation as the main analytical tool.

Contribution

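It shows that any early stopping rule guaranteed to give an optimal excess-risk rate on the Hilbert space of the kernel induced by the ReLU activation also achieves the minimax optimal rate for learning Lipschitz functions with GD-trained shallow ReLU networks, via the NTK approximation.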

Findings

01

Early stopping yields optimal learning rates for Lipschitz functions.

02

NTK approximation guides the design of effective stopping rules.

03

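Early-stopped shallow ReLU networks learn Lipschitz, nondifferentiable, bounded functions consistently under additive noise, whereas networks trained to nearly zero training error are inconsistent in this class.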

Abstract

We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, nondifferentiable, bounded functions with additive noise when trained by Gradient Descent (GD). To avoid the problem that, in the presence of noise, neural networks trained to nearly zero training error are inconsistent in this class, we focus on early-stopped GD, which allows us to show consistency and optimal rates. In particular, we explore this problem from the viewpoint of the Neural Tangent Kernel (NTK) approximation of a GD-trained finite-width neural network. We show that whenever some early stopping rule is guaranteed to give an optimal rate (of excess risk) on the Hilbert space of the kernel induced by the ReLU activation function, the same rule can be used to achieve the minimax optimal rate for learning on the class of considered Lipschitz functions by neural networks. We discuss…
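
As a rough illustration of the setting the abstract describes, the sketch below trains a shallow ReLU network by full-batch GD on squared loss and keeps the iterate with the lowest hold-out error. It is a toy under stated assumptions, not the paper's procedure: the hold-out stopping rule stands in for the kernel-based rules the abstract refers to, and the 1/sqrt(m) output scaling with fixed random signs, the target |x_1| on the unit circle, the noise level, and all names (`init_network`, `predict`, `gd_step`, `train_early_stopped`, `make_data`) are hypothetical choices made for illustration.

```python
import numpy as np


def make_data(n, noise=0.1, seed=0):
    """Toy data (assumed): unit-norm inputs in R^2, noisy target |x_1|."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 2))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    # |x_1| is Lipschitz, bounded on the circle, and not differentiable at x_1 = 0.
    y = np.abs(X[:, 0]) + noise * rng.standard_normal(n)
    return X, y


def init_network(m, d, rng):
    """Hidden weights W are Gaussian; output signs a are random and kept fixed."""
    W = rng.standard_normal((m, d))
    a = rng.choice([-1.0, 1.0], size=m)
    return W, a


def predict(W, a, X):
    """f(x) = (1/sqrt(m)) * sum_k a_k * relu(<w_k, x>)."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(W.shape[0])


def gd_step(W, a, X, y, lr):
    """One full-batch GD step on (1/2n) * sum_i (f(x_i) - y_i)^2 w.r.t. W only."""
    m = W.shape[0]
    pre = X @ W.T                                   # pre-activations, shape (n, m)
    resid = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y
    active = (pre > 0).astype(float)                # ReLU (sub)derivative
    grad = (resid[:, None] * active * a[None, :]).T @ X / (len(y) * np.sqrt(m))
    return W - lr * grad


def train_early_stopped(X_tr, y_tr, X_val, y_val, m=512, lr=1.0,
                        max_steps=500, seed=0):
    """Run GD and return the iterate with the smallest hold-out squared error.

    The hold-out rule is an assumption made for this sketch only.
    """
    rng = np.random.default_rng(seed)
    W, a = init_network(m, X_tr.shape[1], rng)
    best_W = W.copy()
    best_err = float(np.mean((predict(W, a, X_val) - y_val) ** 2))
    best_t = 0
    for t in range(1, max_steps + 1):
        W = gd_step(W, a, X_tr, y_tr, lr)
        err = float(np.mean((predict(W, a, X_val) - y_val) ** 2))
        if err < best_err:
            best_W, best_err, best_t = W.copy(), err, t
    return best_W, a, best_t, best_err


if __name__ == "__main__":
    X_tr, y_tr = make_data(200, seed=1)
    X_val, y_val = make_data(200, seed=2)
    W, a, t_stop, val_err = train_early_stopped(X_tr, y_tr, X_val, y_val)
    print(f"stopped at step {t_stop}, hold-out MSE {val_err:.4f}")
```

Only the hidden-layer weights are trained here, which keeps the gradient short; training the output layer as well would be an equally reasonable variant. Comparing the hold-out error at `t_stop` with the error after all `max_steps` steps gives a quick empirical feel for why stopping early matters under label noise.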

Taxonomy

Topics: Statistical Methods and Inference · Model Reduction and Neural Networks

Methods: Early Stopping