Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression

Yingzhen Yang; Ping Li

arXiv:2411.02904·stat.ML·November 7, 2025



TL;DR

This paper demonstrates that over-parameterized neural networks trained with gradient descent and early stopping achieve optimal nonparametric regression rates similar to kernel methods, without restrictive distributional assumptions on covariates.

Contribution

It establishes that gradient descent with early stopping trains over-parameterized neural networks to minimax optimal rates for nonparametric regression, assuming only that the covariates are bounded, which bridges neural networks and kernel methods.

Findings

01

Achieves an $O(ϵ_{n}^{2})$ regression risk rate.

02

No distributional assumptions on covariates needed, only boundedness.

03

Provides insights on stopping time, network width, and learning rate for training.
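For orientation on the rate in finding 01, the following is a sketch of how a critical rate of this kind is commonly defined in the kernel regression literature. It is an assumption that the paper's definition has this shape; the symbols $λ_{i}$ (eigenvalues of the kernel integral operator), $σ$ (noise level), $R_{K}$ (kernel complexity function), and the constant $c$ are introduced here for illustration, and the paper's exact definition and constants may differ.

$$
R_{K}(ϵ) = \sqrt{\frac{1}{n} \sum_{i \ge 1} \min\{λ_{i}, ϵ^{2}\}}, \qquad ϵ_{n} = \min\left\{ ϵ > 0 : R_{K}(ϵ) \le \frac{c\, ϵ^{2}}{σ} \right\}.
$$

Under this definition, a polynomial eigenvalue decay $λ_{i} \asymp i^{-2α}$ with $α > 1/2$ gives $ϵ_{n}^{2} \asymp n^{-2α/(2α+1)}$, the familiar nonparametric rate for kernel regression.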

Abstract

We study nonparametric regression by an over-parameterized two-layer neural network trained by gradient descent (GD) in this paper. We show that, if the neural network is trained by GD with early stopping, then the trained network achieves a sharp rate of the nonparametric regression risk of $O(ϵ_{n}^{2})$, which is the same rate as that for the classical kernel regression trained by GD with early stopping, where $ϵ_{n}$ is the critical population rate of the Neural Tangent Kernel (NTK) associated with the network and $n$ is the size of the training data. It is remarked that our result does not require distributional assumptions about the covariate as long as the covariate is bounded, in strong contrast to many existing results which rely on specific distributions of the covariates such as the spherical uniform data distribution or distributions satisfying certain…
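The setup described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made here for illustration: only the first layer is trained while the second-layer signs stay fixed (a common NTK-style simplification), a constant feature is appended to a one-dimensional covariate in $[-1, 1]$, the target is a noisy sine, and the stopping time is picked on a hold-out set rather than by the theoretical stopping rule analyzed in the paper. The names `predict` and `train_gd_early_stopping` and all hyperparameter values are hypothetical.

```python
import numpy as np


def predict(W, a, X):
    """Two-layer ReLU network f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x)."""
    hidden = np.maximum(X @ W.T, 0.0)  # shape (n, m)
    return hidden @ a / np.sqrt(W.shape[0])


def train_gd_early_stopping(X, y, X_val, y_val, m=256, lr=0.5,
                            max_steps=300, seed=0):
    """Full-batch GD on the first layer; keep the iterate with the lowest
    hold-out error (an assumed stand-in for a theoretical stopping time)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((m, d))      # trained first-layer weights
    a = rng.choice([-1.0, 1.0], size=m)  # fixed second-layer signs
    history = []                         # hold-out MSE at steps 0..max_steps
    best_val, best_W, best_step = np.inf, W.copy(), 0
    for t in range(max_steps + 1):
        val = float(np.mean((predict(W, a, X_val) - y_val) ** 2))
        history.append(val)
        if val < best_val:
            best_val, best_W, best_step = val, W.copy(), t
        if t == max_steps:
            break
        # Gradient of the loss (1/2n) * sum_i (f(x_i) - y_i)^2 with respect to W.
        pre = X @ W.T                                    # shape (n, m)
        resid = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y
        grad = ((resid[:, None] * (pre > 0)) * a[None, :]).T @ X
        W = W - lr * grad / (n * np.sqrt(m))
    return best_W, a, best_step, history


# Synthetic data with a bounded covariate (assumed setup, not from the paper).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(300, 1))
X_all = np.hstack([x, np.ones_like(x)])  # append a constant feature
y_all = np.sin(np.pi * x[:, 0]) + 0.3 * rng.standard_normal(300)
X_train, y_train = X_all[:200], y_all[:200]
X_val, y_val = X_all[200:], y_all[200:]

W_best, a, best_step, history = train_gd_early_stopping(
    X_train, y_train, X_val, y_val)
print(f"stopped at step {best_step}, hold-out MSE {history[best_step]:.4f}")
```

The returned `history` makes the role of the stopping time visible: plotting it against the step index shows how the hold-out error evolves as training continues, and `best_step` marks the iterate that the sketch keeps.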


Taxonomy

Topics: Neural Networks and Applications

Methods: Neural Tangent Kernel · Early Stopping