Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate

Yingzhen Yang; Ping Li

arXiv:2603.21062 · stat.ML · March 24, 2026

Open Access

TL;DR

This paper introduces a Gradient Descent with Projection (GDP) method for training over-parameterized neural networks, achieving nearly minimax optimal rates for learning low-degree spherical polynomials, surpassing traditional kernel methods.

Contribution

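The paper presents a novel GDP algorithm for training over-parameterized two-layer neural networks. It learns low-degree spherical polynomials with nearly optimal sample complexity and risk bounds, going beyond the limits of neural tangent kernel (NTK) analyses.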

Findings

01

Sample complexity of $\Theta(d^{k_0}/\varepsilon)$, up to a $\log(4/\delta)$ factor, for learning degree-$k_0$ spherical polynomials to regression risk $\varepsilon$.

02

Nearly minimax optimal regression risk rate achieved.

03

Adaptive degree selection algorithm for unknown polynomial degree.

Abstract

We study the problem of learning a low-degree spherical polynomial of degree $k_0 = \Theta(1) \geq 1$ defined on the unit sphere in $\mathbb{R}^d$ by training an over-parameterized two-layer neural network with augmented feature in this paper. Our main result is the significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\varepsilon \in (0, \Theta(d^{-k_0})]$, an over-parameterized two-layer neural network trained by a novel Gradient Descent with Projection (GDP) requires a sample complexity of $n \asymp \Theta(\log(4/\delta) \cdot d^{k_0} / \varepsilon)$ with probability $1 - \delta$ for $\delta \in (0, 1)$, in contrast with the representative sample complexity $\Theta(d^{k_0} \max\{\varepsilon^{-2}, \log d\})$. Moreover, such sample complexity is nearly unimprovable since the trained network renders a nearly optimal rate of the nonparametric…
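To make the gap between the two rates quoted in the abstract concrete, the following minimal Python sketch evaluates both expressions with every hidden constant in the $\Theta(\cdot)$ notation set to 1. That choice, the function names, and the values of `d`, `k0`, and `delta` are assumptions for illustration only and do not come from the paper. Only the scaling is meaningful, not the absolute numbers.

```python
import math


def gdp_sample_complexity(d, k0, eps, delta):
    """log(4/delta) * d^k0 / eps, with the hidden constant assumed to be 1."""
    return math.log(4.0 / delta) * d**k0 / eps


def baseline_sample_complexity(d, k0, eps):
    """d^k0 * max(eps^-2, log d), with the hidden constant assumed to be 1."""
    return d**k0 * max(eps**-2, math.log(d))


# Illustrative values (assumptions, not taken from the paper).
d, k0, delta = 50, 2, 0.05

# Largest risk level in the stated range (0, Theta(d^-k0)],
# again taking the hidden constant to be 1.
eps = float(d) ** (-k0)

n_gdp = gdp_sample_complexity(d, k0, eps, delta)
n_base = baseline_sample_complexity(d, k0, eps)
ratio = n_base / n_gdp

print(f"GDP rate:      n ~ {n_gdp:.3e}")
print(f"Baseline rate: n ~ {n_base:.3e}")
print(f"Baseline / GDP ratio: {ratio:.1f}")
```

Under these assumptions, whenever $\varepsilon^{-2} \geq \log d$ the ratio of the two expressions reduces to $1/(\varepsilon \log(4/\delta))$, so the advantage of the GDP rate grows as the target risk shrinks.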

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Mathematical Approximation and Integration