Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate
Yingzhen Yang, Ping Li

TL;DR
This paper introduces a Gradient Descent with Projection (GDP) method for training over-parameterized neural networks, achieving nearly minimax optimal rates for learning low-degree spherical polynomials, surpassing traditional kernel methods.
Contribution
The paper presents a novel GDP algorithm that learns low-degree spherical polynomials with nearly optimal sample complexity and risk bounds, going beyond the limits of neural tangent kernel (NTK) analysis.
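To make the idea of "gradient descent with projection" concrete, here is a minimal NumPy sketch of a generic projected gradient descent loop on a two-layer ReLU network. Everything specific in it is an assumption made for illustration, not taken from the paper: the projection set (a Frobenius-norm ball around the initial weights), the choice to train only the first layer, the hypothetical names `project_to_ball` and `train_gdp_sketch`, and all hyperparameters. The paper's actual projection operator and its augmented feature are not described in this summary and are not modeled here.

```python
import numpy as np

def project_to_ball(W, W0, radius):
    """Euclidean projection of W onto the Frobenius ball of `radius` around W0.
    Stand-in projection set, assumed for illustration only."""
    delta = W - W0
    norm = np.linalg.norm(delta)
    if norm > radius:
        delta *= radius / norm
    return W0 + delta

def train_gdp_sketch(X, y, m=256, lr=0.5, steps=200, radius=5.0, seed=0):
    """Generic projected gradient descent on f(x) = a^T relu(W x) / sqrt(m)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W0 = rng.standard_normal((m, d))      # first-layer weights at initialization
    a = rng.choice([-1.0, 1.0], size=m)   # fixed second-layer signs (assumption)
    W = W0.copy()
    losses = []
    for _ in range(steps):
        pre = X @ W.T                                   # (n, m) pre-activations
        pred = np.maximum(pre, 0.0) @ a / np.sqrt(m)    # network outputs
        resid = pred - y
        losses.append(0.5 * np.mean(resid ** 2))
        # Gradient of the mean squared loss with respect to W.
        grad = ((resid[:, None] * (pre > 0) * a).T @ X) / (n * np.sqrt(m))
        # Gradient step followed by projection back onto the constraint set.
        W = project_to_ball(W - lr * grad, W0, radius)
    return W, W0, losses

# Toy data: points on the unit sphere and a degree-2 polynomial target.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = X[:, 0] * X[:, 1]
W, W0, losses = train_gdp_sketch(X, y)
print(losses[0], losses[-1])
```

The projection step is what distinguishes this loop from plain gradient descent: each iterate is pulled back into the constraint set, which keeps the weights within a fixed distance of initialization.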
Findings
Sample complexity of $\Theta(d^{k_0}/\varepsilon)$ for learning degree-$k_0$ spherical polynomials to regression risk $\varepsilon$.
Nearly minimax optimal regression risk rate achieved.
Adaptive degree selection algorithm for unknown polynomial degree.
Abstract
We study the problem of learning a low-degree spherical polynomial of degree $k_0$ defined on the unit sphere in $\mathbb{R}^d$ by training an over-parameterized two-layer neural network with augmented feature in this paper. Our main result is the significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\varepsilon$, an over-parameterized two-layer neural network trained by a novel Gradient Descent with Projection (GDP) requires a sample complexity of $\Theta(d^{k_0}/\varepsilon)$ with high probability, in contrast with the representative sample complexity $\Theta(d^{k_0} \max\{\varepsilon^{-2}, \log d\})$. Moreover, such sample complexity is nearly unimprovable since the trained network renders a nearly optimal rate of the nonparametric…
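The gap between the two sample-complexity rates quoted above can be checked with a little arithmetic. The sketch below drops all hidden constants, so it compares only the scaling in $d$, $k_0$, and the regression risk; the function names `gdp_rate` and `baseline_rate` and the example values of `d`, `k0`, and `eps` are hypothetical, chosen for illustration.

```python
import math

def gdp_rate(d, k0, eps):
    """Scaling d^{k0} / eps, with constants dropped."""
    return d ** k0 / eps

def baseline_rate(d, k0, eps):
    """Scaling d^{k0} * max(eps^{-2}, log d), with constants dropped."""
    return d ** k0 * max(eps ** -2, math.log(d))

d, k0 = 100, 2
for eps in (0.1, 0.01):
    ratio = baseline_rate(d, k0, eps) / gdp_rate(d, k0, eps)
    print(f"eps={eps}: baseline/GDP ratio ~ {ratio:.1f}")
```

Whenever the $\varepsilon^{-2}$ term dominates $\log d$, the ratio of the two rates is $1/\varepsilon$, so the advantage grows as the target risk shrinks.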
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Mathematical Approximation and Integration
