Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu

TL;DR
This paper demonstrates that a simple two-layer neural network trained with SGD can learn low-dimensional polynomial functions near the information-theoretic limit, surpassing previous complexity bounds tied to the function's information exponent.
Contribution
The paper proves that a two-layer neural network can learn polynomial single-index models with near-optimal sample complexity, independent of the link function's information exponent, using a novel analysis of minibatch reuse.
Findings
Two-layer neural networks achieve near-optimal sample complexity for polynomial single-index models.
SGD-based training surpasses prior bounds related to the information exponent.
Higher-order information from minibatch reuse is key to the analysis.
Abstract
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the unknown link function $\sigma_* : \mathbb{R} \to \mathbb{R}$ has information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n \gtrsim d^{\Theta(p)}$ samples, and such complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm (on the squared loss) learns $f_*$ with a complexity that is not governed by the information exponent. Specifically, for arbitrary polynomial single-index models, we establish a sample and…
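To make the definition of the information exponent concrete, the following is a minimal sketch (not taken from the paper) that computes it for a polynomial link function. It assumes the link is given by its power-series coefficients, and it uses NumPy's probabilists' Hermite basis; the function name `information_exponent` and the tolerance `tol` are illustrative choices. The basis here is unnormalized, which does not change which coefficients are nonzero.

```python
import numpy as np
from numpy.polynomial import hermite_e as H


def information_exponent(power_coeffs, tol=1e-12):
    """Return the lowest degree k >= 1 with a nonzero Hermite coefficient.

    power_coeffs lists the polynomial's coefficients from the constant term
    upward, e.g. [0, -3, 0, 1] represents z**3 - 3*z.
    Returns None if the polynomial is constant (no k >= 1 term).
    """
    # Convert from the monomial basis to the probabilists' Hermite basis He_k.
    herm = H.poly2herme(np.asarray(power_coeffs, dtype=float))
    # Skip k = 0 (the mean of the link under a standard Gaussian input).
    for k in range(1, len(herm)):
        if abs(herm[k]) > tol:
            return k
    return None


# z**3 - 3*z equals He_3(z), so its lowest nonzero degree is 3.
print(information_exponent([0, -3, 0, 1]))  # 3
# z**2 equals He_2(z) + He_0(z), so the exponent is 2.
print(information_exponent([0, 0, 1]))      # 2
# z**3 equals He_3(z) + 3*He_1(z), so the exponent is 1.
print(information_exponent([0, 0, 0, 1]))   # 1
```

The first example is the kind of link for which the prior sample-complexity bounds quoted in the abstract grow with the exponent, while the last one has a linear component and is easy under those bounds.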
Taxonomy
Topics: Neural Networks and Applications
