Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

Jason D. Lee; Kazusato Oko; Taiji Suzuki; Denny Wu

arXiv:2406.01581·cs.LG·December 24, 2024

Open Access

TL;DR

This paper demonstrates that a simple two-layer neural network trained with SGD can learn low-dimensional polynomial functions near the information-theoretic limit, surpassing previous complexity bounds tied to the function's information exponent.

Contribution

The paper proves that a two-layer neural network trained with an SGD-based algorithm can learn polynomial single-index models with near-optimal sample complexity, independent of the link function's information exponent, using a novel analysis of minibatch reuse.

Findings

01 Neural networks achieve near-optimal sample complexity for polynomial single-index models.

02 SGD-based training achieves a sample complexity that is not governed by the information exponent, improving on prior $n \gtrsim d^{\Theta(p)}$ bounds.

03 Higher-order information from minibatch reuse is key to the analysis.

Abstract

We study the problem of gradient descent learning of a single-index target function $f_*(x) = \sigma_*(\langle x, \theta \rangle)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the unknown link function $\sigma_* : \mathbb{R} \to \mathbb{R}$ has information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n \gtrsim d^{\Theta(p)}$ samples, and such complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm (on the squared loss) learns $f_*$ with a complexity that is not governed by the information exponent. Specifically, for arbitrary polynomial single-index models, we establish a sample and…
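To make the definition of the information exponent concrete, the following is a minimal sketch that computes it for a polynomial link function. It is an illustration only, not code from the paper. It assumes the link is given by its monomial coefficients (lowest degree first), uses the probabilists' Hermite basis via NumPy's `hermite_e` module, and ignores the constant (degree-0) term, which is the usual convention. The function name `information_exponent` and the tolerance `tol` are hypothetical choices made for this example.

```python
import numpy as np
from numpy.polynomial import hermite_e as He


def information_exponent(poly_coeffs, tol=1e-12):
    """Return the lowest degree k >= 1 with a nonzero Hermite coefficient.

    poly_coeffs: monomial coefficients of the link function, lowest degree
    first, e.g. [0, -3, 0, 1] represents z**3 - 3*z.
    Returns None if the polynomial is constant.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    # Convert from the monomial basis to the probabilists' Hermite basis He_k.
    herm = He.poly2herme(np.asarray(poly_coeffs, dtype=float))
    # Skip k = 0: the constant term does not count toward the exponent.
    for k in range(1, len(herm)):
        if abs(herm[k]) > tol:
            return k
    return None


if __name__ == "__main__":
    # z**3 - 3*z equals He_3(z), so its lowest nonzero degree is 3.
    print(information_exponent([0, -3, 0, 1]))
    # z**2 equals He_2(z) + He_0(z), so its lowest nonzero degree is 2.
    print(information_exponent([0, 0, 1]))
    # z**3 equals He_3(z) + 3*He_1(z), so its lowest nonzero degree is 1.
    print(information_exponent([0, 0, 0, 1]))
```

Under the prior $n \gtrsim d^{\Theta(p)}$ bounds quoted in the abstract, the first link (exponent 3) would be the most sample-hungry of the three, while the abstract's claim is that the SGD-based algorithm's complexity does not depend on this quantity.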

Taxonomy

Topics: Neural Networks and Applications