Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws
Gérard Ben Arous, Murat A. Erdogdu, Nuri Mert Vural, Denny Wu

TL;DR
This paper analyzes the training dynamics and sample complexity of quadratic neural networks in high dimensions, revealing how their performance scales with data, model width, and training time.
Contribution
It provides a rigorous analysis of SGD dynamics for quadratic neural networks, deriving scaling laws and convergence guarantees in high-dimensional settings.
Findings
SGD dynamics characterized in high dimensions
Scaling laws for prediction risk derived
Convergence guarantees established for effective dynamics
Abstract
We study the optimization and sample complexity of gradient-based training of a two-layer neural network with quadratic activation function in the high-dimensional regime, where the data is generated as $y \propto \sum_{j=1}^{r} \lambda_j \sigma(\langle \theta_j, x \rangle)$ with $x \sim N(0, I_d)$, $\sigma$ is the 2nd Hermite polynomial, and $\{\theta_j\}_{j=1}^{r} \subset \mathbb{R}^d$ are orthonormal signal directions. We consider the extensive-width regime $r \asymp d^{\beta}$ for $\beta \in [0, 1)$, and assume a power-law decay on the (non-negative) second-layer coefficients $\lambda_j \asymp j^{-\alpha}$ for $\alpha \geq 0$. We present a sharp analysis of the SGD dynamics in the feature learning regime, for both the population limit and the finite-sample (online) discretization, and derive…
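To make the setting in the abstract concrete, here is a minimal NumPy sketch of a teacher with orthonormal directions, power-law second-layer coefficients, and the 2nd Hermite activation, together with a single online SGD step on a quadratic student. It is an illustration under stated assumptions, not the authors' procedure: the function names, the choice r = round(d^beta), the exact coefficients j^(-alpha) normalized to unit norm, the unnormalized Hermite polynomial z^2 - 1, the student's fixed uniform second layer, and the plain squared-loss update are all hypothetical choices made here.

```python
import numpy as np


def he2(z):
    # 2nd probabilist's Hermite polynomial, unnormalized: He_2(z) = z^2 - 1.
    return z ** 2 - 1.0


def make_teacher(d, beta=0.5, alpha=1.0, seed=0):
    # Assumption: width r = round(d^beta) and coefficients lambda_j = j^(-alpha),
    # rescaled to unit Euclidean norm (the normalization is a choice made here).
    rng = np.random.default_rng(seed)
    r = max(1, int(round(d ** beta)))
    # Orthonormal signal directions: reduced QR of a Gaussian (d, r) matrix.
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    theta = q.T  # shape (r, d), rows are orthonormal
    lam = np.arange(1, r + 1, dtype=float) ** (-alpha)
    lam /= np.linalg.norm(lam)
    return theta, lam


def sample(theta, lam, n, rng):
    # Draw x ~ N(0, I_d) and labels y = sum_j lambda_j * He_2(<theta_j, x>).
    d = theta.shape[1]
    x = rng.standard_normal((n, d))
    y = he2(x @ theta.T) @ lam
    return x, y


def sgd_step(W, x, y, lr):
    # Hypothetical student: f(x) = mean_k He_2(<w_k, x>), W has shape (m, d).
    # One online SGD step on the loss 0.5 * (f(x) - y)^2 for a single sample.
    m = W.shape[0]
    pre = W @ x                      # pre-activations, shape (m,)
    err = he2(pre).mean() - y        # scalar residual
    grad = err * (2.0 / m) * np.outer(pre, x)  # d/dw_k He_2 = 2 <w_k, x> x
    return W - lr * grad, 0.5 * err ** 2


if __name__ == "__main__":
    d, m = 64, 16
    theta, lam = make_teacher(d, beta=0.5, alpha=1.0)
    rng = np.random.default_rng(1)
    W = rng.standard_normal((m, d)) / np.sqrt(d)  # small random initialization
    xs, ys = sample(theta, lam, 2000, rng)
    for x, y in zip(xs, ys):                      # one fresh sample per step
        W, loss = sgd_step(W, x, y, lr=1e-3)
    print("last per-sample loss:", loss)
```

Each sample is used exactly once, which mirrors the online (one-pass) discretization mentioned in the abstract. The step size and initialization scale are untuned placeholders.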
