Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws

Gérard Ben Arous; Murat A. Erdogdu; Nuri Mert Vural; Denny Wu

arXiv:2508.03688·stat.ML·January 1, 2026

TL;DR

This paper analyzes the training dynamics and sample complexity of quadratic neural networks in high dimensions, revealing how their performance scales with data, model width, and training time.

Contribution

It provides a rigorous analysis of SGD dynamics for quadratic neural networks, deriving scaling laws and convergence guarantees in high-dimensional settings.

Findings

01. SGD dynamics characterized in high dimensions
02. Scaling laws for prediction risk derived
03. Convergence guarantees established for effective dynamics

Abstract

We study the optimization and sample complexity of gradient-based training of a two-layer neural network with quadratic activation function in the high-dimensional regime, where the data is generated as $f_{*}(x) \propto \sum_{j=1}^{r} \lambda_{j} \sigma(\langle \theta_{j}, x \rangle)$, $x \sim \mathcal{N}(0, I_{d})$, $\sigma$ is the 2nd Hermite polynomial, and $\{\theta_{j}\}_{j=1}^{r} \subset \mathbb{R}^{d}$ are orthonormal signal directions. We consider the extensive-width regime $r \asymp d^{\beta}$ for $\beta \in [0, 1)$, and assume a power-law decay on the (non-negative) second-layer coefficients $\lambda_{j} \asymp j^{-\alpha}$ for $\alpha \geq 0$. We present a sharp analysis of the SGD dynamics in the feature learning regime, for both the population limit and the finite-sample (online) discretization, and derive…
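The data model in the abstract can be made concrete with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' code: the function names (`make_teacher`, `he2`, `f_star`) are hypothetical, the 2nd Hermite polynomial is taken in the unnormalized probabilist's form $z^2 - 1$, the width is set to $r = \mathrm{round}(d^{\beta})$ and the coefficients to exactly $\lambda_j = j^{-\alpha}$ (the abstract only fixes both up to constants), and the unspecified proportionality constant is chosen so that $f_{*}$ has unit variance.

```python
import numpy as np

def make_teacher(d, beta, alpha, rng):
    """Draw r ~ d^beta orthonormal directions and power-law coefficients."""
    r = max(1, int(round(d ** beta)))          # assumption: r = round(d^beta)
    # Orthonormal columns from the reduced QR of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    theta = q.T                                # shape (r, d), rows orthonormal
    lam = np.arange(1, r + 1, dtype=float) ** (-alpha)  # lambda_j = j^(-alpha)
    return theta, lam

def he2(z):
    """Unnormalized probabilist's 2nd Hermite polynomial (assumed convention)."""
    return z ** 2 - 1.0

def f_star(x, theta, lam):
    """Evaluate the target on rows of x, scaled to unit variance (assumption)."""
    z = x @ theta.T                            # projections, shape (n, r)
    # For z ~ N(0, 1), Var[z^2 - 1] = 2, and the r projections are independent.
    scale = np.sqrt(2.0 * np.sum(lam ** 2))
    return he2(z) @ lam / scale

rng = np.random.default_rng(0)
theta, lam = make_teacher(d=64, beta=0.5, alpha=1.0, rng=rng)
x = rng.standard_normal((20000, 64))           # x ~ N(0, I_d)
y = f_star(x, theta, lam)
```

Because the directions are orthonormal and the input is isotropic Gaussian, the projections $\langle \theta_j, x \rangle$ are independent standard normals, which is what makes the variance normalization above a one-line computation.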
