High-Dimensional Analysis of Gradient Flow for Extensive-Width Quadratic Neural Networks
Simon Martin (DI-ENS, LPENS, SIERRA), Giulio Biroli (LPENS), Francis Bach (DI-ENS, SIERRA)

TL;DR
This paper analyzes the high-dimensional training dynamics of extensive-width quadratic neural networks with a dynamical mean-field theory approach, showing how overparameterization affects learning and generalization and uncovering a double descent phenomenon.
Contribution
It provides a dynamical mean-field theory analysis of gradient flow in extensive-width quadratic neural networks, characterizing performance and spectral properties in high dimensions.
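A short example of why spectral properties are a natural object here: with quadratic activation, the network output depends on the weights only through the d × d matrix S = WᵀW/m, because Σ_j (w_j·x)² = xᵀWᵀWx. The sketch below checks this identity numerically. The 1/m output normalization and all function names are our own assumptions for illustration, not necessarily the paper's conventions.

```python
import random

def quad_net(W, x):
    """Quadratic-activation network f(x) = (1/m) * sum_j (w_j . x)^2 (assumed normalization)."""
    return sum(sum(wi * xi for wi, xi in zip(w, x)) ** 2 for w in W) / len(W)

def gram_matrix(W):
    """S = W^T W / m: a d x d positive semidefinite matrix of rank at most m."""
    m, d = len(W), len(W[0])
    return [[sum(W[j][a] * W[j][b] for j in range(m)) / m for b in range(d)]
            for a in range(d)]

def quad_form(S, x):
    """Evaluate x^T S x."""
    d = len(x)
    return sum(x[a] * S[a][b] * x[b] for a in range(d) for b in range(d))

rng = random.Random(0)
d, m = 5, 3                                   # toy input dimension and width
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
x = [rng.gauss(0, 1) for _ in range(d)]

# The network output and the quadratic form agree up to floating-point rounding.
print(abs(quad_net(W, x) - quad_form(gram_matrix(W), x)) < 1e-9)
```

Because S is positive semidefinite with rank at most m, the student width caps the rank of the learned quadratic form, which is one way to see why the ratio of student to teacher width enters such an analysis.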
Findings
Reveals a double descent phenomenon in the presence of label noise.
Provides the exact recovery threshold as a function of the network widths.
Shows that overparameterization improves generalization beyond interpolation.
Abstract
We study the high-dimensional training dynamics of a shallow neural network with quadratic activation in a teacher-student setup. We focus on the extensive-width regime, where the teacher and student network widths scale proportionally with the input dimension and the sample size grows quadratically with it. This scaling aims to describe overparameterized neural networks in which feature learning still plays a central role. In the high-dimensional limit, we derive a dynamical characterization of the gradient flow, in the spirit of dynamical mean-field theory (DMFT). Under ℓ2-regularization, we analyze these equations at long times and characterize the performance and spectral properties of the resulting estimator. This result provides a quantitative understanding of the effect of overparameterization on learning and generalization, and reveals a double descent phenomenon in the presence of…
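A minimal simulation sketch of the setup described in the abstract, under these assumptions: Gaussian inputs with variance 1/d per coordinate, a quadratic teacher of width m* = κ*·d with additive Gaussian label noise, a student of width m = κ·d, n = α·d² samples, a squared loss plus an ℓ2 penalty (λ/2)‖W‖²_F, and gradient flow approximated by explicit Euler steps of size dt. The ratios κ*, κ, α, the step size, and all function names are hypothetical choices for illustration; the paper's DMFT analysis describes the d → ∞ limit, not a small finite run like this one.

```python
import math
import random

def quad_net(W, x):
    """Output f_W(x) = (1/m) * sum_j (w_j . x)^2 (assumed normalization)."""
    return sum(sum(wi * xi for wi, xi in zip(w, x)) ** 2 for w in W) / len(W)

def make_data(teacher, n, d, noise, rng):
    """Inputs x ~ N(0, I_d / d); labels from the teacher plus Gaussian label noise."""
    X = [[rng.gauss(0, 1) / math.sqrt(d) for _ in range(d)] for _ in range(n)]
    y = [quad_net(teacher, x) + noise * rng.gauss(0, 1) for x in X]
    return X, y

def loss(W, X, y, lam):
    """Empirical loss (1/2n) * sum (f_W(x) - y)^2 + (lam/2) * ||W||_F^2."""
    n = len(X)
    fit = sum((quad_net(W, x) - t) ** 2 for x, t in zip(X, y)) / (2 * n)
    return fit + 0.5 * lam * sum(wi * wi for w in W for wi in w)

def grad(W, X, y, lam):
    """Gradient of `loss` with respect to every student weight w_ja."""
    n, m, d = len(X), len(W), len(W[0])
    G = [[lam * W[j][a] for a in range(d)] for j in range(m)]  # l2 penalty term
    for x, t in zip(X, y):
        proj = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]  # w_j . x
        r = sum(p * p for p in proj) / m - t                       # residual f_W(x) - y
        for j in range(m):
            c = 2.0 * r * proj[j] / (m * n)  # uses d f / d w_j = (2/m) (w_j . x) x
            for a in range(d):
                G[j][a] += c * x[a]
    return G

def gradient_flow(W, X, y, lam, dt, steps):
    """Explicit Euler discretization of dW/dt = -grad loss(W)."""
    for _ in range(steps):
        G = grad(W, X, y, lam)
        W = [[w - dt * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W, G)]
    return W

rng = random.Random(0)
d = 8
kappa_star, kappa, alpha = 0.5, 1.0, 1.0  # hypothetical width and sample ratios
m_star, m, n = round(kappa_star * d), round(kappa * d), round(alpha * d * d)
teacher = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m_star)]
X, y = make_data(teacher, n, d, noise=0.1, rng=rng)
W0 = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
W = gradient_flow(W0, X, y, lam=1e-3, dt=0.5, steps=100)
print(f"training loss: {loss(W0, X, y, 1e-3):.4f} -> {loss(W, X, y, 1e-3):.4f}")
```

Sweeping α or κ in a toy run like this and measuring the error on fresh samples can give a rough feel for the trends the paper characterizes exactly; at such a small d, finite-size effects dominate, so the output should not be read as reproducing the paper's results.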
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM
