Escape dynamics and implicit bias of one-pass SGD in overparameterized quadratic networks
Dario Bocchi, Theotime Regimbeau, Carlo Lucibello, Luca Saglietti, Chiara Cammarota

TL;DR
This paper studies the dynamics of one-pass SGD in overparameterized quadratic neural networks, showing how overparameterization and weight norms affect both the escape from a plateau of poor generalization and the choice of zero-loss solution the dynamics reach.
Contribution
It provides a detailed ODE-based analysis of SGD dynamics, uncovering the effects of overparameterization, weight norm constraints, and the geometry of solution manifolds.
Findings
Overparameterization only modestly speeds up escape from the plateau of poor generalization, by changing the prefactor of the exponential decay of the loss.
Unconstrained weight norms introduce a continuous rotational symmetry, which yields a manifold of zero-loss solutions.
Within this manifold, the dynamics select the solution closest to the random initialization.
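The rotational symmetry and the "closest solution" finding can be illustrated with a minimal NumPy sketch. It rests on assumptions that are ours, not necessarily the paper's: a student f(x) = (1/p) * sum_i (w_i . x / sqrt(N))^2 with weights W of shape p x N, an analogous teacher Ws with p* <= p rows, standard Gaussian inputs, and closeness measured in Frobenius norm. Under these assumptions the zero-loss set is the orbit {O V : O orthogonal}, where V is the teacher rescaled by sqrt(p/p*) and padded with zero rows. The point of that orbit nearest an initialization W0 then follows from the orthogonal Procrustes problem. The helpers `student_fn` and `closest_zero_loss_point` are hypothetical names.

```python
import numpy as np

def student_fn(W, X):
    """Quadratic network f(x) = (1/p) * sum_i (w_i . x / sqrt(N))^2 for each row x of X."""
    p, N = W.shape
    H = X @ W.T / np.sqrt(N)        # pre-activations, shape (n_samples, p)
    return np.mean(H**2, axis=1)    # average of squared hidden units

def closest_zero_loss_point(W0, Ws):
    """Point of the zero-loss orbit {O @ V : O orthogonal} nearest to W0 (Frobenius norm).

    Assumption: V is the teacher padded with zero rows and rescaled by sqrt(p / p*),
    so that the (1/p)-normalised student reproduces the (1/p*)-normalised teacher.
    """
    p, N = W0.shape
    ps = Ws.shape[0]
    assert p >= ps, "sketch assumes a student at least as wide as the teacher"
    V = np.sqrt(p / ps) * np.vstack([Ws, np.zeros((p - ps, N))])
    # Orthogonal Procrustes: minimise ||O V - W0||_F, i.e. maximise tr(O^T W0 V^T).
    U, _, Vt = np.linalg.svd(W0 @ V.T)
    O = U @ Vt
    return O @ V
```

Rotating the hidden units by any orthogonal O maps each pre-activation vector h to O h without changing its squared norm, so every output stays the same. This is why a continuous family of weights reaches zero loss. The Procrustes step then picks the member of that family nearest the starting point.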
Abstract
We analyze the one-pass stochastic gradient descent dynamics of a two-layer neural network with quadratic activations in a teacher-student framework. In the high-dimensional regime, where the input dimension and the number of samples diverge at a fixed ratio, and for finite hidden widths of the student and the teacher, we study the low-dimensional ordinary differential equations that govern the evolution of the student-teacher and student-student overlap matrices. We show that overparameterization (a student wider than the teacher) only modestly accelerates escape from a plateau of poor generalization by modifying the prefactor of the exponential decay of the loss. We then examine how unconstrained weight norms introduce a continuous rotational symmetry that results in a nontrivial manifold of zero-loss solutions for . From this manifold the dynamics consistently…
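The reduction to overlap matrices can be illustrated with a small NumPy simulation. This is a sketch under stated assumptions, not the authors' code. It assumes standard Gaussian inputs, a squared loss, a student f(x) = (1/p) * sum_i (w_i . x / sqrt(N))^2 with weights W (p x N), an analogous teacher Ws (p* x N), and the hypothetical helpers `overlaps`, `population_loss` and `one_pass_sgd`. The point it shows is that the population loss depends on the weights only through Q = W W^T / N, M = W Ws^T / N and T = Ws Ws^T / N. Low-dimensional ODEs for these matrices can therefore describe the dynamics when N is large.

```python
import numpy as np

def overlaps(W, Ws):
    """Overlap matrices for student W (p x N) and teacher Ws (p* x N)."""
    N = W.shape[1]
    Q = W @ W.T / N     # student-student, p x p
    M = W @ Ws.T / N    # student-teacher, p x p*
    T = Ws @ Ws.T / N   # teacher-teacher, p* x p*
    return Q, M, T

def population_loss(Q, M, T):
    """0.5 * E[(f(x) - f*(x))^2] for x ~ N(0, I_N), written through the overlaps only.

    Uses E[(x^T C x)^2] = (tr C)^2 + 2 tr(C^2) for symmetric C = A - B,
    with A = W^T W / (p N) and B = Ws^T Ws / (p* N).
    """
    p, ps = M.shape
    tr_c = np.trace(Q) / p - np.trace(T) / ps
    tr_c2 = (np.trace(Q @ Q) / p**2
             - 2.0 * np.trace(M @ M.T) / (p * ps)
             + np.trace(T @ T) / ps**2)
    return 0.5 * (tr_c**2 + 2.0 * tr_c2)

def one_pass_sgd(W, Ws, lr, steps, rng):
    """Online SGD: each step draws a fresh Gaussian input, so no sample is reused."""
    p, N = W.shape
    losses = []
    for _ in range(steps):
        x = rng.standard_normal(N)
        h = W @ x / np.sqrt(N)                  # student pre-activations
        hs = Ws @ x / np.sqrt(N)                # teacher pre-activations
        err = np.mean(h**2) - np.mean(hs**2)    # f(x) - f*(x)
        # Gradient of 0.5 * err^2 with respect to W.
        grad = err * (2.0 / p) * np.outer(h, x) / np.sqrt(N)
        W = W - lr * grad
        losses.append(population_loss(*overlaps(W, Ws)))
    return W, np.array(losses)
```

For example, `one_pass_sgd(W0, Ws, lr=0.05, steps=20 * N, rng=np.random.default_rng(0))` returns the final weights and the loss after each step. Plotting `losses` against steps / N gives the rescaled time in which such ODE descriptions are typically written.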
