Glassy dynamics near the interpolation transition in deep recurrent networks
John Hertz, Joanna Tyrcha

TL;DR
This paper investigates the learning dynamics of deep recurrent networks near the interpolation transition, revealing critical slowing down and aging phenomena akin to spin glass models, with implications for understanding training behavior.
Contribution
It identifies the critical slowing down and aging phenomena near the interpolation transition in deep recurrent networks, linking these behaviors to spin glass models and advancing understanding of learning dynamics.
Findings
Learning times diverge as the network width approaches a critical value
Aging behavior characterized by weight fluctuation scaling
Critical phenomena similar to spin glass models observed
Abstract
We examine learning dynamics in deep recurrent networks, focusing on the behavior near the boundary in the depth-width plane separating under- from over-parametrized networks, known as the interpolation transition. The training data are Bach chorales in 4-part harmony, and the learning is by stochastic gradient descent with a cross-entropy loss function. We find critical slowing down of the learning approaching the transition from the overparametrized side: for a given network depth, learning times to reach small training loss values appear to diverge as the width w approaches a (loss-dependent) critical value. We identify the zero-loss limit of this value with the interpolation transition. We also study aging (the slowing down of fluctuations as the time since the beginning of learning increases). Taking a system that has been learning for a time…
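As an illustration of how a critical width might be extracted from measured learning times, here is a minimal sketch. The functional form of the divergence is not given in the text on this page, so the code assumes, purely for illustration, a simple inverse law tau(w) = A / (w - w_c). Under that assumption 1/tau is linear in w, and a straight-line fit yields w_c. The function name, the symbols A and w_c, and all data values are hypothetical and synthetic; none of them come from the paper.

```python
def estimate_critical_width(widths, learning_times):
    """Estimate (w_c, A) assuming tau(w) = A / (w - w_c).

    ASSUMPTION: the inverse-law form is illustrative only; it is not
    taken from the paper. Under it, 1/tau = (w - w_c) / A is linear in w,
    so an ordinary least-squares line through (w, 1/tau) gives both values.
    """
    inv_times = [1.0 / t for t in learning_times]
    n = len(widths)
    mean_w = sum(widths) / n
    mean_y = sum(inv_times) / n
    sxx = sum((w - mean_w) ** 2 for w in widths)
    sxy = sum((w - mean_w) * (y - mean_y) for w, y in zip(widths, inv_times))
    slope = sxy / sxx                    # equals 1 / A
    intercept = mean_y - slope * mean_w  # equals -w_c / A
    w_c = -intercept / slope             # width at which 1/tau reaches zero
    amplitude = 1.0 / slope
    return w_c, amplitude


# Synthetic, noise-free data generated from the assumed law (not real results).
true_w_c, true_amplitude = 40.0, 5000.0
widths = [50, 60, 80, 100, 150, 200]
learning_times = [true_amplitude / (w - true_w_c) for w in widths]

w_c_hat, amplitude_hat = estimate_critical_width(widths, learning_times)
print(w_c_hat, amplitude_hat)
```

With real measurements the fit would be noisy, and a different exponent would require a nonlinear fit instead of this linear one.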
Taxonomy
Topics: Neural Networks and Applications
