Glassy dynamics near the interpolation transition in deep recurrent networks

John Hertz; Joanna Tyrcha

arXiv:2412.10094 · cond-mat.dis-nn · May 22, 2025

Open Access

TL;DR

This paper studies the learning dynamics of deep recurrent networks near the interpolation transition and finds critical slowing down and aging, phenomena also seen in spin glass models.

Contribution

The paper identifies critical slowing down and aging near the interpolation transition in deep recurrent networks and links both behaviors to spin glass models.

Findings

01

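For a given depth, learning times to reach a small training loss appear to diverge as $1/(w - w_{c})$ as the width $w$ approaches a loss-dependent critical value $w_{c}$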

02

Aging behavior characterized by weight fluctuation scaling

03

Critical phenomena similar to those in spin glass models are observed

Abstract

We examine learning dynamics in deep recurrent networks, focusing on the behavior near the boundary in the depth-width plane separating under- from over-parametrized networks, known as the interpolation transition. The training data are Bach chorales in 4-part harmony, and the learning is by stochastic gradient descent with a cross-entropy loss function. We find critical slowing down of the learning approaching the transition from the overparametrized side: For a given network depth, learning times to reach small training loss values appear to diverge proportional to $1/(w - w_{c})$ as the width $w$ approaches a (loss-dependent) critical value $w_{c}$. We identify the zero-loss limit of this value with the interpolation transition. We also study aging (the slowing down of fluctuations as the time since the beginning of learning increases). Taking a system that has been learning for a time…
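The divergence described in the abstract suggests one simple way to estimate $w_{c}$ from measured learning times. The following is a minimal sketch, not the authors' procedure: it assumes the pure form $\tau = A/(w - w_{c})$, so that $1/\tau$ is linear in $w$ and a straight-line fit gives both parameters. The function name `fit_critical_width`, the amplitude $A$, and the synthetic widths and times are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_critical_width(widths, learning_times):
    """Estimate (w_c, A) under the assumed form tau = A / (w - w_c)."""
    w = np.asarray(widths, dtype=float)
    tau = np.asarray(learning_times, dtype=float)
    # Under the assumption, 1/tau = w/A - w_c/A is a straight line in w.
    slope, intercept = np.polyfit(w, 1.0 / tau, 1)
    amplitude = 1.0 / slope
    w_c = -intercept / slope  # the line crosses zero at w = w_c
    return w_c, amplitude


# Synthetic, noise-free data (made-up values, not taken from the paper).
true_wc, true_amp = 40.0, 500.0
widths = np.array([50, 60, 80, 100, 150, 200])
times = true_amp / (widths - true_wc)

w_c_est, amp_est = fit_critical_width(widths, times)
print(f"estimated w_c = {w_c_est:.2f}, amplitude = {amp_est:.1f}")
```

With real measurements the points closest to $w_{c}$ carry the most information but also the longest and noisiest learning times, so a weighted or nonlinear fit may be preferable to this unweighted line. Because the abstract describes $w_{c}$ as loss-dependent, such a fit would be repeated for each target loss value.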

Taxonomy

Topics: Neural Networks and Applications