Comparing Dynamics: Deep Neural Networks versus Glassy Systems
M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli

TL;DR
This paper numerically investigates the training dynamics of deep neural networks using methods from the statistical physics of glassy systems, revealing both similarities to and differences from glassy dynamics and suggesting distinct phases depending on whether the network is under- or over-parametrized.
Contribution
It introduces a physics-inspired numerical analysis of DNN training dynamics, highlighting how flat directions in the loss landscape and the degree of parametrization shape training behavior.
Findings
Training slows down because of an increasingly large number of flat directions in the loss landscape.
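Under-parametrized networks show typical glassy dynamics, whereas over-parametrized networks diffuse at the bottom of the landscape without barrier crossing.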
Distinct behaviors are observed between under- and over-parametrized regimes.
Abstract
We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In…
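One way to probe the slowing down and late-time diffusion described in the abstract is a two-time mean-square displacement of the weights, Delta(t_w, t_w + t), a standard diagnostic in glassy physics. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the function names `mean_square_displacement` and `sgd_trajectory` are hypothetical, and the toy model is an over-parametrized linear least-squares problem rather than a DNN, so it shows only how the quantity is computed.

```python
import numpy as np


def mean_square_displacement(traj, t_w):
    """Return Delta(t_w, t_w + t) for every t >= 0.

    traj has shape (n_steps, n_params); row k holds the weights after k updates.
    Delta is the squared distance from the weights at time t_w, averaged over
    parameters.
    """
    traj = np.asarray(traj, dtype=float)
    reference = traj[t_w]
    return np.mean((traj[t_w:] - reference) ** 2, axis=1)


def sgd_trajectory(n_samples=20, n_params=50, steps=2000, lr=0.01, batch=5, seed=0):
    """Record the weights of mini-batch SGD on a toy least-squares loss.

    Assumption: a linear model with more parameters than samples stands in
    for an over-parametrized network. This is an illustration only.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_params))
    y = rng.normal(size=n_samples)
    w = rng.normal(size=n_params)
    traj = np.empty((steps + 1, n_params))
    traj[0] = w
    for step in range(steps):
        idx = rng.choice(n_samples, size=batch, replace=False)
        residual = X[idx] @ w - y[idx]
        grad = X[idx].T @ residual / batch  # gradient of the mean of 0.5 * residual**2
        w = w - lr * grad
        traj[step + 1] = w
    return traj


if __name__ == "__main__":
    traj = sgd_trajectory()
    # Compare displacement curves started at different waiting times t_w.
    for t_w in (10, 100, 1000):
        msd = mean_square_displacement(traj, t_w)
        print(f"t_w={t_w}: Delta after 500 further steps = {msd[500]:.3e}")
```

If the curves for different waiting times t_w do not collapse onto one another, the dynamics depends on the system's age, which is the kind of signature one looks for when comparing against glassy systems.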
Taxonomy
Topics: Time Series Analysis and Forecasting · Data Visualization and Analytics · Complex Systems and Time Series Analysis
