A dynamical systems view of training generative models and the memorization phenomenon
Siva Athreya, Chiranjib Bhattacharya, Vivek S. Borkar

TL;DR
This paper offers a system theoretic explanation for the memorization phenomenon in generative models, emphasizing the dynamic aspects of training and two time scale behaviors in stochastic gradient descent.
Contribution
It introduces a dynamical systems perspective to explain memorization in generative models, linking it to time scale separation and loss function dependence.
Findings
Explains memorization as a consequence of two time scales in SGD.
Connects collapse phenomena and double descent to dynamical system properties.
Provides a unified view of phenomena in generative model training.
Abstract
Using recent works of one of the authors (VSB) on collapse in generative models and two time scale dynamics in stochastic gradient descent in high dimensions, we give a system theoretic explanation of the memorization phenomenon in generative models. This relies purely on the dynamic aspects of the training phase. Specifically, we use a result of Austin [2016] to motivate a stylized model for the loss function for stochastic gradient descent (SGD) wherein the loss function has a strong dependence on some variables and weak dependence on the rest in a precise sense. This naturally leads to two distinct time scales in the constant step size SGD that is commonly used in machine learning. This fact has been used to explain the double descent phenomenon in SGD in Borkar [2026]. In conjunction with a mathematical model for the collapse phenomenon in SGD developed in Borkar [2025a], we analyze the…
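To make the time scale separation concrete, here is a minimal sketch of constant step size SGD on a toy loss with one strongly weighted and one weakly weighted variable. It is not the paper's model: the quadratic loss f(x, y) = 0.5*x^2 + 0.5*eps*y^2, the function name run_sgd, and the parameters eps, step_size and noise are all assumptions chosen for illustration. The excerpt above does not state the precise sense in which the dependence is strong or weak.

```python
import numpy as np


def run_sgd(steps=2000, step_size=0.05, eps=0.01, noise=0.01, seed=0):
    """Constant step size SGD on the toy loss f(x, y) = 0.5*x**2 + 0.5*eps*y**2.

    Illustrative assumption, not the paper's model: the loss depends
    strongly on x and only weakly (through the small factor eps) on y.
    Returns an array of shape (steps + 1, 2) holding the (x, y) iterates.
    """
    rng = np.random.default_rng(seed)
    x, y = 1.0, 1.0
    traj = np.empty((steps + 1, 2))
    traj[0] = (x, y)
    for k in range(steps):
        # Noisy gradients: exact gradient plus zero-mean Gaussian noise.
        grad_x = x + noise * rng.standard_normal()
        grad_y = eps * y + noise * rng.standard_normal()
        # The same constant step size is applied to both coordinates.
        x -= step_size * grad_x
        y -= step_size * grad_y
        traj[k + 1] = (x, y)
    return traj


if __name__ == "__main__":
    traj = run_sgd()
    for k in (0, 200, 2000):
        print(f"step {k:5d}: x = {traj[k, 0]: .4f}, y = {traj[k, 1]: .4f}")
```

With these assumed values the x iterate contracts by a factor of (1 - step_size) per step, while the y iterate contracts by only (1 - step_size*eps). After a couple of hundred steps x is therefore already fluctuating near zero while y has barely moved, even though both use the same step size. This is the fast and slow behavior that the abstract attributes to strong and weak dependence of the loss on different variables.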
