Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
Jiayao Zhang, Hua Wang, Weijie J. Su

TL;DR
This paper models deep learning training dynamics with stochastic differential equations, one per training sample, to show how local elasticity (backpropagation at an input affecting same-class samples more strongly) governs feature separability and neural collapse, and it identifies a sharp phase transition in training behavior.
Contribution
It introduces a novel SDE-based framework capturing sample-specific training dynamics and uncovers a phase transition driven by local elasticity affecting feature separability.
Findings
Local elasticity determines whether features become linearly separable.
A phase transition exists based on the impact of backpropagation on same-class samples.
Neural collapse emerges under conditions of local elasticity.
Abstract
Understanding the training dynamics of deep learning models is perhaps a necessary step toward demystifying the effectiveness of these models. In particular, how do data from different classes gradually become separable in their feature spaces when training neural networks using stochastic gradient descent? In this study, we model the evolution of features during deep learning training using a set of stochastic differential equations (SDEs) that each corresponds to a training sample. As a crucial ingredient in our modeling strategy, each SDE contains a drift term that reflects the impact of backpropagation at an input on the features of all samples. Our main finding uncovers a sharp phase transition phenomenon regarding the intra-class impact: if the SDEs are locally elastic in the sense that the impact is more significant on samples from the same class as the input, the features of…
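As a rough illustration of the modeling idea described in the abstract, the Python sketch below simulates a toy system with Euler-Maruyama steps. It is not the authors' model. The two-class setup, the scalar features, the constants alpha (same-class impact) and beta (cross-class impact), the noise scale sigma, and all function names are assumptions made only for this example. At each step one training sample is drawn, and every feature is pushed in the direction of that sample's label, scaled by alpha or beta depending on whether the classes match.

```python
import math
import random


def simulate(alpha, beta, n_per_class=50, steps=2000, h=0.01, sigma=0.5, seed=0):
    """Toy 1-D, two-class simulation (hypothetical, for illustration only).

    alpha: impact of a drawn sample on features of the same class (assumed constant)
    beta:  impact of a drawn sample on features of the other class (assumed constant)
    """
    rng = random.Random(seed)
    labels = [1] * n_per_class + [-1] * n_per_class
    x = [0.0] * len(labels)  # all features start at the origin
    for _ in range(steps):
        j = rng.randrange(len(labels))  # sample drawn at this step
        for i, y_i in enumerate(labels):
            impact = alpha if y_i == labels[j] else beta
            drift = impact * labels[j]  # push toward the drawn sample's label
            noise = sigma * math.sqrt(h) * rng.gauss(0.0, 1.0)
            x[i] += h * drift + noise  # Euler-Maruyama update
    return x, labels


def class_mean_gap(x, labels):
    """Difference between the mean feature of class +1 and class -1."""
    pos = [v for v, y in zip(x, labels) if y == 1]
    neg = [v for v, y in zip(x, labels) if y == -1]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def is_separable(x, labels):
    """True if a single threshold separates the two classes in this 1-D toy."""
    pos = [v for v, y in zip(x, labels) if y == 1]
    neg = [v for v, y in zip(x, labels) if y == -1]
    return min(pos) > max(neg) or min(neg) > max(pos)


if __name__ == "__main__":
    for name, (a, b) in {"locally elastic": (1.0, 0.0), "not elastic": (0.5, 0.5)}.items():
        feats, labs = simulate(a, b)
        print(name, "gap:", round(class_mean_gap(feats, labs), 2),
              "separable:", is_separable(feats, labs))
```

In this toy, the gap between class means grows in proportion to alpha - beta, while the per-sample noise grows only with the square root of time. Setting alpha > beta therefore drives the classes apart, and setting alpha = beta leaves them mixed, which loosely mirrors the dichotomy stated in the abstract.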
Taxonomy
Topics: Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis
