On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin

TL;DR
This paper introduces the principal flow, a continuous-time model that captures the instability and divergence behaviors of gradient descent in deep learning, providing new insights and a method for adaptive learning rate control.
Contribution
The paper proposes the principal flow (PF), which the authors describe as, to their knowledge, the only continuous flow that captures gradient descent instability, and introduces a learning rate adaptation method based on this understanding.
Findings
The principal flow models the divergent and oscillatory behaviors of gradient descent.
Edge of stability phenomena are explained through the principal flow's dependence on the eigendecomposition of the Hessian.
The adaptive learning rate method improves training stability and performance.
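To make the link between the Hessian and instability concrete, here is a minimal sketch that is not taken from the paper. It assumes the classical stability condition for gradient descent on a quadratic: a step with learning rate h is unstable along an eigendirection whose eigenvalue exceeds 2 / h. The names `top_eigenvalue` and `is_unstable`, and the example matrix, are hypothetical and chosen only for illustration.

```python
def top_eigenvalue(matrix, iters=200):
    """Estimate the largest-magnitude eigenvalue of a symmetric matrix
    by power iteration (pure Python, no external libraries)."""
    n = len(matrix)
    # Asymmetric start vector so it is unlikely to be orthogonal
    # to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(n)]
    eig = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        # Rayleigh quotient of the normalized vector.
        mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        eig = sum(v[i] * mv[i] for i in range(n))
    return eig


def is_unstable(hessian, h):
    """Assumed criterion: unstable if the top eigenvalue exceeds 2 / h."""
    return top_eigenvalue(hessian) > 2.0 / h


# Hypothetical Hessian with eigenvalues 2 and 4.
hessian = [[3.0, 1.0], [1.0, 3.0]]
sharpness = top_eigenvalue(hessian)
unstable_large_step = is_unstable(hessian, h=0.6)  # threshold 2 / 0.6 is below 4
unstable_small_step = is_unstable(hessian, h=0.4)  # threshold 2 / 0.4 is above 4
```

Power iteration returns the eigenvalue of largest magnitude, so this check is only meaningful when that eigenvalue is positive. For a real network the Hessian would be accessed through Hessian-vector products rather than stored as a dense matrix.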
Abstract
The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent, however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent, we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian, the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability, we propose a learning rate adaptation method which enables us to control the trade-off between…
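The gap the abstract describes can be illustrated with a one-dimensional quadratic. The sketch below is not the principal flow and is not from the paper. It only contrasts discrete gradient descent with the standard negative gradient flow on E(theta) = 0.5 * lam * theta**2, an assumed toy objective, to show that the standard flow decays for any step size while gradient descent oscillates or diverges. The function names and parameter values are hypothetical.

```python
import math


def gradient_descent(theta0, lam, h, steps):
    """Iterate theta <- theta - h * lam * theta, i.e. gradient descent
    on E(theta) = 0.5 * lam * theta**2 with learning rate h."""
    thetas = [theta0]
    for _ in range(steps):
        thetas.append(thetas[-1] - h * lam * thetas[-1])
    return thetas


def gradient_flow(theta0, lam, h, steps):
    """Exact solution of d(theta)/dt = -lam * theta, sampled at t = k * h."""
    return [theta0 * math.exp(-lam * h * k) for k in range(steps + 1)]


lam = 1.0
# h * lam < 1: monotone decay toward the minimum.
stable = gradient_descent(1.0, lam, h=0.5, steps=20)
# 1 < h * lam < 2: the iterate flips sign each step but still shrinks.
oscillating = gradient_descent(1.0, lam, h=1.5, steps=20)
# h * lam > 2: the iterate flips sign each step and grows.
divergent = gradient_descent(1.0, lam, h=2.5, steps=20)
# The gradient flow decays for any h when lam > 0, so it misses both regimes.
flow = gradient_flow(1.0, lam, h=2.5, steps=20)
```

Each gradient descent step multiplies theta by (1 - h * lam), which is why the three regimes split at h * lam = 1 and h * lam = 2. The flow multiplies theta by exp(-h * lam) per sampled step, which is always between 0 and 1 for positive lam.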
Taxonomy
Topics: Model Reduction and Neural Networks · Advanced Neuroimaging Techniques and Applications · Medical Imaging Techniques and Applications
