Stepping on the Edge: Curvature Aware Learning Rate Tuners
Vincent Roulet, Atish Agarwala, Jean-Bastien Grill, Grzegorz Swirszcz, Mathieu Blondel, Fabian Pedregosa

TL;DR
This paper investigates the complex dynamics between learning rate tuning and curvature during training, introduces a new method called CDAT that emphasizes long-term curvature stabilization, and demonstrates its effectiveness in full batch and mini batch regimes.
Contribution
The paper analyzes the feedback loop between learning rate and curvature, and proposes CDAT, a novel tuning method that improves long-term training stability and performance.
Findings
Classical learning rate tuners may yield greater one-step loss reduction, but in the long term they underperform constant learning rates in the full batch regime.
CDAT outperforms constant learning rates in full batch training.
Stochasticity affects the effectiveness of learning rate tuners at different batch sizes.
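The "greater one-step loss reduction" in the first finding can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the paper's experimental setup: it uses a fixed quadratic loss f(x) = 0.5 x^T H x and the exact line-search step size g^T g / (g^T H g) as a stand-in for a classical tuner. The matrix `H`, the start point `x0`, and the constant rate `lr_const` are hypothetical values chosen for the example. Because the curvature here is fixed, the example cannot reproduce the long-term feedback effect the paper studies; it only shows why a greedy tuner looks attractive after a single step.

```python
def matvec(H, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss(H, x):
    """Quadratic loss f(x) = 0.5 * x^T H x."""
    return 0.5 * dot(x, matvec(H, x))

def greedy_lr(H, x):
    """Step size minimizing the quadratic loss along the negative gradient."""
    g = matvec(H, x)  # gradient of 0.5 * x^T H x is H x
    return dot(g, g) / dot(g, matvec(H, g))

def step(H, x, lr):
    """One gradient descent step with learning rate lr."""
    g = matvec(H, x)
    return [xi - lr * gi for xi, gi in zip(x, g)]

# Hypothetical toy problem (not taken from the paper).
H = [[4.0, 1.0], [1.0, 3.0]]
x0 = [1.0, -2.0]
lr_const = 0.1

lr_greedy = greedy_lr(H, x0)
loss_const = loss(H, step(H, x0, lr_const))
loss_greedy = loss(H, step(H, x0, lr_greedy))

print("constant lr:", lr_const, "loss after one step:", loss_const)
print("greedy lr:  ", lr_greedy, "loss after one step:", loss_greedy)
```

On a fixed quadratic the greedy step can never do worse than a constant step after one iteration. The paper's point is that in neural network training the curvature itself responds to the learning rate, so this one-step advantage does not necessarily carry over to long runs.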
Abstract
Curvature information -- particularly, the largest eigenvalue of the loss Hessian, known as the sharpness -- often forms the basis for learning rate tuners. However, recent work has shown that the curvature information undergoes complex dynamics during training, going from a phase of increasing sharpness to eventual stabilization. We analyze the closed-loop feedback effect between learning rate tuning and curvature. We find that classical learning rate tuners may yield greater one-step loss reduction, yet they ultimately underperform in the long term when compared to constant learning rates in the full batch regime. These models break the stabilization of the sharpness, which we explain using a simplified model of the joint dynamics of the learning rate and the curvature. To further investigate these effects, we introduce a new learning rate tuning method, Curvature Dynamics Aware…
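To make the notion of sharpness concrete, the following minimal sketch estimates the largest eigenvalue of a small symmetric matrix by power iteration and checks the standard stability threshold for gradient descent on a quadratic, where a constant learning rate converges only if it is below 2 divided by the sharpness. The matrix `H`, the start point, and the helper names are assumptions made for illustration. In real training the Hessian is not formed explicitly and the sharpness changes over time, which this toy example does not capture.

```python
import math

def matvec(H, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def sharpness(H, num_iters=200):
    """Estimate the largest eigenvalue of a symmetric positive
    definite matrix H by power iteration."""
    n = len(H)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(num_iters):
        w = matvec(H, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T H v for the (unit-norm) final iterate.
    return sum(a * b for a, b in zip(v, matvec(H, v)))

def gd_final_loss(H, x0, lr, steps):
    """Run gradient descent on f(x) = 0.5 * x^T H x and return the final loss."""
    x = list(x0)
    for _ in range(steps):
        g = matvec(H, x)  # gradient of the quadratic
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return 0.5 * sum(a * b for a, b in zip(x, matvec(H, x)))

# Hypothetical stand-in for a loss Hessian (not taken from the paper).
H = [[4.0, 1.0], [1.0, 3.0]]
lam = sharpness(H)
threshold = 2.0 / lam  # stability edge for a constant learning rate

loss_below = gd_final_loss(H, [1.0, 1.0], 0.9 * threshold, 100)
loss_above = gd_final_loss(H, [1.0, 1.0], 1.1 * threshold, 100)

print("estimated sharpness:", lam)
print("loss with lr just below 2/sharpness:", loss_below)
print("loss with lr just above 2/sharpness:", loss_above)
```

A learning rate slightly below the threshold drives the loss toward zero, while one slightly above it makes the loss grow without bound. This is the "edge" that curvature-based tuners operate near.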
Taxonomy
Topics: Teleoperation and Haptic Systems · Learning Styles and Cognitive Differences · Music Technology and Sound Studies
Methods: Attentive Walk-Aggregating Graph Neural Network
