Cumulative Learning Rate Adaptation: Revisiting Path-Based Schedules for SGD and Adam
Asma Atamna, Tom Maus, Fabian Kievelitz, Tobias Glasmachers

TL;DR
This paper revisits a path-based learning rate adaptation method for SGD and Adam, proposes a correction for the Adam variant, and benchmarks the resulting methods to understand when adaptive strategies are beneficial.
Contribution
We identify and correct conceptual issues in a 2017 cumulative path-based adaptation scheme for Adam and evaluate its practical effectiveness through comprehensive benchmarking.
Findings
Adaptive methods can improve training efficiency in certain scenarios
Corrected adaptation scheme aligns better with Adam's dynamics
Benchmark results clarify when adaptive learning rates are advantageous
Abstract
The learning rate is a crucial hyperparameter in deep learning, with its ideal value depending on the problem and potentially changing during training. In this paper, we investigate the practical utility of adaptive learning rate mechanisms that adjust step sizes dynamically in response to the loss landscape. We revisit a cumulative path-based adaptation scheme proposed in 2017, which adjusts the learning rate based on the discrepancy between the observed path length, computed as a time-discounted sum of normalized gradient steps, and the expected length of a random walk. While the original approach offers a compelling intuition, we show that its adaptation mechanism for Adam is conceptually inconsistent due to the optimizer's internal preconditioning. We propose a corrected variant that better reflects Adam's update dynamics. To assess the practical value of online learning rate…
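The path-based idea described in the abstract can be illustrated for plain SGD with a minimal sketch. It is not the authors' implementation: the recursion for the discounted path, the normalization constant `sqrt(c * (2 - c))`, the exponential learning rate update, and the names `sgd_with_path_adaptation`, `c`, and `damping` are all assumptions, borrowed from the usual cumulative step-size adaptation recipe. With this normalization, a sequence of independent, isotropically random unit steps gives an expected squared path length of 1, which serves as the random-walk reference. The Adam correction proposed in the paper is not covered here.

```python
import numpy as np


def sgd_with_path_adaptation(grad_fn, x0, lr0=0.01, c=0.1, damping=0.02, steps=200):
    """SGD with a hypothetical cumulative path-based learning rate rule.

    c       : discount rate of the path (assumed hyperparameter)
    damping : strength of the learning rate update (assumed hyperparameter)
    """
    x = np.asarray(x0, dtype=float).copy()
    path = np.zeros_like(x)  # time-discounted sum of normalized steps
    lr = lr0
    lr_history = []
    for _ in range(steps):
        g = np.asarray(grad_fn(x), dtype=float)
        norm = np.linalg.norm(g)
        if norm == 0.0:
            break  # stationary point: no direction to accumulate
        direction = -g / norm  # normalized gradient step
        # Discounted accumulation; the scaling keeps E[||path||^2] = 1
        # when successive directions are independent and isotropic.
        path = (1.0 - c) * path + np.sqrt(c * (2.0 - c)) * direction
        # Longer than a random walk: steps agree, so increase lr.
        # Shorter than a random walk: steps cancel, so decrease lr.
        lr *= np.exp(damping * (path @ path - 1.0))
        x = x - lr * g
        lr_history.append(lr)
    return x, lr_history
```

Under these assumptions, a constant gradient (steps that always agree) drives the learning rate up after a few iterations, while a gradient that flips sign at every step (steps that cancel) drives it down.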
