Cumulative Learning Rate Adaptation: Revisiting Path-Based Schedules for SGD and Adam

Asma Atamna; Tom Maus; Fabian Kievelitz; Tobias Glasmachers

arXiv:2508.05408 · cs.LG · August 8, 2025

TL;DR

This paper revisits a 2017 path-based learning rate adaptation method for SGD and Adam, proposes a corrected variant for Adam, and benchmarks both to clarify when adaptive learning rate strategies are beneficial.

Contribution

We identify and correct a conceptual inconsistency in how a 2017 cumulative path-based learning rate adaptation scheme is applied to Adam, and evaluate the scheme's practical effectiveness through comprehensive benchmarking.

Findings

01. Adaptive methods can improve training efficiency in certain scenarios.
02. The corrected adaptation scheme aligns better with Adam's update dynamics.
03. Benchmark results clarify when adaptive learning rates are advantageous.

Abstract

The learning rate is a crucial hyperparameter in deep learning, with its ideal value depending on the problem and potentially changing during training. In this paper, we investigate the practical utility of adaptive learning rate mechanisms that adjust step sizes dynamically in response to the loss landscape. We revisit a cumulative path-based adaptation scheme proposed in 2017, which adjusts the learning rate based on the discrepancy between the observed path length, computed as a time-discounted sum of normalized gradient steps, and the expected length of a random walk. While the original approach offers a compelling intuition, we show that its adaptation mechanism for Adam is conceptually inconsistent due to the optimizer's internal preconditioning. We propose a corrected variant that better reflects Adam's update dynamics. To assess the practical value of online learning rate…
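The path-length mechanism described in the abstract can be illustrated with a short NumPy sketch. It is not the authors' implementation. The function name `path_adapted_sgd`, the decay rate `c`, the `damping` parameter, the normalization constant, and the exponential learning-rate update (borrowed from cumulative step-size adaptation in evolution strategies) are all illustrative assumptions. The path accumulates unit-norm SGD steps with discount `1 - c`. Its squared length is compared with 1, the value it approaches in expectation when successive step directions are independent and zero-mean, as in a random walk. The Adam variant and the paper's correction are not sketched, since the abstract excerpt does not specify them.

```python
import itertools

import numpy as np


def path_adapted_sgd(grad_fn, x0, lr0=0.01, c=0.1, damping=1.0, steps=100):
    """Hypothetical sketch of cumulative path-based learning-rate adaptation for SGD.

    Assumptions (not taken from the paper): the path accumulates unit-norm SGD
    steps with decay (1 - c), and the learning rate is updated multiplicatively,
    CSA-style, from the squared path length.
    """
    x = np.asarray(x0, dtype=float).copy()
    path = np.zeros_like(x)              # time-discounted sum of normalized steps
    lr = float(lr0)
    scale = np.sqrt(c * (2.0 - c))       # makes E[||path||^2] -> 1 for a random walk
    for _ in range(steps):
        step = -np.asarray(grad_fn(x), dtype=float)
        norm = np.linalg.norm(step)
        if norm == 0.0:                  # stationary point: nothing to accumulate
            break
        x = x + lr * step
        path = (1.0 - c) * path + scale * step / norm
        # ||path||^2 > 1: steps keep pointing the same way -> increase lr.
        # ||path||^2 < 1: steps cancel each other out     -> decrease lr.
        lr *= np.exp(c / (2.0 * damping) * (path @ path - 1.0))
    return x, lr


# Consistent gradient direction: normalized steps align, so the path grows long.
_, lr_consistent = path_adapted_sgd(lambda x: np.ones_like(x), np.zeros(3), steps=50)

# Sign-alternating gradients: consecutive steps cancel, so the path stays short.
signs = itertools.cycle([1.0, -1.0])
_, lr_alternating = path_adapted_sgd(
    lambda x: next(signs) * np.ones_like(x), np.zeros(3), steps=50
)

print(lr_consistent > 0.01, lr_alternating < 0.01)  # True True
```

Aligned steps lengthen the path and raise the learning rate, while oscillating steps shorten it and lower the rate, which mirrors the intuition stated in the abstract. The exponential form keeps the learning rate positive regardless of how the path length evolves.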
