AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining

Hongyuan Dong; Dingkang Yang; Xiao Liang; Chao Feng; Jiao Ran

arXiv:2506.13274 · cs.LG · December 23, 2025

TL;DR

AdaLRS is an adaptive learning rate search algorithm that improves foundation model pretraining efficiency by using loss descent velocity to search for the learning rate online. It is reported to be robust and to improve performance across various training scenarios.

Contribution

This work introduces AdaLRS, a novel online adaptive learning rate method that uses loss dynamics to guide hyperparameter tuning during foundation model pretraining.

Findings

01. AdaLRS adjusts learning rates toward near-optimal values.

02. It improves model performance across different training scenarios.

03. Theoretical analysis guarantees convergence of the method.

Abstract

Learning rate is widely regarded as crucial for effective foundation model pretraining. Recent research explores and demonstrates the transferability of learning rate configurations across varying model sizes, dataset sizes, and other factors. Nevertheless, these approaches are constrained to specific training scenarios and typically require extensive hyperparameter tuning on proxy models. In this work, we propose AdaLRS, a plug-and-play adaptive learning rate search algorithm that conducts online optimal learning rate search by optimizing loss descent velocities. We provide theoretical and experimental analyses to show that foundation model pretraining loss and its descent velocity are both convex and share the same optimal learning rate. Relying solely on training loss dynamics, AdaLRS involves few extra computations to guide the search process, and its convergence is guaranteed via…
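The abstract describes searching for the learning rate online by comparing loss descent velocities, but this excerpt does not give the exact update rule. The following is only a minimal sketch of that general idea, not the authors' algorithm: the function names `descent_velocity` and `propose_lr`, the least-squares slope estimate, and the `trial_factor` parameter are all hypothetical choices made for illustration.

```python
def descent_velocity(losses):
    """Estimate how fast the loss is falling over a window of steps.

    Assumption: velocity is taken as the negative least-squares slope of
    loss against step index, so a positive value means the loss is decreasing.
    """
    n = len(losses)
    if n < 2:
        raise ValueError("need at least two loss values")
    mean_x = (n - 1) / 2
    mean_y = sum(losses) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(losses))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return -cov / var


def propose_lr(lr, base_velocity, trial_velocity, trial_factor):
    """Keep a trial learning rate only if the loss fell faster under it.

    Hypothetical rule: `trial_velocity` was measured while training with
    `lr * trial_factor`; otherwise the current learning rate is retained.
    """
    if trial_velocity > base_velocity:
        return lr * trial_factor
    return lr


# Toy usage with made-up loss windows (not results from the paper).
base_window = [4.0, 3.8, 3.6, 3.4]   # losses recorded at the current lr
trial_window = [3.4, 3.0, 2.6, 2.2]  # losses recorded at lr * 2.0
new_lr = propose_lr(
    1e-4,
    descent_velocity(base_window),
    descent_velocity(trial_window),
    trial_factor=2.0,
)
print(new_lr)
```

In this toy run the trial window falls faster than the base window, so the sketch keeps the doubled learning rate. A practical search would also need a rule for scaling the learning rate down and for handling noisy loss curves, which this sketch leaves out.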
