Learning Rate Engineering: From Coarse Single Parameter to Layered Evolution

Ming-Hong Yao; Di Wang; Jian Cui; Jin-Yan Chen; Zi-Hao Cui; Fa Wang; Chen Wei; Qiu-Ye Yu

arXiv:2604.27295 · cs.AI · May 1, 2026

TL;DR

This paper reviews how learning rate scheduling evolved from a single global rate to layer-wise adaptive strategies, proposes a unified framework called DALS (Discriminative Adaptive Layer Scaling), and benchmarks multiple approaches across diverse datasets.

Contribution

The paper organizes learning rate strategies into five generations and introduces DALS, a unified framework that integrates multiple scheduling techniques.

Findings

01. DALS achieves 98.0% accuracy on synthetic tasks.

02. DALS-Fast reaches 90% accuracy in 3 epochs.

03. No single strategy outperforms the others across all regimes.

Abstract

Learning rate scheduling has evolved from the single global fixed rate of early SGD to sophisticated layer-wise adaptive strategies. We systematize this evolution into five generations: (Gen1) global fixed learning rates, (Gen2) global scheduling, (Gen3) parameter-level adaptation, (Gen4) layer-level differentiation, and (Gen5) joint layer-time scheduling. We trace the fundamental motivation behind each transition, showing how the shift from one-size-fits-all to tailoring by layer and time addresses the impossible trinity of transfer learning: lower layers require small updates to preserve general knowledge while higher layers need large updates to adapt to new tasks. Building on this taxonomy, we propose Discriminative Adaptive Layer Scaling (DALS), a unified framework that integrates phase-adaptive cosine scheduling, depth-aware Grokfast gradient filtering, and LARS-style trust ratios…
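
To make the idea of joint layer-time scheduling (Gen5) concrete, the following minimal sketch combines a cosine time schedule with a per-layer depth multiplier and shows a simplified LARS-style trust ratio. It is an illustration under stated assumptions, not the DALS algorithm from the paper: the function names, the geometric `decay` rule, the default values, and the simplified trust ratio (no weight decay term, no trust coefficient) are all hypothetical choices made here for clarity.

```python
import math


def cosine_factor(step, total_steps, warmup_steps=0):
    """Time multiplier in [0, 1]: optional linear warmup, then cosine decay to 0."""
    if warmup_steps > 0 and step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def layer_factor(layer_idx, num_layers, decay=0.8):
    """Depth multiplier (assumed geometric rule): the top layer gets 1.0 and
    each layer below it is scaled by another factor of `decay`."""
    return decay ** (num_layers - 1 - layer_idx)


def trust_ratio(weight, grad, eps=1e-8):
    """Simplified LARS-style ratio ||w|| / ||g||.
    Falls back to 1.0 when either norm is zero."""
    w_norm = math.sqrt(sum(w * w for w in weight))
    g_norm = math.sqrt(sum(g * g for g in grad))
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0
    return w_norm / (g_norm + eps)


def layer_time_lr(base_lr, layer_idx, num_layers, step, total_steps,
                  decay=0.8, warmup_steps=0):
    """Learning rate for one layer at one step: base rate x depth x time."""
    return (base_lr
            * layer_factor(layer_idx, num_layers, decay)
            * cosine_factor(step, total_steps, warmup_steps))


if __name__ == "__main__":
    # Print the per-layer rates of a hypothetical 4-layer model at three steps.
    for step in (0, 50, 100):
        rates = [layer_time_lr(1e-3, i, 4, step, 100) for i in range(4)]
        print(step, ["%.2e" % r for r in rates])
```

In this sketch, lower layers (small `layer_idx`) always receive smaller updates than higher layers, which mirrors the motivation stated in the abstract, while the cosine factor shrinks every layer's rate over time. The trust ratio would multiply a layer's rate by the ratio of its weight norm to its gradient norm; how DALS actually combines these pieces is not specified in the excerpt above.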
