Stepsize anything: A unified learning rate schedule for budgeted-iteration training

Anda Tang; Yiming Dong; Yutao Zeng; Zhou Xun; Zhouchen Lin

arXiv:2505.24452·cs.LG·December 9, 2025

TL;DR

This paper introduces the UBA schedule, a theoretically grounded learning rate schedule designed for budgeted-iteration training, which outperforms existing schedules across various architectures and tasks by explicitly accounting for training budget constraints.

Contribution

The paper proposes the UBA schedule, a novel, theoretically justified learning rate schedule for budgeted training, with a single hyper-parameter and proven convergence, applicable across diverse tasks and architectures.

Findings

01

UBA outperforms common schedules in vision and language tasks.

02

The schedule's hyper-parameter is theoretically connected to the condition number.

03

Convergence is proven for a range of hyper-parameter values.

Abstract

The expanding computational costs and limited resources underscore the critical need for budgeted-iteration training, which aims to achieve optimal learning within predetermined iteration budgets. While learning rate schedules fundamentally govern the performance of different networks and tasks, particularly in budgeted-iteration scenarios, their design remains largely heuristic, lacking theoretical foundations. In addition, the optimal learning rate schedule requires extensive trial-and-error selection, making the training process inefficient. In this work, we propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly-used schedules among diverse architectures and tasks under different constrained training budgets. First, we bridge the gap by constructing a novel training budget-aware optimization framework,…
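To make "budgeted-iteration" concrete, the sketch below shows two of the commonly used schedules that the abstract refers to as baselines, each written as a function of the current step and the total iteration budget. It is not the UBA schedule, whose formula is not given in this summary. The function names and the `budget`, `base_lr`, and `min_lr` parameters are illustrative assumptions.

```python
import math


def linear_decay_lr(step, budget, base_lr):
    """Baseline schedule (not UBA): anneal linearly from base_lr to 0 over the budget."""
    if budget <= 0 or not 0 <= step <= budget:
        raise ValueError("expected budget > 0 and 0 <= step <= budget")
    return base_lr * (1.0 - step / budget)


def cosine_lr(step, budget, base_lr, min_lr=0.0):
    """Baseline schedule (not UBA): cosine annealing from base_lr to min_lr over the budget."""
    if budget <= 0 or not 0 <= step <= budget:
        raise ValueError("expected budget > 0 and 0 <= step <= budget")
    progress = step / budget  # fraction of the iteration budget already used
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # The same step maps to a different learning rate under a different budget.
    for budget in (100, 200):
        print(budget, linear_decay_lr(50, budget, 0.1), cosine_lr(50, budget, 0.1))
```

Both functions depend on the ratio `step / budget` rather than on the step alone, so shrinking the budget compresses the whole decay curve instead of truncating it. That is the sense in which a schedule is budget-aware.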

Taxonomy

Methods: Average Pooling · Convolution · Kaiming Initialization · Global Average Pooling · Max Pooling