Stepsize anything: A unified learning rate schedule for budgeted-iteration training
Anda Tang, Yiming Dong, Yutao Zeng, Zhou Xun, Zhouchen Lin

TL;DR
This paper introduces the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule for budgeted-iteration training. By explicitly accounting for the training budget, it outperforms existing schedules across a range of architectures and tasks.
Contribution
The paper proposes the UBA schedule, a novel, theoretically justified learning rate schedule for budgeted training, with a single hyper-parameter and proven convergence, applicable across diverse tasks and architectures.
Findings
UBA outperforms commonly used schedules on vision and language tasks.
The schedule's single hyper-parameter is theoretically connected to the condition number.
Convergence is proven for different values of the hyper-parameter.
Abstract
The expanding computational costs and limited resources underscore the critical need for budgeted-iteration training, which aims to achieve optimal learning within predetermined iteration budgets. While learning rate schedules fundamentally govern the performance of different networks and tasks, particularly in budgeted-iteration scenarios, their design remains largely heuristic, lacking theoretical foundations. In addition, the optimal learning rate schedule requires extensive trial-and-error selection, making the training process inefficient. In this work, we propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly-used schedules among diverse architectures and tasks under different constrained training budgets. First, we bridge the gap by constructing a novel training budget-aware optimization framework,…
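To make the notion of a budget-dependent schedule concrete, here is a minimal sketch in Python. It is not the UBA schedule, whose formula is not given in this summary. It shows two commonly used baselines (linear and cosine decay) written as functions of training progress, that is, the current step divided by the iteration budget. The function name `budgeted_lr` and its parameters are illustrative assumptions.

```python
import math


def budgeted_lr(step: int, budget: int, base_lr: float, kind: str = "cosine") -> float:
    """Learning rate at `step` for a run limited to `budget` iterations.

    Illustrative baselines only (NOT the UBA schedule): the rate depends on
    the fraction of the budget already used, so the same code adapts to any
    budget without retuning the shape of the decay.
    """
    if budget <= 0:
        raise ValueError("budget must be a positive number of iterations")
    # Fraction of the budget consumed, clamped to [0, 1].
    progress = min(max(step, 0), budget) / budget
    if kind == "linear":
        # Straight-line decay from base_lr to 0 at the end of the budget.
        return base_lr * (1.0 - progress)
    if kind == "cosine":
        # Half-cosine decay from base_lr to 0 at the end of the budget.
        return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"unknown schedule kind: {kind!r}")


# The same schedule shape stretched over two different budgets.
short_run = [budgeted_lr(t, budget=100, base_lr=0.1) for t in range(101)]
long_run = [budgeted_lr(t, budget=1000, base_lr=0.1) for t in range(1001)]
```

Both runs start at the base rate and reach zero exactly when their budget is exhausted. Only the pace of decay differs, which is the property a budget-aware schedule is meant to provide.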
Taxonomy
Methods: Average Pooling · Convolution · Kaiming Initialization · Global Average Pooling · Max Pooling
