REX: Revisiting Budgeted Training with an Improved Schedule
John Chen, Cameron Wolfe, Anastasios Kyrillidis

TL;DR
This paper introduces REX, a learning rate schedule that adapts to different training budgets: it outperforms traditional schedules in low-budget scenarios and matches or exceeds them in high-budget ones, at no extra cost.
Contribution
The paper proposes the REX schedule, a new adaptive learning rate profile and sampling rate combination that improves budget-aware training performance.
Findings
REX outperforms the linear schedule in low-budget regimes.
REX matches or exceeds state-of-the-art schedules in various settings.
REX requires no additional computation, storage, or hyperparameters.
Abstract
Deep learning practitioners often operate on a computational and monetary budget. Thus, it is critical to design optimization algorithms that perform well under any budget. The linear learning rate schedule is considered the best budget-aware schedule, as it outperforms most other schedules in the low budget regime. On the other hand, learning rate schedules -- such as the \texttt{30-60-90} step schedule -- are known to achieve high performance when the model can be trained for many epochs. Yet, it is often not known a priori whether one's budget will be large or small; thus, the optimal choice of learning rate schedule is made on a case-by-case basis. In this paper, we frame the learning rate schedule selection problem as a combination of selecting a profile (i.e., the continuous function that models the learning rate schedule), and choosing a sampling rate (i.e., how…
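The profile / sampling rate framing from the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that "sampling rate" means how often the learning rate is re-read from the profile during training, since the abstract is cut off at that point. The names `linear_profile`, `rex_profile`, `sampled_lr`, and `num_samples` are hypothetical. The expression in `rex_profile` is not stated on this page; it is an assumed form and should be checked against the paper before use.

```python
def linear_profile(z):
    """Linear decay: the multiplier falls from 1 to 0 as progress z goes 0 -> 1."""
    return 1.0 - z


def rex_profile(z):
    """Assumed form of the REX profile (not given on this page; verify against the paper)."""
    return (1.0 - z) / (0.5 + 0.5 * (1.0 - z))


def sampled_lr(base_lr, step, total_steps, profile, num_samples=None):
    """Learning rate at `step` for a given profile and sampling rate.

    num_samples=None re-reads the profile at every step.
    Otherwise training is split into `num_samples` equal segments and the
    learning rate is held constant within each segment (a step-like schedule).
    """
    if num_samples is None:
        z = step / total_steps
    else:
        segment = step * num_samples // total_steps  # index of the current segment
        z = segment / num_samples                    # progress at the segment start
    return base_lr * profile(z)


if __name__ == "__main__":
    total = 100
    for step in (0, 25, 50, 75, 99):
        print(
            step,
            round(sampled_lr(0.1, step, total, linear_profile), 4),
            round(sampled_lr(0.1, step, total, rex_profile), 4),
            round(sampled_lr(0.1, step, total, linear_profile, num_samples=4), 4),
        )
```

Under these assumptions, the same profile yields a smooth schedule when read every step and a staircase when read only a few times, which is the sense in which a step schedule can be seen as a coarsely sampled profile.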
Code & Models
- 🤗 PJMixers-Images/Florence-2-base-Castollux-v0.5 · model · 505 dl · ♡ 5
- 🤗 PJMixers-Images/Florence-2-base-Castollux-v0.1 · model · 2 dl
- 🤗 PJMixers-Images/Florence-2-base-Castollux-v0.2 · model · 7 dl · ♡ 2
- 🤗 PJMixers-Images/Florence-2-base-Castollux-v0.4 · model · 6 dl · ♡ 1
- 🤗 PJMixers-Dev/Gemma-3-Earthen-Completion-v0.1-4B-QLoRA · model · 1 dl
- 🤗 PJMixers-Dev/Gemma-3-Earthen-Completion-v0.1-4B · model · 4 dl · ♡ 1
- 🤗 PJMixers-Dev/Gemma-3-Earthen-v0.1-4B-QLoRA · model · 1 dl
- 🤗 PJMixers-Dev/Gemma-3-Earthen-v0.1-4B · model · 1 dl
- 🤗 PJMixers-Dev/Gemma-3-Earthen-v0.2-4B-QLoRA · model · 3 dl · ♡ 1
- 🤗 PJMixers-Dev/Gemma-3-Earthen-v0.2-4B · model · 6 dl · ♡ 1
Taxonomy
Topics: Reservoir Engineering and Simulation Methods · Scheduling and Timetabling Solutions · Simulation Techniques and Applications
Methods: Step Decay · Stochastic Gradient Descent · Adam
