ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models

Aaron Defazio

arXiv:2605.19095 · cs.LG · May 20, 2026

TL;DR

ScheduleFree+ is a learning-rate-free and schedule-free training method for large language models. At 1000 tokens per parameter, it outperforms state-of-the-art learning-rate schedules by 31%.

Contribution

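The paper identifies the fixes needed to scale Schedule-Free Learning to larger batch sizes and model sizes, and introduces ScheduleFree+, a learning-rate-free and schedule-free method for training large language models that greatly outperforms Warmup-Stable-Decay (WSD) schedules.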

Findings

01

ScheduleFree+ outperforms Warmup-Stable-Decay schedules.

02

Schedule-Free Learning is most effective for long-duration training.

03

At 1000 tokens per parameter, it outperforms SOTA schedules by 31%.

Abstract

Schedule-Free Learning has shown promise as a practical anytime training method for machine learning, succeeding across dozens of standard benchmark problems. However, strong performance for LLM training has only been demonstrated at small scales. We identify a number of fixes necessary to scale up Schedule-Free Learning to larger batch sizes and model sizes, and present a learning-rate-free and schedule-free method (ScheduleFree+) for training large language models that greatly outperforms Warmup-Stable-Decay (WSD) schedules. We also demonstrate that Schedule-Free Learning is most effective for long-duration training: at 1000 tokens per parameter, it outperforms SOTA schedules by 31%. Schedule-Free Learning provides a theoretical foundation for the use of model averaging and checkpoint merging during pretraining.
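To make the link between Schedule-Free Learning and model averaging concrete, here is a minimal sketch of the base Schedule-Free SGD update as described in earlier Schedule-Free work. It is not ScheduleFree+: the scaling fixes and the learning-rate-free component are not specified on this page and are not reproduced here. The function names, the toy quadratic objective, and the values of `lr`, `beta`, and `steps` are illustrative assumptions.

```python
def schedule_free_sgd(grad, w0, lr=0.5, beta=0.9, steps=1000):
    """Base Schedule-Free SGD on a scalar parameter (illustrative sketch only).

    Uses a constant, hand-picked learning rate with no warmup or decay.
    This is NOT ScheduleFree+, which is also learning-rate-free.
    """
    z = w0  # base SGD iterate
    x = w0  # uniform running average of the z iterates; used for evaluation
    for t in range(1, steps + 1):
        # The gradient is evaluated at an interpolation between z and x.
        y = (1.0 - beta) * z + beta * x
        z = z - lr * grad(y)
        # c = 1 / (t + 1) keeps x equal to the uniform average of all z so far.
        c = 1.0 / (t + 1)
        x = (1.0 - c) * x + c * z
    return x


def toy_grad(w):
    # Gradient of the assumed toy objective f(w) = 0.5 * (w - 3)^2.
    return w - 3.0


w_final = schedule_free_sgd(toy_grad, w0=0.0)
print(w_final)
```

The averaged iterate `x` is the point that would be evaluated or checkpointed, and no schedule or stopping time is fixed in advance. That is the sense in which the method is "anytime" and relates to model averaging.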
