The Road Less Scheduled

Aaron Defazio; Xingyu Alice Yang; Harsh Mehta; Konstantin Mishchenko; Ahmed Khaled; Ashok Cutkosky

arXiv:2405.15682 · cs.LG · October 31, 2024 · 1 citation

TL;DR

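This paper introduces a Schedule-Free optimization approach that drops learning rate schedules entirely, so no stopping step T has to be specified in advance. It reports state-of-the-art performance compared to schedules, from convex problems to large-scale deep learning, and adds no hyperparameters beyond those of standard optimizers with momentum.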

Contribution

The paper develops a new theory that unifies learning rate scheduling and iterate averaging, and derives from it a Schedule-Free method that removes the need to choose or tune a schedule.

Findings

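01  State-of-the-art performance compared to schedule-based methods, from convex problems to large-scale deep learning

02  No additional hyperparameters over standard optimizers with momentum

03  Schedule-Free AdamW is the core algorithm behind the winning entry to the MLCommons 2024 AlgoPerf challenge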

Abstract

Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available at https://github.com/facebookresearch/schedule_free. Schedule-Free AdamW is the core algorithm behind our winning entry to the MLCommons 2024 AlgoPerf Algorithmic…
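To make the "scheduling plus iterate averaging" idea concrete, here is a minimal sketch of a Schedule-Free SGD step as we understand it from the paper: the gradient is evaluated at a point y interpolated between a base iterate z and a running average x, and x is kept as the uniform average of the z sequence. The function name, the scalar toy problem, and the default values of lr and beta are illustrative assumptions; this is not the reference implementation from the repository linked above, and it omits warmup, weight decay, and the Adam variant.

```python
def schedule_free_sgd(grad, x0, lr=0.1, beta=0.9, steps=2000):
    """Sketch of Schedule-Free SGD on a single scalar parameter.

    grad  : callable returning the (possibly stochastic) gradient at a point
    x0    : initial parameter value
    lr    : constant step size; no schedule and no stopping step T is needed
    beta  : interpolation weight between the average x and the base iterate z
    """
    z = x0  # base iterate, updated by plain gradient steps
    x = x0  # running average of the z sequence; used for evaluation
    for t in range(1, steps + 1):
        # Gradient is taken at an interpolation of z and x, not at either one.
        y = (1.0 - beta) * z + beta * x
        z = z - lr * grad(y)
        # Uniform averaging weight c_{t+1} = 1 / (t + 1).
        c = 1.0 / (t + 1)
        x = (1.0 - c) * x + c * z
    return x


if __name__ == "__main__":
    # Toy objective f(w) = 0.5 * (w - 3)^2, so the gradient is w - 3.
    w = schedule_free_sgd(lambda w: w - 3.0, x0=0.0)
    print(f"estimate after 2000 steps: {w:.4f}")
```

Because the averaging weight depends only on the current step count, the loop can be stopped at any iteration and x used as the answer, which is the property the abstract contrasts with schedules that depend on T.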

Code & Models

Videos

The Road Less Scheduled · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Advanced Neural Network Applications

Methods: AdamW