What do near-optimal learning rate schedules look like?

Hiroki Naganuma; Atish Agarwala; Priya Kasimbeg; George E. Dahl

arXiv:2603.10301·cs.LG·March 16, 2026

Open Access

TL;DR

This paper introduces a search procedure for finding near-optimal learning rate schedule shapes for neural network training. It finds that warmup and decay are robust features of good schedules and that commonly used schedule families are suboptimal on the workloads tested.

Contribution

The authors develop a schedule search procedure that factors the schedule shape out from the base learning rate, and use it to provide the most comprehensive analysis of near-optimal schedules to date.

Findings

01  Warmup and decay are robust features of good schedules.

02  Common schedule families are suboptimal for tested workloads.

03  Weight decay significantly influences optimal schedule shape.

Abstract

A basic unanswered question in neural network training is: what is the best learning rate schedule shape for a given workload? The choice of learning rate schedule is a key factor in the success or failure of the training process, but beyond having some kind of warmup and decay, there is no consensus on what makes a good schedule shape. To answer this question, we designed a search procedure to find the best shapes within a parameterized schedule family. Our approach factors out the schedule shape from the base learning rate, which otherwise would dominate cross-schedule comparisons. We applied our search procedure to a variety of schedule families on three workloads: linear regression, image classification on CIFAR-10, and small-scale language modeling on Wikitext103. We showed that our search procedure indeed generally found near-optimal schedules. We found that warmup and decay are…
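
The idea of factoring the schedule shape out from the base learning rate can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the authors' procedure: the shape family (linear warmup followed by polynomial decay, with hypothetical parameters `warmup_frac` and `decay_power`), the toy SGD linear regression problem, and the grids. The point it shows is that each candidate shape is scored at its own best base learning rate, so shapes are compared with one another rather than with a badly tuned scale.

```python
import numpy as np


def shape(frac, warmup_frac, decay_power):
    """Hypothetical shape family: linear warmup to 1, then polynomial decay to 0.

    `frac` is training progress in [0, 1]. The peak value is always 1, so the
    shape says nothing about the overall scale of the learning rate.
    """
    frac = np.asarray(frac, dtype=float)
    warm = frac / max(warmup_frac, 1e-12)
    rest = np.clip(1.0 - (frac - warmup_frac) / (1.0 - warmup_frac), 0.0, None)
    return np.where(frac < warmup_frac, warm, rest ** decay_power)


def final_loss(base_lr, warmup_frac, decay_power, steps=200, seed=0):
    """Train a toy linear regression with minibatch SGD; return the final MSE."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(256, 8))
    w_true = rng.normal(size=8)
    y = X @ w_true + 0.1 * rng.normal(size=256)
    w = np.zeros(8)
    for t in range(steps):
        # Learning rate at step t = base learning rate * shape(progress).
        lr = base_lr * shape(t / steps, warmup_frac, decay_power)
        idx = rng.integers(0, 256, size=16)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / 16
        w = w - lr * grad
    loss = float(np.mean((X @ w - y) ** 2))
    return loss if np.isfinite(loss) else float("inf")


def best_for_shape(warmup_frac, decay_power, base_lrs=(0.01, 0.03, 0.1, 0.3)):
    """Tune the base learning rate separately for one shape."""
    losses = [final_loss(lr, warmup_frac, decay_power) for lr in base_lrs]
    i = int(np.argmin(losses))
    return base_lrs[i], losses[i]


# Compare shapes only after each one has been given its best base learning rate.
shapes = [(w, p) for w in (0.0, 0.05, 0.2) for p in (0.5, 1.0, 2.0)]
results = {s: best_for_shape(*s) for s in shapes}
best_shape = min(results, key=lambda s: results[s][1])
print("best (warmup_frac, decay_power):", best_shape, "->", results[best_shape])
```

A grid search stands in here for whatever search procedure the paper uses; the sketch is only meant to show why a fixed base learning rate shared across shapes could dominate the comparison, while per-shape tuning removes that confound.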

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning