Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

Blake Bordelon; Francesco Mori

arXiv:2602.04774·cond-mat.dis-nn·May 11, 2026

TL;DR

This paper develops a theory of optimal learning rate schedules for a power-law random feature model trained with SGD, identifies distinct easy and hard phases, and derives schedules that outperform constant and power-law benchmarks.

Contribution

It derives optimal LR schedules analytically for a solvable model, characterizes the easy and hard regimes, and extends the analysis to joint optimization with batch size and to momentum parameters.

Findings

01. Optimal schedules differ between the easy and hard phases.

02. Jointly optimizing LR and batch size improves training efficiency.

03. The derived schedules outperform constant and power-law benchmarks.

Abstract

Setting the learning rate (LR) for a deep learning model is a critical part of successful training, yet LRs are often chosen empirically by trial and error. In this work, we explore a solvable model of optimal LR schedules for a power-law random feature model trained with stochastic gradient descent (SGD). We consider the optimal schedule $η_{T}^{⋆} (t)$, where $t$ is the current iterate and $T$ is the training horizon. This schedule is computed both numerically, as an optimization problem, and analytically, using optimal control theory. Our analysis reveals two regimes, which we term the easy phase and the hard phase. In the easy phase, the optimal schedule is a polynomial decay $η_{T}^{⋆} (t) ≃ T^{- ξ} (1 - t / T)^{δ}$, where $ξ$ and $δ$ depend on the properties of the features and task. In the hard phase, the optimal schedule resembles warmup-stable-decay with constant…
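
To make the easy-phase form concrete, the following is a minimal sketch that evaluates a polynomial-decay schedule of the shape $T^{- ξ} (1 - t / T)^{δ}$ quoted in the abstract. The function name `easy_phase_schedule`, the prefactor `eta0`, and the values chosen for `xi` and `delta` are illustrative assumptions, not quantities taken from the paper; the abstract states only that $ξ$ and $δ$ depend on the features and task, and the relation holds up to constants.

```python
def easy_phase_schedule(t, T, xi, delta, eta0=1.0):
    """Polynomial-decay LR of the form eta0 * T**(-xi) * (1 - t/T)**delta.

    eta0 is a hypothetical constant prefactor; the abstract gives the
    schedule only up to such constants.
    """
    if T <= 0:
        raise ValueError("T must be positive")
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    return eta0 * T ** (-xi) * (1.0 - t / T) ** delta


# Placeholder values for illustration only (not from the paper).
T = 1000
xi, delta = 0.5, 1.0

# Learning rate at every iterate t = 0, 1, ..., T.
schedule = [easy_phase_schedule(t, T, xi, delta) for t in range(T + 1)]
```

With these placeholder exponents the peak LR shrinks as the horizon `T` grows, and the schedule decays to zero at `t = T`; other choices of `xi` and `delta` change the peak scaling and the shape of the decay.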
