Optimal Growth Schedules for Batch Size and Learning Rate in SGD that Reduce SFO Complexity

Hikaru Umeda; Hideaki Iiduka

arXiv:2508.05297·cs.LG·August 8, 2025

TL;DR

This paper derives optimal schedules for increasing batch size and learning rate in SGD to minimize stochastic first-order oracle complexity, improving training efficiency for large deep learning models.

Contribution

It provides the first theoretical derivation of optimal growth schedules for batch size and learning rate in SGD based on SFO complexity, with validated practical guidelines.

Findings

01 Optimal growth schedules for the batch size and learning rate reduce SFO complexity.

02 The schedules improve training efficiency for large-batch deep learning.

03 The schedules are validated through extensive experiments.

Abstract

The unprecedented growth of deep learning models has enabled remarkable advances but introduced substantial computational bottlenecks. A key factor contributing to training efficiency is batch-size and learning-rate scheduling in stochastic gradient methods. However, naive scheduling of these hyperparameters can degrade optimization efficiency and compromise generalization. Motivated by recent theoretical insights, we investigated how the batch size and learning rate should be increased during training to balance efficiency and convergence. We analyzed this problem on the basis of stochastic first-order oracle (SFO) complexity, defined as the expected number of gradient evaluations needed to reach an $\epsilon$-approximate stationary point of the empirical loss. We theoretically derived optimal growth schedules for the batch size and learning rate that reduce SFO complexity and…
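To make the counting behind SFO complexity concrete, here is a minimal Python sketch. It only tallies stochastic gradient evaluations (one per sample in each mini-batch) under a stage-wise growth schedule. The helper names, the growth factors, and the stage lengths are all hypothetical choices for illustration and are not the schedules derived in the paper. The sketch also does not determine how many steps are needed to reach an $\epsilon$-approximate stationary point; it assumes a fixed step budget.

```python
def stagewise_schedule(initial, factor, num_stages, steps_per_stage):
    """Return per-step values that are multiplied by `factor` after each stage."""
    values = []
    for stage in range(num_stages):
        value = initial * factor ** stage
        values.extend([value] * steps_per_stage)
    return values


def sfo_count(batch_sizes):
    """Total stochastic gradient evaluations: one per sample in every mini-batch."""
    return sum(batch_sizes)


# Hypothetical settings, chosen only for illustration (not taken from the paper).
batch_sizes = stagewise_schedule(initial=8, factor=2, num_stages=4, steps_per_stage=100)
learning_rates = stagewise_schedule(initial=0.1, factor=1.5, num_stages=4, steps_per_stage=100)

# Baseline with the same number of steps but a constant batch size.
constant_batches = [64] * len(batch_sizes)

print("growing batch size :", sfo_count(batch_sizes))
print("constant batch size:", sfo_count(constant_batches))
```

With the same step budget, the growing schedule here spends fewer gradient evaluations than the constant large batch, because its early stages use small batches. Whether that translates into lower SFO complexity depends on how quickly each schedule actually reaches the target accuracy, which is the question the paper analyzes.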
