Accelerating SGDM via Learning Rate and Batch Size Schedules: A Lyapunov-Based Analysis

Yuichi Kondo; Hideaki Iiduka

arXiv:2508.03105 · cs.LG · October 13, 2025

TL;DR

This paper analyzes the convergence of SGDM under dynamic learning-rate and batch-size schedules, shows how different scheduling strategies differ in their convergence guarantees, and supports the theory with experiments in which scheduled SGDM converges faster than SGDM with fixed hyperparameters.

Contribution

The paper introduces a simpler Lyapunov function for analyzing SGDM convergence under three practical scheduling strategies, extending the existing theory and establishing a hierarchy among their convergence guarantees.

Findings

01

An increasing batch size guarantees convergence of the expected gradient norm, whereas a constant batch size does not.

02

Increasing both the batch size and the learning rate yields a provably faster decay of the expected gradient norm.

03

In experiments, dynamically scheduled SGDM converges faster than SGDM with fixed hyperparameters.

Abstract

We analyze the convergence behavior of stochastic gradient descent with momentum (SGDM) under dynamic learning-rate and batch-size schedules by introducing a novel and simpler Lyapunov function. We extend the existing theoretical framework to cover three practical scheduling strategies commonly used in deep learning: a constant batch size with a decaying learning rate, an increasing batch size with a decaying learning rate, and an increasing batch size with an increasing learning rate. Our results reveal a clear hierarchy in convergence: a constant batch size does not guarantee convergence of the expected gradient norm, whereas an increasing batch size does, and simultaneously increasing both the batch size and learning rate achieves a provably faster decay. Empirical results validate our theory, showing that dynamically scheduled SGDM significantly outperforms its fixed-hyperparameter…
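The three scheduling strategies named in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the stagewise geometric schedule form, the constants (`lr0`, `b0`, the decay and growth factors, `period`), the heavy-ball form of the momentum update, and the noisy one-dimensional quadratic objective. The function names `schedule` and `sgdm` are hypothetical.

```python
import random


def schedule(epoch, mode, lr0=0.1, b0=8, lr_decay=0.5,
             lr_growth=1.2, b_growth=2, period=2):
    """Return (learning_rate, batch_size) for an epoch.

    Hypothetical stagewise schedules: hyperparameters change every
    `period` epochs. All constants are illustrative assumptions.
    """
    stage = epoch // period
    if mode == "const_batch_decay_lr":
        return lr0 * lr_decay ** stage, b0
    if mode == "inc_batch_decay_lr":
        return lr0 * lr_decay ** stage, b0 * b_growth ** stage
    if mode == "inc_batch_inc_lr":
        return lr0 * lr_growth ** stage, b0 * b_growth ** stage
    raise ValueError(f"unknown mode: {mode}")


def sgdm(mode, epochs=8, steps_per_epoch=50, beta=0.9, seed=0):
    """Run SGDM on f(x) = 0.5 * x**2 with noisy gradients; return final x.

    Assumed update (heavy-ball form): m <- beta * m + g, x <- x - lr * m,
    where g is the mini-batch average of noisy gradient samples.
    """
    rng = random.Random(seed)
    x, m = 5.0, 0.0
    for epoch in range(epochs):
        lr, batch = schedule(epoch, mode)
        for _ in range(steps_per_epoch):
            # True gradient of 0.5 * x**2 is x; each sample adds unit Gaussian noise.
            g = sum(x + rng.gauss(0.0, 1.0) for _ in range(batch)) / batch
            m = beta * m + g
            x -= lr * m
    return x


if __name__ == "__main__":
    for mode in ("const_batch_decay_lr", "inc_batch_decay_lr", "inc_batch_inc_lr"):
        print(mode, abs(sgdm(mode)))
```

Averaging over a larger mini-batch reduces the variance of the gradient estimate `g`, which is the mechanism that lets the batch size compensate for a non-decaying or growing learning rate. This toy run only makes the schedules concrete; it is not a reproduction of the paper's experiments or of its convergence rates.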
