Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent

Hikaru Umeda; Hideaki Iiduka

arXiv:2409.08770 · cs.LG · February 17, 2025 · 2 citations

Open Access · 1 repository

TL;DR

This paper provides theoretical analysis and numerical evidence that increasing both the batch size and the learning rate, in particular with an increasing or warm-up decaying learning rate scheduler, accelerates mini-batch stochastic gradient descent when training deep neural networks.

Contribution

It introduces and analyzes schedulers that combine an increasing batch size with an increasing or warm-up decaying learning rate, and shows that they reduce the full gradient norm of the empirical loss faster than schedulers with a constant or decaying learning rate.

Findings

01. Schedulers with increasing batch size and learning rate accelerate convergence.

02. Increasing batch size and learning rate together reduces the full gradient norm faster.

03. Schedulers with warm-up or increasing learning rate outperform constant or decaying schedules.

Abstract

The performance of mini-batch stochastic gradient descent (SGD) strongly depends on setting the batch size and learning rate to minimize the empirical loss in training the deep neural network. In this paper, we present theoretical analyses of mini-batch SGD with four schedulers: (i) constant batch size and decaying learning rate scheduler, (ii) increasing batch size and decaying learning rate scheduler, (iii) increasing batch size and increasing learning rate scheduler, and (iv) increasing batch size and warm-up decaying learning rate scheduler. We show that mini-batch SGD using scheduler (i) does not always minimize the expectation of the full gradient norm of the empirical loss, whereas it does using any of schedulers (ii), (iii), and (iv). Furthermore, schedulers (iii) and (iv) accelerate mini-batch SGD. The paper also provides numerical results of supporting analyses showing that…
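The four schedulers named in the abstract can be sketched as simple functions of the epoch index. This is a minimal illustration, not the authors' implementation: every function name, the starting values (`b0`, `lr0`), the growth factors, the step interval, the inverse-square-root decay, and the cosine decay after warm-up are assumptions chosen for the example and are not taken from this page.

```python
import math

# All names and default values below are hypothetical, chosen for illustration only.

def batch_size(epoch, b0=8, factor=2, every=20, increasing=True):
    """Constant batch size, or one multiplied by `factor` every `every` epochs."""
    if not increasing:
        return b0
    return b0 * factor ** (epoch // every)

def lr_decaying(epoch, lr0=0.1):
    """Decaying learning rate; the 1/sqrt(epoch + 1) form is an assumption."""
    return lr0 / math.sqrt(epoch + 1)

def lr_increasing(epoch, lr0=0.1, factor=1.2, every=20):
    """Learning rate multiplied by `factor` every `every` epochs."""
    return lr0 * factor ** (epoch // every)

def lr_warmup_decay(epoch, lr0=0.1, factor=1.2, every=20,
                    warmup_epochs=60, total_epochs=200):
    """Step-wise increase during warm-up, then cosine decay (assumed form)."""
    if epoch < warmup_epochs:
        return lr_increasing(epoch, lr0, factor, every)
    # Peak is the last value reached during warm-up, so the curve is continuous.
    peak = lr_increasing(warmup_epochs - 1, lr0, factor, every)
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))

# Scheduler labels follow the abstract's numbering (i) to (iv).
SCHEDULERS = {
    "i": (lambda e: batch_size(e, increasing=False), lr_decaying),
    "ii": (batch_size, lr_decaying),
    "iii": (batch_size, lr_increasing),
    "iv": (batch_size, lr_warmup_decay),
}

def schedule(name, epochs):
    """Return a list of (batch_size, learning_rate) pairs, one per epoch."""
    bs_fn, lr_fn = SCHEDULERS[name]
    return [(bs_fn(e), lr_fn(e)) for e in range(epochs)]
```

In a training loop, the pair returned for each epoch would be used to rebuild the data loader with the new batch size and to set the optimizer's learning rate before that epoch starts.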

Code & Models

Repositories

iiduka-researches/incr_both_bs_lr (PyTorch, official)

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Neural Networks and Applications

Methods: Stochastic Gradient Descent