The Number of Steps Needed for Nonconvex Optimization of a Deep Learning Optimizer is a Rational Function of Batch Size

Hideaki Iiduka

arXiv:2108.11713 · math.OC · August 27, 2021

TL;DR

This paper proves that, for each deep learning optimizer it considers, the number of steps needed for nonconvex optimization can be expressed as a rational function of batch size. It follows that there is an optimal batch size minimizing the number of steps, and that this optimum depends on the optimizer.

Contribution

The paper provides a theoretical framework that expresses the number of steps needed by each optimizer as a rational function of batch size, which yields an optimizer-specific optimal batch size.

Findings

01 Existence of an optimal batch size minimizing steps

02 Larger batch sizes beyond the optimum do not reduce steps

03 Momentum and Adam optimizers can utilize larger batch sizes effectively

Abstract

Recently, convergence as well as convergence rate analyses of deep learning optimizers for nonconvex optimization have been widely studied. Meanwhile, numerical evaluations for the optimizers have precisely clarified the relationship between batch size and the number of steps needed for training deep neural networks. The main contribution of this paper is to show theoretically that the number of steps needed for nonconvex optimization of each of the optimizers can be expressed as a rational function of batch size. Having these rational functions leads to two particularly important facts, which were validated numerically in previous studies. The first fact is that there exists an optimal batch size such that the number of steps needed for nonconvex optimization is minimized. This implies that using larger batch sizes than the optimal batch size does not decrease the number of steps…
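The idea that a rational function of batch size can have a step-minimizing batch size can be illustrated with a minimal sketch. The functional form K(b) = (a·b² + c) / (d·b) and the coefficients a, c, d below are hypothetical choices made only for illustration; they are not the expressions derived in the paper, which this page does not reproduce.

```python
def steps_needed(b, a=1.0, c=4096.0, d=1.0):
    """Hypothetical rational model K(b) = (a*b**2 + c) / (d*b).

    The form and the coefficients are illustrative assumptions,
    not the paper's result.
    """
    return (a * b * b + c) / (d * b)


def optimal_batch_size(candidates, **coeffs):
    """Return the candidate batch size with the smallest modeled step count."""
    return min(candidates, key=lambda b: steps_needed(b, **coeffs))


# Candidate batch sizes: powers of two from 2 to 4096.
batch_sizes = [2 ** k for k in range(1, 13)]
best = optimal_batch_size(batch_sizes)

for b in batch_sizes:
    marker = "  <- minimum" if b == best else ""
    print(f"b = {b:5d}  K(b) = {steps_needed(b):8.1f}{marker}")
```

Under these assumed coefficients the modeled step count falls as the batch size grows up to the minimizer and rises again beyond it, which mirrors the qualitative claim that batch sizes larger than the optimum do not reduce the number of steps.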

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms