The Number of Steps Needed for Nonconvex Optimization of a Deep Learning Optimizer is a Rational Function of Batch Size
Hideaki Iiduka

TL;DR
This paper proves that, for deep learning optimizers, the number of steps needed for nonconvex optimization can be expressed as a rational function of batch size. It follows that an optimal batch size exists and that it depends on the optimizer.
Contribution
For each optimizer, the paper expresses the number of steps needed for nonconvex optimization as a rational function of batch size, and it shows that the resulting optimal batch size depends on the optimizer.
Findings
An optimal batch size exists that minimizes the number of steps needed for nonconvex optimization
Using batch sizes larger than the optimum does not further decrease the number of steps
Momentum and Adam-type optimizers can exploit larger optimal batch sizes than SGD
Abstract
Recently, convergence as well as convergence rate analyses of deep learning optimizers for nonconvex optimization have been widely studied. Meanwhile, numerical evaluations for the optimizers have precisely clarified the relationship between batch size and the number of steps needed for training deep neural networks. The main contribution of this paper is to show theoretically that the number of steps needed for nonconvex optimization of each of the optimizers can be expressed as a rational function of batch size. Having these rational functions leads to two particularly important facts, which were validated numerically in previous studies. The first fact is that there exists an optimal batch size such that the number of steps needed for nonconvex optimization is minimized. This implies that using larger batch sizes than the optimal batch size does not decrease the number of steps…
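To make the first fact concrete, here is a minimal sketch of how a rational function of batch size can have an interior minimum. The form K(b) = (a*b^2 + c) / (d*b) and the constants a, c, d are assumptions chosen purely for illustration; they are not the expressions derived in the paper, and the function names are hypothetical.

```python
import math


def steps_needed(b, a=1.0, c=4096.0, d=0.01):
    """Hypothetical rational model K(b) = (a*b**2 + c) / (d*b).

    NOT the paper's formula: a, c, d are made-up constants used only to
    show that a rational function of batch size b can have a minimum.
    """
    return (a * b * b + c) / (d * b)


def optimal_batch_size(a=1.0, c=4096.0):
    """Minimizer of the hypothetical K(b) above.

    Setting dK/db = (a - c / b**2) / d to zero gives b = sqrt(c / a).
    """
    return math.sqrt(c / a)


# Scan power-of-two batch sizes, as is common in numerical evaluations.
batch_sizes = [2 ** i for i in range(2, 12)]  # 4, 8, ..., 2048
best = min(batch_sizes, key=steps_needed)

for b in batch_sizes:
    marker = "  <- minimum on this grid" if b == best else ""
    print(f"b = {b:5d}   K(b) = {steps_needed(b):10.1f}{marker}")
```

Under this toy model, the step count falls as the batch size grows toward the minimizer and rises again beyond it, which mirrors the abstract's statement that batch sizes larger than the optimum do not decrease the number of steps.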
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms
