Minimization of Stochastic First-order Oracle Complexity of Adaptive Methods for Nonconvex Optimization
Hideaki Iiduka

TL;DR
This paper identifies the actual critical batch size in deep learning optimization by analyzing the stochastic first-order oracle (SFO) complexity, providing theoretical bounds and numerical validation that explain the diminishing returns of batch-size scaling.
Contribution
It introduces a theoretical framework that determines the actual critical batch size as the global minimizer of the SFO complexity, analyzed through lower and upper bounds on that complexity, advancing the understanding of optimizer efficiency in deep learning.
Findings
Existence of critical batch sizes that minimize the lower and upper bounds of the SFO complexity is proven
Theoretical conditions for SFO complexity bounds established
Numerical results support the theoretical critical batch size concept
Abstract
Numerical evaluations have definitively shown that, for deep learning optimizers such as stochastic gradient descent, momentum, and adaptive methods, the number of steps needed to train a deep neural network halves for each doubling of the batch size and that there is a region of diminishing returns beyond the critical batch size. In this paper, we determine the actual critical batch size by using the global minimizer of the stochastic first-order oracle (SFO) complexity of the optimizer. To prove the existence of the actual critical batch size, we set the lower and upper bounds of the SFO complexity and prove that there exist critical batch sizes in the sense of minimizing the lower and upper bounds. This proof implies that, if the SFO complexity fits the lower and upper bounds, then the existence of these critical batch sizes demonstrates the existence of the actual critical batch…
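The abstract treats the SFO complexity as the number of steps multiplied by the batch size, and defines the critical batch size as its minimizer. The sketch below illustrates that idea under a loudly stated assumption: the step count follows the hypothetical model K(b) = c1·b / (ε²·b − c2), which is not taken from the paper's bounds. Under this model K(b) decreases toward c1/ε² as b grows (diminishing returns), while N(b) = b·K(b) = c1·b² / (ε²·b − c2) has derivative c1·b·(ε²·b − 2·c2) / (ε²·b − c2)², so it is minimized at b* = 2·c2/ε². The names `steps_model`, `sfo_complexity`, `critical_batch_size`, and `closed_form_critical_batch`, and the constants c1, c2, are illustrative only.

```python
def steps_model(b, c1, c2, eps):
    """Hypothetical step count K(b) needed to reach an eps-approximate stationary point.

    ASSUMPTION: K(b) = c1 * b / (eps**2 * b - c2), meaningful only for b > c2 / eps**2.
    Smaller batch sizes are treated as never reaching the target (infinite steps).
    """
    denom = eps**2 * b - c2
    if denom <= 0:
        return float("inf")
    return c1 * b / denom


def sfo_complexity(b, c1, c2, eps):
    """SFO complexity N(b) = b * K(b): total stochastic gradients computed."""
    return b * steps_model(b, c1, c2, eps)


def critical_batch_size(candidates, c1, c2, eps):
    """Pick the candidate batch size with the smallest SFO complexity under the assumed model."""
    return min(candidates, key=lambda b: sfo_complexity(b, c1, c2, eps))


def closed_form_critical_batch(c2, eps):
    """Stationary point of N(b) under the assumed model: N'(b) = 0 at b* = 2 * c2 / eps**2."""
    return 2 * c2 / eps**2


if __name__ == "__main__":
    c1, c2, eps = 1.0, 4.0, 0.5          # illustrative constants, not from the paper
    batch_sizes = [2**k for k in range(1, 13)]
    for b in batch_sizes:
        print(b, steps_model(b, c1, c2, eps), sfo_complexity(b, c1, c2, eps))
    print("search:", critical_batch_size(batch_sizes, c1, c2, eps))
    print("closed form:", closed_form_critical_batch(c2, eps))
```

With these constants the search over powers of two and the closed form both give a critical batch size of 32: beyond it, each doubling of b cuts the step count by less than half, so the total SFO complexity rises.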
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
