Minimization of Stochastic First-order Oracle Complexity of Adaptive Methods for Nonconvex Optimization

Hideaki Iiduka

arXiv:2112.07163 · cs.LG · December 17, 2021 · 1 citation

TL;DR

This paper identifies the actual critical batch size in deep learning optimization as the minimizer of the stochastic first-order oracle (SFO) complexity, and provides theoretical bounds and numerical validation that explain the diminishing returns of batch size scaling.

Contribution

The paper introduces a theoretical framework that determines the actual critical batch size from lower and upper bounds on the SFO complexity, advancing the understanding of optimizer efficiency in deep learning.

Findings

01 Existence of critical batch sizes proven through bounds analysis

02 Theoretical conditions for SFO complexity bounds established

03 Numerical results support the theoretical critical batch size concept

Abstract

Numerical evaluations have definitively shown that, for deep learning optimizers such as stochastic gradient descent, momentum, and adaptive methods, the number of steps needed to train a deep neural network halves for each doubling of the batch size and that there is a region of diminishing returns beyond the critical batch size. In this paper, we determine the actual critical batch size by using the global minimizer of the stochastic first-order oracle (SFO) complexity of the optimizer. To prove the existence of the actual critical batch size, we set the lower and upper bounds of the SFO complexity and prove that there exist critical batch sizes in the sense of minimizing the lower and upper bounds. This proof implies that, if the SFO complexity fits the lower and upper bounds, then the existence of these critical batch sizes demonstrates the existence of the actual critical batch…
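The idea of a critical batch size as a minimizer of SFO complexity can be sketched numerically. The snippet below is a toy illustration only: it takes SFO complexity to be the number of steps multiplied by the batch size (total stochastic gradient evaluations), and it assumes a hypothetical steps model K(b) = c1 * b / (eps^2 * b - c2). That functional form, the constants c1 and c2, the target accuracy eps, and all function names are assumptions introduced for illustration and are not quoted from the abstract above.

```python
# Toy model (an assumption for illustration, not a formula from the abstract):
#   steps to reach accuracy eps with batch size b:
#       K(b) = c1 * b / (eps**2 * b - c2),  valid only for b > c2 / eps**2
#   SFO complexity = total stochastic gradient evaluations = K(b) * b


def steps_needed(b, c1, c2, eps):
    """Hypothetical number of steps needed at batch size b."""
    denom = eps**2 * b - c2
    if denom <= 0:
        raise ValueError("batch size too small for this toy model")
    return c1 * b / denom


def sfo_complexity(b, c1, c2, eps):
    """Steps multiplied by batch size."""
    return steps_needed(b, c1, c2, eps) * b


def critical_batch_closed_form(c2, eps):
    """Minimizer of c1 * b**2 / (eps**2 * b - c2), found by setting d/db to zero."""
    return 2.0 * c2 / eps**2


def critical_batch_grid(c1, c2, eps, b_max):
    """Brute-force minimizer over integer batch sizes where the model is valid."""
    b_min = int(c2 / eps**2) + 1
    return min(range(b_min, b_max + 1),
               key=lambda b: sfo_complexity(b, c1, c2, eps))


if __name__ == "__main__":
    c1, c2, eps = 1.0, 1.0, 0.125  # arbitrary illustrative constants
    print("closed form:", critical_batch_closed_form(c2, eps))
    print("grid search:", critical_batch_grid(c1, c2, eps, b_max=1024))
```

In this toy model the SFO complexity first falls and then rises as the batch size grows, so batch sizes beyond the minimizer cost more gradient evaluations in total. This is one way to picture the region of diminishing returns that the abstract describes.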

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM