Relationship between Batch Size and Number of Steps Needed for Nonconvex Optimization of Stochastic Gradient Descent using Armijo Line Search
Yuki Tsukada, Hideaki Iiduka

TL;DR
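This paper analyzes how the batch size affects the number of steps needed for nonconvex optimization by SGD with an Armijo line search. It shows that the number of steps is a monotone decreasing convex function of the batch size and identifies a critical batch size that minimizes stochastic first-order oracle (SFO) complexity.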
Contribution
It provides a convergence analysis showing that the number of steps needed decreases monotonically as the batch size increases, and it identifies a critical batch size that minimizes SFO complexity.
Findings
Number of steps decreases as batch size increases.
Existence of a critical batch size minimizing SFO complexity.
Numerical results support theoretical predictions.
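To make the second finding concrete, here is a toy sketch of how a critical batch size can arise. It assumes SFO complexity is the number of steps times the batch size, and it uses a hypothetical step-count model K(b) = A*b / (eps_sq*b - C). This functional form and the constants A, C, and eps_sq are illustrative assumptions, not values or formulas taken from the summary above.

```python
# Toy model only: the form of K(b) and all constants are assumptions.

def steps_needed(b, A=1.0, C=1.0, eps_sq=0.01):
    """Hypothetical step count K(b) = A*b / (eps_sq*b - C), for b > C/eps_sq."""
    denom = eps_sq * b - C
    if denom <= 0:
        raise ValueError("model is only defined for b > C / eps_sq")
    return A * b / denom


def sfo_complexity(b, **kwargs):
    """SFO complexity taken as (number of steps) x (batch size)."""
    return steps_needed(b, **kwargs) * b


# Scan integer batch sizes in the region where the model is defined.
batch_sizes = range(101, 1001)
critical = min(batch_sizes, key=sfo_complexity)
print("critical batch size under the toy model:", critical)
```

Under this assumed model, K(b) is decreasing and convex in b, while K(b)*b is minimized at b = 2*C/eps_sq, so larger batches keep reducing the step count but stop reducing total gradient computations past that point.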
Abstract
While stochastic gradient descent (SGD) can use various learning rates, such as constant or diminishing rates, previous numerical results have shown that SGD performs better than other deep learning optimizers when it uses learning rates given by line search methods. In this paper, we perform a convergence analysis on SGD with a learning rate given by an Armijo line search for nonconvex optimization, indicating that the upper bound of the expectation of the squared norm of the full gradient becomes small when the number of steps and the batch size are large. Next, we show that, for SGD with the Armijo-line-search learning rate, the number of steps needed for nonconvex optimization is a monotone decreasing convex function of the batch size; that is, the number of steps needed for nonconvex optimization decreases as the batch size increases. Furthermore, we show that the stochastic…
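As a rough illustration of the method the abstract describes, the following is a minimal sketch of mini-batch SGD with a backtracking Armijo line search. The sufficient-decrease test used here, f_B(x - eta*g) <= f_B(x) - c*eta*||g||^2 on the sampled mini-batch B, is the standard Armijo condition and is assumed rather than quoted from the paper. The toy nonconvex loss, the synthetic data, and the parameters eta_max, c, and delta are all hypothetical choices.

```python
import numpy as np

# Synthetic data for a toy problem (an assumption for illustration).
rng = np.random.default_rng(0)
n, d = 256, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
y = A @ x_true


def loss(x, idx):
    """Mean nonconvex loss log(1 + r^2 / 2) over the samples in idx."""
    r = A[idx] @ x - y[idx]
    return float(np.mean(np.log1p(0.5 * r**2)))


def grad(x, idx):
    """Gradient of the mini-batch loss above with respect to x."""
    r = A[idx] @ x - y[idx]
    w = r / (1.0 + 0.5 * r**2)
    return A[idx].T @ w / len(idx)


def armijo_step(loss, grad, x, batch, eta_max=1.0, c=0.1, delta=0.5,
                max_backtracks=50):
    """One SGD step whose learning rate is found by Armijo backtracking."""
    g = grad(x, batch)
    f0 = loss(x, batch)
    g_sq = float(g @ g)
    eta = eta_max
    for _ in range(max_backtracks):
        # Accept eta once the sufficient-decrease condition holds.
        if loss(x - eta * g, batch) <= f0 - c * eta * g_sq:
            break
        eta *= delta  # otherwise shrink the learning rate
    return x - eta * g, eta


def sgd_armijo(loss, grad, x0, n, batch_size, steps, seed=0):
    """Run SGD, sampling a fresh mini-batch at every step."""
    sampler = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        batch = sampler.choice(n, size=batch_size, replace=False)
        x, _ = armijo_step(loss, grad, x, batch)
    return x


full = np.arange(n)
x0 = np.zeros(d)
for b in (8, 64, 256):
    x_final = sgd_armijo(loss, grad, x0, n, batch_size=b, steps=100)
    g_full = grad(x_final, full)
    print(f"batch size {b}: squared full-gradient norm = {g_full @ g_full:.3e}")
```

The loop at the end prints the squared norm of the full gradient after a fixed number of steps for several batch sizes, which is the quantity the convergence analysis bounds. It is a quick way to experiment with the batch-size effect on a toy problem, not a reproduction of the paper's experiments.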
Taxonomy
Topics: Optimization and Search Problems · Industrial Vision Systems and Defect Detection · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
