Barzilai-Borwein Step Size for Stochastic Gradient Descent
Conghui Tan, Shiqian Ma, Yu-Hong Dai, Yuqiu Qian

TL;DR
This paper uses the Barzilai-Borwein (BB) method to compute step sizes automatically for stochastic gradient descent (SGD) and its variance-reduced variant SVRG, removing the need for a diminishing schedule or a hand-tuned fixed step size.
Contribution
The paper proposes two BB-based algorithms, SGD-BB and SVRG-BB, proves that SVRG-BB converges linearly for strongly convex objective functions, and reports numerical experiments on standard data sets.
Findings
SVRG-BB converges linearly for strongly convex functions
The methods outperform or match well-tuned SGD variants
Automatic step size selection simplifies practical implementation
Abstract
One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization algorithms, the common practice in SGD is either to use a diminishing step size, or to tune a fixed step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result is missing in the literature. Numerical experiments on standard data sets show…
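The abstract does not spell out the step size rule. As a minimal sketch, assuming the BB rule usually stated for SVRG-BB, the step size for an epoch could be computed from the last two snapshot points and their full gradients as eta = ||s||^2 / (m * s.y), with s the difference of snapshots, y the difference of full gradients, and m the inner-loop length. The function name svrg_bb, the helper dot, the argument names, and the safeguard threshold below are all hypothetical and chosen for illustration; the snapshot update follows Option I (take the last inner iterate).

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def svrg_bb(grad_i, n, x0, m, eta0, epochs, seed=0):
    """Sketch of SVRG with a Barzilai-Borwein step size (assumed form).

    grad_i(x, i) returns the gradient of the i-th component function at x
    as a list. n is the number of components, m the inner-loop length,
    eta0 the step size used in the first epoch only.
    """
    rng = random.Random(seed)
    x_tilde = list(x0)
    x_prev = g_prev = None
    eta = eta0
    for _ in range(epochs):
        # Full gradient at the current snapshot point.
        grads = [grad_i(x_tilde, i) for i in range(n)]
        g = [sum(col) / n for col in zip(*grads)]

        # BB step size from the last two snapshots (assumed rule).
        if x_prev is not None:
            s = [a - b for a, b in zip(x_tilde, x_prev)]
            y = [a - b for a, b in zip(g, g_prev)]
            sy = dot(s, y)
            # Illustrative safeguard: keep the old step if s.y is not
            # clearly positive (e.g. after the iterates have converged).
            if sy > 1e-12:
                eta = dot(s, s) / (m * sy)
        x_prev, g_prev = x_tilde, g

        # Inner loop: variance-reduced stochastic gradient steps.
        x = list(x_tilde)
        for _ in range(m):
            i = rng.randrange(n)
            gi_x = grad_i(x, i)
            gi_snap = grads[i]  # gradient of component i at the snapshot
            x = [xj - eta * (a - b + c)
                 for xj, a, b, c in zip(x, gi_x, gi_snap, g)]

        # Option I: the next snapshot is the last inner iterate.
        x_tilde = x
    return x_tilde
```

Only the first epoch depends on eta0; every later epoch recomputes the step from curvature information, which is the point of the automatic selection the abstract describes. The cached per-component gradients cost O(n d) memory and could be recomputed instead.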
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
