On the Convergence of Stochastic Gradient Descent with Bandwidth-based Step Size
Xiaoyu Wang, Ya-xiang Yuan

TL;DR
This paper analyzes the convergence of stochastic gradient descent when the step size varies within a band, providing theoretical guarantees and practical insights for flexible step size strategies.
Contribution
It introduces a bandwidth-based step size framework for SGD, proving optimal convergence rates and analyzing existing strategies within this framework.
Findings
The optimal convergence rate is proved under mild conditions and with a large initial step size.
Comparable theoretical error bounds are given for SGD with a variety of step size strategies.
Numerical experiments demonstrate the efficiency of bandwidth-based step sizes.
Abstract
We investigate the stochastic gradient descent (SGD) method where the step size lies within a banded region instead of being given by a fixed formula. The optimal convergence rate under mild conditions and large initial step size is proved. Our analysis provides comparable theoretical error bounds for SGD associated with a variety of step sizes. In addition, the convergence rates for some existing step size strategies, e.g., triangular policy and cosine-wave, can be revealed by our analytical framework under the boundary constraints. The bandwidth-based step size provides efficient and flexible step size selection in optimization. We also propose an up-down policy and give several non-monotonic step sizes. Numerical experiments demonstrate the efficiency and significant potential of the bandwidth-based step size in many applications.
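To make the idea of a step size confined to a band concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, a band whose boundaries decay like 1/t, a cosine-wave schedule that oscillates between those boundaries, and a one-dimensional quadratic objective with Gaussian gradient noise. The names `band_step_size` and `sgd_quadratic`, and the constants `lower`, `upper`, and `period`, are hypothetical.

```python
import math
import random


def band_step_size(t, lower=0.5, upper=2.0, period=10):
    """Cosine-wave step size kept inside the assumed band [lower/t, upper/t]."""
    lo, hi = lower / t, upper / t
    # phase oscillates in [0, 1]; it is 1 at the start of each period and 0 mid-period.
    phase = 0.5 * (1.0 + math.cos(2.0 * math.pi * (t % period) / period))
    # Non-monotonic step size: moves between the lower and upper boundary.
    return lo + (hi - lo) * phase


def sgd_quadratic(steps=2000, seed=0):
    """Run SGD on f(x) = 0.5 * x**2 using noisy gradients g = x + noise."""
    rng = random.Random(seed)
    x = 5.0  # illustrative starting point
    for t in range(1, steps + 1):
        grad = x + rng.gauss(0.0, 0.1)  # stochastic gradient (assumed noise model)
        x -= band_step_size(t) * grad
    return x


if __name__ == "__main__":
    print("final iterate:", sgd_quadratic())
```

Any schedule that stays between the two boundary curves could replace the cosine wave here, for example a triangular policy; the band constrains only the range of admissible step sizes at each iteration, not the shape of the schedule inside it.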
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
