On the Convergence of Stochastic Gradient Descent with Bandwidth-based Step Size

Xiaoyu Wang; Ya-xiang Yuan

arXiv:2102.09031·math.OC·April 10, 2023·J. Mach. Learn. Res.·5 cites

Open Access

TL;DR

This paper analyzes the convergence of stochastic gradient descent when the step size varies within a band, providing theoretical guarantees and practical insights for flexible step size strategies.

Contribution

It introduces a bandwidth-based step size framework for SGD, proving optimal convergence rates and analyzing existing strategies within this framework.

Findings

01. The optimal convergence rate is proved under mild conditions and a large initial step size.

02. Comparable theoretical error bounds are given for SGD with a variety of step sizes.

03. Numerical experiments demonstrate the efficiency of bandwidth-based step sizes.

Abstract

We investigate the stochastic gradient descent (SGD) method where the step size lies within a banded region instead of being given by a fixed formula. The optimal convergence rate is proved under mild conditions and a large initial step size. Our analysis provides comparable theoretical error bounds for SGD associated with a variety of step sizes. In addition, the convergence rates for some existing step size strategies, e.g., the triangular policy and the cosine wave, can be revealed by our analytical framework under the boundary constraints. The bandwidth-based step size provides efficient and flexible step size selection in optimization. We also propose a $1/t$ up-down policy and give several non-monotonic step sizes. Numerical experiments demonstrate the efficiency and significant potential of the bandwidth-based step size in many applications.
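The idea of a step size that only has to stay inside a band can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the band is taken to be $[m/t, M/t]$, the constants `m`, `M`, and `period` are arbitrary, and the test problem is a one-dimensional noisy quadratic. The two schedules are simple stand-ins for a triangular policy and a cosine wave confined to that band.

```python
import math
import random

def band(t, m=1.0, M=4.0):
    """Hypothetical band at iteration t: lower bound m/t, upper bound M/t."""
    return m / t, M / t

def triangular_in_band(t, period=10, m=1.0, M=4.0):
    """Step size that moves linearly up and down between the band's bounds."""
    lo, hi = band(t, m, M)
    phase = (t - 1) % period / period        # position in the cycle, in [0, 1)
    frac = 1.0 - abs(2.0 * phase - 1.0)      # 0 -> 1 -> 0 over one period
    return lo + frac * (hi - lo)

def cosine_in_band(t, period=10, m=1.0, M=4.0):
    """Step size that follows a cosine wave between the band's bounds."""
    lo, hi = band(t, m, M)
    phase = (t - 1) % period / period
    frac = 0.5 * (1.0 + math.cos(2.0 * math.pi * phase))  # 1 -> 0 -> 1
    return lo + frac * (hi - lo)

def sgd(step_size, steps=2000, x0=5.0, noise=0.1, seed=0):
    """Run SGD on f(x) = 0.5 * x**2 with additive Gaussian gradient noise."""
    rng = random.Random(seed)
    x = x0
    for t in range(1, steps + 1):
        grad = x + rng.gauss(0.0, noise)     # noisy gradient of 0.5 * x**2
        x -= step_size(t) * grad
    return x

if __name__ == "__main__":
    for name, schedule in [("triangular", triangular_in_band),
                           ("cosine", cosine_in_band)]:
        print(name, abs(sgd(schedule)))
```

Both schedules are non-monotonic yet never leave the assumed band, which is the only property the sketch is meant to show; it says nothing about the rates proved in the paper.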

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data