Bandwidth-based Step-Sizes for Non-Convex Stochastic Optimization
Xiaoyu Wang, Mikael Johansson

TL;DR
This paper introduces a flexible class of bandwidth-based step-sizes for stochastic gradient descent in non-convex deep learning, providing theoretical guarantees and practical schemes that improve training efficiency.
Contribution
The paper develops a unified theoretical framework for bandwidth-based step-sizes that covers popular cyclic and non-monotonic schedules, proves convergence guarantees for them, and proposes new schemes that are effective in practice.
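To make the idea of a step-size that varies inside a banded region concrete, here is a minimal sketch of a cyclic, non-monotonic schedule that never leaves its band. It assumes the band has the form m·δ(t) ≤ η_t ≤ M·δ(t) with a decaying boundary δ(t) = 1/√t. The function names `boundary` and `cyclic_bandwidth_step` and the parameters `m`, `M`, and `period` are hypothetical, chosen for illustration rather than taken from the paper.

```python
import math


def boundary(t: int) -> float:
    """Assumed decaying trend delta(t) = 1/sqrt(t), defined for t >= 1."""
    return 1.0 / math.sqrt(t)


def cyclic_bandwidth_step(t: int, m: float, M: float, period: int) -> float:
    """Triangular wave that oscillates between m*delta(t) and M*delta(t).

    Hypothetical example of a non-monotonic step-size inside the band.
    """
    phase = (t % period) / period        # position in the current cycle, in [0, 1)
    tri = 1.0 - abs(2.0 * phase - 1.0)   # rises 0 -> 1 and falls back over one period
    lo, hi = m * boundary(t), M * boundary(t)
    return lo + tri * (hi - lo)          # always between the two boundaries


# Example: the step-size swings up and down while the whole band decays like 1/sqrt(t).
steps = [cyclic_bandwidth_step(t, m=0.5, M=2.0, period=50) for t in range(1, 201)]
```

The triangular wave is only one simple choice; under this assumed definition, any sequence that stays between the two boundaries belongs to the same family, which is what lets a single analysis cover many cyclic schedules.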
Findings
Proves convergence guarantees for various bandwidth-based step-sizes.
Shows the optimality of step-decay schedules within this framework.
Demonstrates the efficiency of proposed schemes on deep neural network training tasks.
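One way to see how a step-decay schedule fits the bandwidth framework is the sketch below. It assumes a band of the form m/√t ≤ η_t ≤ M/√t and, as an illustrative choice that is not necessarily the paper's construction, lets stage lengths double while the step-size is divided by α = √2 between stages. With these choices η_t·√t stays between η0 and η0·√(2·stage0), so the piecewise-constant schedule lies inside such a band. The names `step_decay`, `eta0`, `stage0`, and `alpha` are hypothetical.

```python
import math


def step_decay(t: int, eta0: float, stage0: int, alpha: float = math.sqrt(2.0)) -> float:
    """Step-decay: constant within a stage, divided by alpha between stages.

    Assumed (illustrative) stage layout: stage i = 0, 1, ... covers iterations
    [stage0*(2**i - 1) + 1, stage0*(2**(i + 1) - 1)], so stage lengths double.
    """
    i = 0
    # Advance to the stage that contains iteration t (t >= 1).
    while t > stage0 * (2 ** (i + 1) - 1):
        i += 1
    return eta0 / alpha ** i


# eta_t * sqrt(t) should stay within [eta0, eta0 * sqrt(2 * stage0)].
ratios = [step_decay(t, eta0=0.1, stage0=10) * math.sqrt(t) for t in range(1, 10001)]
```

Doubling the stage length while shrinking the step by √2 is what keeps the product η_t·√t bounded above and below; a different growth rule for the stages would pair with a different decay factor or boundary function.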
Abstract
Many popular learning-rate schedules for deep neural networks combine a decaying trend with local perturbations that attempt to escape saddle points and bad local minima. We derive convergence guarantees for bandwidth-based step-sizes, a general class of learning rates that are allowed to vary in a banded region. This framework includes many popular cyclic and non-monotonic step-sizes for which no theoretical guarantees were previously known. We provide worst-case guarantees for SGD on smooth non-convex problems under several bandwidth-based step-sizes, including stagewise step-sizes and the popular step-decay schedule (held constant, then dropped by a constant factor), which is also shown to be optimal. Moreover, we show that SGD with momentum converges as fast as SGD under the bandwidth-based step-decay step-size. Finally, we propose novel step-size schemes in the bandwidth-based family and verify…
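The step-decay and momentum results can be illustrated, though of course not proven, on a toy problem. The sketch below runs SGD with heavy-ball momentum (one common momentum form; the paper's exact variant may differ) on the smooth non-convex function f(x) = x^4/4 - x^2/2 with Gaussian gradient noise, using a step-decay schedule with fixed-length stages. The objective, noise level, and all names and parameters (`grad`, `sgd_momentum_step_decay`, `eta0`, `decay`, `stage_len`, `beta`, `noise`) are assumptions made for illustration.

```python
import random


def grad(x: float) -> float:
    """Gradient of the assumed non-convex toy objective f(x) = x**4/4 - x**2/2."""
    return x ** 3 - x


def sgd_momentum_step_decay(x0: float, eta0: float = 0.01, decay: float = 0.5,
                            stage_len: int = 500, n_iters: int = 3000,
                            beta: float = 0.9, noise: float = 0.1,
                            seed: int = 0) -> float:
    """Heavy-ball SGD with a step-decay schedule (hypothetical sketch)."""
    rng = random.Random(seed)                  # seeded for reproducibility
    x, v = x0, 0.0
    for t in range(n_iters):
        eta = eta0 * decay ** (t // stage_len)  # constant per stage, then multiplied by decay
        g = grad(x) + rng.gauss(0.0, noise)     # stochastic gradient: true gradient plus noise
        v = beta * v + g                        # heavy-ball momentum buffer
        x = x - eta * v                         # parameter update
    return x


x_final = sgd_momentum_step_decay(2.0)
```

Starting from x0 = 2.0, the iterate should settle near one of the two minimizers x = ±1, where the gradient x^3 - x vanishes; each drop in the step-size shrinks the noise-driven fluctuation around that stationary point.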
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Neural Network Applications
Methods: Stochastic Gradient Descent
