Bandwidth-based Step-Sizes for Non-Convex Stochastic Optimization

Xiaoyu Wang; Mikael Johansson

arXiv:2106.02888 · cs.LG · October 13, 2021 · 1 citation

TL;DR

This paper introduces bandwidth-based step-sizes, a flexible class of learning rates that may vary within a banded region, for stochastic gradient descent on smooth non-convex problems. It provides worst-case convergence guarantees and proposes practical schemes evaluated on deep neural network training.

Contribution

Develops a unified theoretical framework for bandwidth-based step-sizes that covers popular cyclic and non-monotonic schedules, establishes convergence guarantees for them, and proposes new schemes within the same family.

Findings

1. Proves convergence guarantees for various bandwidth-based step-sizes.

2. Shows the optimality of step-decay schedules within this framework.

3. Demonstrates the efficiency of proposed schemes on deep neural network training tasks.

Abstract

Many popular learning-rate schedules for deep neural networks combine a decaying trend with local perturbations that attempt to escape saddle points and bad local minima. We derive convergence guarantees for bandwidth-based step-sizes, a general class of learning rates that are allowed to vary in a banded region. This framework includes many popular cyclic and non-monotonic step-sizes for which no theoretical guarantees were previously known. We provide worst-case guarantees for SGD on smooth non-convex problems under several bandwidth-based step-sizes, including stagewise $1/\sqrt{t}$ and the popular step-decay ("constant and then drop by a constant"), which is also shown to be optimal. Moreover, we show that its momentum variant converges as fast as SGD with the bandwidth-based step-decay step-size. Finally, we propose novel step-size schemes in the bandwidth-based family and verify…
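
To make the idea of a learning rate that varies in a banded region concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the band is assumed to take the form m * delta(t) <= eta_t <= M * delta(t) around a step-decay boundary delta(t), and the names step_decay_boundary and bandwidth_step_size, the cosine perturbation, and all default values (m, M, period, eta0, alpha, num_stages) are hypothetical choices made for this example.

```python
import math


def step_decay_boundary(t, total_steps, num_stages=4, eta0=0.1, alpha=2.0):
    """Step-decay boundary: constant within a stage, then divided by alpha.

    All parameter names and defaults are illustrative assumptions.
    """
    stage_len = math.ceil(total_steps / num_stages)
    stage = min(t // stage_len, num_stages - 1)
    return eta0 / (alpha ** stage)


def bandwidth_step_size(t, total_steps, m=0.5, M=1.5, period=10, **boundary_kwargs):
    """Step-size that oscillates inside the band [m * delta(t), M * delta(t)].

    The cosine perturbation is one assumed example of a non-monotonic
    schedule that stays inside the band; other shapes would also qualify.
    """
    delta = step_decay_boundary(t, total_steps, **boundary_kwargs)
    # Cosine wave mapped to [0, 1]: 1 at the start of each period, 0 at its midpoint.
    phase = 0.5 * (1.0 + math.cos(2.0 * math.pi * (t % period) / period))
    return delta * (m + (M - m) * phase)


# Build a 100-step schedule: four step-decay stages with cyclic perturbations.
schedule = [bandwidth_step_size(t, total_steps=100) for t in range(100)]
```

The resulting schedule keeps the overall decaying trend of step-decay while allowing local, cyclic changes in the learning rate, which is the combination the abstract describes. In a training loop, schedule[t] would be used as the SGD learning rate at iteration t.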

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent