Adaptive Gradient Methods with Dynamic Bound of Learning Rate

Liangchen Luo; Yuanhao Xiong; Yan Liu; Xu Sun

arXiv:1902.09843 · cs.LG · April 22, 2019 · 189 cites

TL;DR

This paper introduces AdaBound and AMSBound, adaptive optimizers with dynamic learning rate bounds, which smoothly transition from adaptive methods to SGD, improving generalization and training stability.
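The "smooth transition" can be pictured as a clip band applied to each coordinate's adaptive step size. Below is a minimal sketch, assuming a lower bound that rises from about 0 and an upper bound that falls from a very large value, with both converging to a constant final_lr. The functional form, the hypothetical names bound_schedule, final_lr, and gamma, and the default values are all assumptions for illustration. The form resembles the bound functions the paper reports for its experiments (with gamma playing the role of 1 - β2), but treat that correspondence as unverified here.

```python
def bound_schedule(step, final_lr=0.1, gamma=1e-3):
    """Return hypothetical (lower, upper) clip bounds at a 1-indexed step.

    Assumed form: both bounds converge to final_lr as step grows.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    # Lower bound rises from ~0 toward final_lr.
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    # Upper bound falls from a very large value toward final_lr.
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper


for t in (1, 100, 10_000, 1_000_000):
    low, high = bound_schedule(t)
    print(f"step {t:>9}: [{low:.6f}, {high:.6f}]")
```

Early in training the band is so wide that the adaptive step size passes through almost unclipped, so the optimizer behaves like Adam. As the step count grows, the band narrows around final_lr and every coordinate receives nearly the same step size, which is SGD-like behavior.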

Contribution

The paper proposes novel variants of Adam and AMSGrad with dynamic bounds on learning rates, providing convergence guarantees and improved performance over existing adaptive methods.
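A minimal pure-Python sketch of one bounded update follows, assuming a clip-then-scale form. It computes Adam's moment estimates, clips the per-coordinate step size lr / (sqrt(v) + eps) into [lower(t), upper(t)], divides by sqrt(t), and moves against the momentum. Setting amsbound=True replaces v with its running maximum, which is the AMSGrad-style modification. The hypothetical names (init_state, bounded_step, final_lr, gamma), the constant beta1, the absence of bias correction, and the bound schedule are simplifying assumptions. Released implementations may differ in these details.

```python
import math


def init_state(n):
    """Hypothetical optimizer state for n scalar parameters."""
    return {"step": 0, "m": [0.0] * n, "v": [0.0] * n, "v_max": [0.0] * n}


def bounded_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
                 final_lr=0.1, gamma=1e-3, eps=1e-8, amsbound=False):
    """One AdaBound-style update (AMSBound-style if amsbound=True); returns new params."""
    state["step"] += 1
    t = state["step"]
    # Assumed dynamic bounds; both converge to final_lr as t grows.
    lower = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * t))
    new_params = []
    for i, (x, g) in enumerate(zip(params, grads)):
        # Adam-style moment estimates (constant beta1, no bias correction).
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        v = state["v"][i]
        if amsbound:
            # AMSGrad-style: use the running maximum of the second moment.
            state["v_max"][i] = max(state["v_max"][i], v)
            v = state["v_max"][i]
        # Clip the adaptive per-coordinate step size, then decay it by 1/sqrt(t).
        step_size = min(max(lr / (math.sqrt(v) + eps), lower), upper)
        eta = step_size / math.sqrt(t)
        new_params.append(x - eta * state["m"][i])
    return new_params


# Usage: minimize f(x) = x0^2 + x1^2, whose gradient is 2 * x.
x = [1.0, -2.0]
state = init_state(len(x))
for _ in range(5000):
    x = bounded_step(x, [2.0 * xi for xi in x], state)
print(x)
```

Because the clip is applied element-wise before the shared 1/sqrt(t) decay, coordinates whose raw adaptive step sizes differ by orders of magnitude end up with step sizes inside the same narrowing interval.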

Findings

1. Can eliminate the generalization gap between adaptive methods and SGD in the reported experiments.
2. Maintains a higher learning speed early in training.
3. Significantly improves on its prototypes (Adam and AMSGrad), especially on complex deep networks.

Abstract

Adaptive optimization methods such as AdaGrad, RMSprop and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevailing, they are observed to generalize poorly compared with SGD or even fail to converge due to unstable and extreme learning rates. Recent work has put forward some algorithms such as AMSGrad to tackle this issue but they failed to achieve considerable improvement over existing methods. In our paper, we demonstrate that extreme learning rates can lead to poor performance. We provide new variants of Adam and AMSGrad, called AdaBound and AMSBound respectively, which employ dynamic bounds on learning rates to achieve a gradual and smooth transition from adaptive methods to SGD and give a theoretical proof of convergence. We further conduct experiments on various popular tasks and models, which is often…
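To show what an "extreme" learning rate looks like, the sketch below computes Adam's effective per-coordinate step size lr / (sqrt(v_t) + eps) for two hypothetical coordinates, one with consistently tiny gradients and one with consistently large gradients, and then clips both into a fixed band. The gradient values, the band [0.05, 0.2], and the helper names adam_effective_lr and clip are illustrative assumptions, not values or code taken from the paper.

```python
import math


def adam_effective_lr(grads, lr=1e-3, beta2=0.999, eps=1e-8):
    """Per-coordinate Adam step size lr / (sqrt(v_t) + eps) after a gradient sequence.

    Bias correction is omitted to keep the example short.
    """
    v = 0.0
    for g in grads:
        # Exponential moving average of squared gradients, as in Adam.
        v = beta2 * v + (1.0 - beta2) * g * g
    return lr / (math.sqrt(v) + eps)


def clip(value, lower, upper):
    """Clamp value into [lower, upper]."""
    return min(max(value, lower), upper)


# Hypothetical coordinates: consistently tiny vs. consistently large gradients.
tiny_lr = adam_effective_lr([1e-6] * 1000)
large_lr = adam_effective_lr([10.0] * 1000)

# Illustrative clip band (an assumption, not a value from the paper).
LOWER, UPPER = 0.05, 0.2
print(f"unclipped: tiny={tiny_lr:.3g}, large={large_lr:.3g}")
print(f"clipped:   tiny={clip(tiny_lr, LOWER, UPPER)}, large={clip(large_lr, LOWER, UPPER)}")
```

After 1,000 steps, the tiny-gradient coordinate gets a step size above 1,000 and the large-gradient coordinate gets roughly 1e-4, a spread of about seven orders of magnitude. After clipping, both fall inside the band.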

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Metaheuristic Optimization Algorithms Research

Methods: AMSBound · AdaBound · AdaGrad · RMSProp · Adam · Stochastic Gradient Descent