Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun

TL;DR
This paper introduces AdaBound and AMSBound, adaptive optimizers with dynamic learning rate bounds, which smoothly transition from adaptive methods to SGD, improving generalization and training stability.
Contribution
The paper proposes novel variants of Adam and AMSGrad with dynamic bounds on learning rates, providing convergence guarantees and improved performance over existing adaptive methods.
Findings
Eliminates the generalization gap between adaptive methods and SGD
Maintains higher learning speed early in training
Significantly improves on Adam and AMSGrad, especially on complex deep networks
Abstract
Adaptive optimization methods such as AdaGrad, RMSprop and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevailing, they are observed to generalize poorly compared with SGD or even fail to converge due to unstable and extreme learning rates. Recent work has put forward some algorithms such as AMSGrad to tackle this issue but they failed to achieve considerable improvement over existing methods. In our paper, we demonstrate that extreme learning rates can lead to poor performance. We provide new variants of Adam and AMSGrad, called AdaBound and AMSBound respectively, which employ dynamic bounds on learning rates to achieve a gradual and smooth transition from adaptive methods to SGD and give a theoretical proof of convergence. We further conduct experiments on various popular tasks and models, which is often…
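The abstract's central idea, clipping Adam's per-coordinate learning rate between a lower and an upper bound that both converge to a single SGD-like rate, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The bound schedule in `dynamic_bounds`, the names `final_lr` and `gamma`, and all default values are assumptions that do not come from the text above. Bias correction and any additional step-size decay are omitted for brevity.

```python
import math


def dynamic_bounds(t, final_lr=0.1, gamma=1e-3):
    """Return (lower, upper) learning-rate bounds at 1-based step t.

    Assumed schedule: lower starts near 0, upper starts very large,
    and both approach final_lr as t grows.
    """
    lower = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * t))
    return lower, upper


def adabound_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, final_lr=0.1, gamma=1e-3):
    """One AdaBound-style update on lists of floats (hypothetical helper)."""
    lower, upper = dynamic_bounds(t, final_lr, gamma)
    new_x, new_m, new_v = [], [], []
    for xi, gi, mi, vi in zip(x, grad, m, v):
        # Adam-style first and second moment estimates.
        mi = beta1 * mi + (1.0 - beta1) * gi
        vi = beta2 * vi + (1.0 - beta2) * gi * gi
        # Per-coordinate adaptive rate, clipped into [lower, upper].
        rate = lr / (math.sqrt(vi) + eps)
        rate = min(max(rate, lower), upper)
        new_x.append(xi - rate * mi)
        new_m.append(mi)
        new_v.append(vi)
    return new_x, new_m, new_v


# Usage: a few hundred steps on f(x) = x^2, whose gradient is 2x.
x, m, v = [1.0], [0.0], [0.0]
for t in range(1, 201):
    grad = [2.0 * xi for xi in x]
    x, m, v = adabound_step(x, grad, m, v, t)
```

Early in training the bounds are so loose that the update behaves like Adam. As `t` grows the interval shrinks toward `final_lr`, so every coordinate ends up with nearly the same rate, which is the transition toward SGD that the abstract describes.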
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Metaheuristic Optimization Algorithms Research
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · AMSBound · AdaBound · AdaGrad · RMSProp · Adam · Stochastic Gradient Descent
