An Adaptive and Momental Bound Method for Stochastic Learning

Jianbang Ding; Xuancheng Ren; Ruixuan Luo; Xu Sun

arXiv:1910.12249 · cs.LG · October 29, 2019 · 28 citations

TL;DR

This paper introduces AdaMod, an adaptive learning rate method that stabilizes training by bounding learning rates based on exponential moving averages, improving convergence especially on complex neural networks.

Contribution

The paper proposes AdaMod, a novel adaptive learning rate method that prevents excessively large updates, enhancing stability and performance in deep neural network training.

Findings

01. AdaMod eliminates extremely large learning rates during training.

02. AdaMod significantly improves training stability on DenseNet and Transformer models.

03. Experimental results show AdaMod outperforms Adam in complex network training.

Abstract

Training deep neural networks requires intricate initialization and careful selection of learning rates. The emergence of stochastic gradient optimization methods that use adaptive learning rates based on squared past gradients, e.g., AdaGrad, AdaDelta, and Adam, eases the job slightly. However, recent studies have shown that such methods have pitfalls of their own, including non-convergence issues. Variants such as AMSGrad, AdaShift, and AdaBound have been proposed to address them. In this work, we identify a new problem of adaptive learning rate methods that appears at the beginning of training: Adam produces extremely large learning rates that inhibit the start of learning. We propose the Adaptive and Momental Bound (AdaMod) method to restrict the adaptive learning rates with adaptive and momental upper bounds. The dynamic learning rate…
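
The bounding idea described in the abstract can be illustrated with a minimal sketch for a single scalar parameter. This is not the authors' reference implementation: the function names `adamod_step` and `new_state`, the default hyperparameters, the smoothing coefficient `beta3`, the zero initialization of the averaged learning rate, and the placement of bias correction are all assumptions made for illustration. The sketch computes an Adam-style per-parameter learning rate, keeps an exponential moving average of it, and uses that average as an upper bound.

```python
import math


def new_state():
    # Hypothetical optimizer state for one scalar parameter.
    # "s" is the moving average of past learning rates (assumed to start at 0).
    return {"t": 0, "m": 0.0, "v": 0.0, "s": 0.0}


def adamod_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                beta3=0.999, eps=1e-8):
    """One AdaMod-style update; returns (new_theta, raw_rate, bounded_rate)."""
    state["t"] += 1
    t = state["t"]

    # Adam-style first and second moment estimates with bias correction.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)

    # Adaptive learning rate as Adam would use it.
    eta = lr / (math.sqrt(v_hat) + eps)

    # Momental bound: moving average of the learning rates seen so far.
    state["s"] = beta3 * state["s"] + (1 - beta3) * eta

    # Clip the adaptive rate so it never exceeds the smoothed value.
    eta_bounded = min(eta, state["s"])

    return theta - eta_bounded * m_hat, eta, eta_bounded


if __name__ == "__main__":
    state = new_state()
    theta = 1.0
    # A tiny gradient makes the unbounded Adam-style rate very large.
    for _ in range(3):
        theta, eta, eta_bounded = adamod_step(theta, 1e-6, state)
        print(f"raw rate {eta:.3f}  bounded rate {eta_bounded:.6f}")
```

Under these assumptions, a very small gradient early in training produces a raw rate several orders of magnitude above the base `lr`, while the bounded rate stays close to the slowly growing average, which is the stabilizing behavior the abstract attributes to AdaMod.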

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · AdaShift · AdaMod · AdaBound · Batch Normalization · Residual Connection · Convolution · Average Pooling