Quasi-hyperbolic momentum and Adam for deep learning

Jerry Ma; Denis Yarats

arXiv:1810.06801 · cs.LG · May 3, 2019 · 48 cites

TL;DR

This paper introduces quasi-hyperbolic momentum (QHM) and QHAdam, simple modifications of momentum SGD and Adam. The authors report significantly improved training in a variety of settings, including a new state-of-the-art result on WMT16 EN-DE.

Contribution

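The paper proposes QHM, which averages a plain SGD step with a momentum step, and QHAdam, a quasi-hyperbolic variant of Adam. It also relates QHM to other algorithms and characterizes the set of two-state optimization algorithms that QHM can recover.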

Findings

01. In the authors' experiments, QHM significantly improves training in a variety of deep learning settings.

02. QHAdam likewise improves on standard Adam in the reported experiments.

03. The algorithms achieve a new state-of-the-art result on the WMT16 EN-DE translation task.

Abstract

Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover. Finally, we propose a QH variant of Adam called QHAdam, and we empirically demonstrate that our algorithms lead to significantly improved training in a variety of settings, including a new state-of-the-art result on WMT16 EN-DE. We hope that these empirical results, combined with the conceptual and practical simplicity of QHM and QHAdam, will spur interest from both practitioners and researchers. Code is immediately available.
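
The abstract's description of QHM as "averaging a plain SGD step with a momentum step" can be sketched in a few lines of Python. This is a minimal illustration, not the authors' released code: the function names, the parameter names `lr`, `beta`, and `nu`, the default values, and the toy quadratic objective are all assumptions introduced here. The sketch assumes the momentum buffer is an exponential moving average of gradients and that `nu` weights the momentum step against the plain gradient step.

```python
def qhm_step(theta, buf, grad, lr=0.1, beta=0.9, nu=0.7):
    """One QHM-style update on lists of floats (illustrative sketch).

    theta: current parameters
    buf:   momentum buffer (same length as theta)
    grad:  gradient of the loss at theta
    Returns (new_theta, new_buf).
    """
    # Momentum buffer: exponential moving average of past gradients.
    new_buf = [beta * b + (1.0 - beta) * g for b, g in zip(buf, grad)]
    # Update direction: weighted average of the plain gradient (SGD step)
    # and the momentum buffer (momentum step), with weight nu on momentum.
    new_theta = [
        t - lr * ((1.0 - nu) * g + nu * b)
        for t, g, b in zip(theta, grad, new_buf)
    ]
    return new_theta, new_buf


def minimize_quadratic(steps=200, **kwargs):
    """Run the sketch on the toy objective f(x) = 0.5 * x**2 (gradient is x)."""
    theta, buf = [5.0], [0.0]
    for _ in range(steps):
        grad = [theta[0]]
        theta, buf = qhm_step(theta, buf, grad, **kwargs)
    return theta[0]
```

Under these assumptions, setting `nu=0.0` reduces the update to plain SGD, and `nu=1.0` reduces it to momentum SGD with an averaged gradient buffer; intermediate values interpolate between the two.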

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications

Methods: QHM · QHAdam · Stochastic Gradient Descent