Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization

Tao Sun; Huaming Ling; Zuoqiang Shi; Dongsheng Li; Bao Wang

arXiv:2110.09057 · cs.LG · October 19, 2021 · 5 cites

TL;DR

This paper introduces an adaptive heavy ball momentum, inspired by the optimal momentum choice for quadratic optimization, that improves SGD and Adam. It removes the need to tune the momentum hyperparameter and improves convergence speed, robustness to large learning rates, and generalization across various machine learning tasks.

Contribution

The paper proposes a novel adaptive momentum scheme for SGD and Adam that removes the need to tune the momentum-related hyperparameter and comes with theoretical convergence guarantees.

Findings

01 Improved convergence speed of SGD and Adam with adaptive momentum

02 Enhanced robustness to large learning rates

03 Better generalization performance on diverse benchmarks

Abstract

Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization algorithms for machine learning. Existing heavy ball momentum is usually weighted by a uniform hyperparameter, which relies on excessive tuning. Moreover, the calibrated fixed hyperparameter may not lead to optimal performance. In this paper, to eliminate the effort for tuning the momentum-related hyperparameter, we propose a new adaptive momentum inspired by the optimal choice of the heavy ball momentum for quadratic optimization. Our proposed adaptive heavy ball momentum can improve stochastic gradient descent (SGD) and Adam. SGD and Adam with the newly designed adaptive momentum are more robust to large learning rates, converge faster, and generalize better than the baselines. We verify the efficiency of SGD and Adam with the new adaptive momentum on extensive machine learning benchmarks,…
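To make the quadratic-optimization background concrete, here is a minimal sketch of heavy ball momentum on a quadratic objective, using the classical (Polyak) optimal constants rather than the paper's adaptive rule, which the truncated abstract does not spell out. The test problem, the function name `heavy_ball_quadratic`, and the eigenvalue bounds `mu` and `L` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def heavy_ball_quadratic(A, b, lr, beta, steps):
    """Minimize f(x) = 0.5 * x^T A x - b^T x with heavy ball momentum.

    Update: x_{k+1} = x_k - lr * grad f(x_k) + beta * (x_k - x_{k-1}).
    Setting beta = 0 recovers plain gradient descent.
    """
    x_prev = np.zeros_like(b)
    x = np.zeros_like(b)
    for _ in range(steps):
        grad = A @ x - b
        x_next = x - lr * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


# Hypothetical test problem: a diagonal quadratic whose Hessian has
# eigenvalues spread over [mu, L] (condition number L / mu = 100).
mu, L = 1.0, 100.0
A = np.diag(np.linspace(mu, L, 20))
b = np.ones(20)
x_star = np.linalg.solve(A, b)

# Classical optimal heavy ball constants for a quadratic with known mu and L.
# This is the textbook fixed choice, NOT the adaptive scheme from the paper.
lr_opt = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta_opt = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

x_hb = heavy_ball_quadratic(A, b, lr_opt, beta_opt, steps=100)
# Baseline: gradient descent with its own best fixed step size, 2 / (L + mu).
x_gd = heavy_ball_quadratic(A, b, 2.0 / (L + mu), 0.0, steps=100)

err_hb = np.linalg.norm(x_hb - x_star)
err_gd = np.linalg.norm(x_gd - x_star)
print(f"heavy ball error:       {err_hb:.3e}")
print(f"gradient descent error: {err_gd:.3e}")
```

The fixed optimal constants require the extreme eigenvalues `mu` and `L`, which are rarely available when training a deep network. That gap is the motivation the abstract gives for choosing the momentum adaptively instead of tuning a single uniform value.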

Code & Models

Repositories

kentaroy47/vision-transformers-cifar10 · PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms

Methods: Adam · Stochastic Gradient Descent