Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization
Tao Sun, Huaming Ling, Zuoqiang Shi, Dongsheng Li, Bao Wang

TL;DR
This paper introduces an adaptive heavy ball momentum, inspired by the optimal momentum choice for quadratic optimization, that removes the effort of tuning the momentum hyperparameter in SGD and Adam while improving convergence speed, robustness to large learning rates, and generalization.
Contribution
The paper proposes a novel adaptive momentum scheme for SGD and Adam that eliminates the need to tune the momentum-related hyperparameter and comes with theoretical convergence guarantees.
Findings
Improved convergence speed of SGD and Adam with adaptive momentum
Enhanced robustness to large learning rates
Better generalization performance on diverse benchmarks
Abstract
Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization algorithms for machine learning. Existing heavy ball momentum is usually weighted by a uniform hyperparameter, which relies on excessive tuning. Moreover, the calibrated fixed hyperparameter may not lead to optimal performance. In this paper, to eliminate the effort for tuning the momentum-related hyperparameter, we propose a new adaptive momentum inspired by the optimal choice of the heavy ball momentum for quadratic optimization. Our proposed adaptive heavy ball momentum can improve stochastic gradient descent (SGD) and Adam. SGD and Adam with the newly designed adaptive momentum are more robust to large learning rates, converge faster, and generalize better than the baselines. We verify the efficiency of SGD and Adam with the new adaptive momentum on extensive machine learning benchmarks,…
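To make the quadratic-optimization motivation concrete, here is a minimal sketch of the classical heavy ball iteration on a small quadratic, using Polyak's textbook choice of step size and momentum computed from the extreme eigenvalues. This is not the paper's adaptive rule, which the abstract does not spell out; the helper names (`heavy_ball`, `polyak_parameters`) and the test matrix `A` are assumptions chosen purely for illustration.

```python
import numpy as np


def heavy_ball(A, b, x0, lr, beta, steps):
    """Heavy ball iteration on f(x) = 0.5 * x^T A x - b^T x (illustrative helper)."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(steps):
        grad = A @ x - b
        # Gradient step plus momentum term beta * (x_k - x_{k-1}).
        x_next = x - lr * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


def polyak_parameters(A):
    """Classical step size and momentum for a quadratic with eigenvalues in [mu, L]."""
    eigs = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
    mu, L = eigs[0], eigs[-1]
    lr = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
    return lr, beta


# Hypothetical ill-conditioned test problem (condition number 100).
A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x_star = np.linalg.solve(A, b)
x0 = np.zeros(3)

lr, beta = polyak_parameters(A)
x_hb = heavy_ball(A, b, x0, lr, beta, steps=100)

# Baseline: plain gradient descent (beta = 0) with its best fixed step 2 / (L + mu).
x_gd = heavy_ball(A, b, x0, 2.0 / (100.0 + 1.0), 0.0, steps=100)

err_hb = np.linalg.norm(x_hb - x_star)
err_gd = np.linalg.norm(x_gd - x_star)
print(f"momentum beta = {beta:.4f}, heavy ball error = {err_hb:.2e}, GD error = {err_gd:.2e}")
```

The classical momentum value depends on the smallest and largest eigenvalues, which are not available when training a neural network. That gap is what motivates choosing the momentum adaptively during training rather than fixing it by hand.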
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
Methods: Adam · Stochastic Gradient Descent
