Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent
Avrajit Ghosh, He Lyu, Xitong Zhang, Rongrong Wang

TL;DR
This paper investigates how the momentum parameter β in Heavy-ball accelerated gradient descent (GD+M) affects implicit regularization. It shows that momentum strengthens the implicit regularizer by a factor of (1+β)/(1−β) relative to plain gradient descent (GD), which the authors offer as an explanation for the better generalization of GD+M.
Contribution
The paper demonstrates that momentum in Heavy-ball methods enhances implicit regularization, providing a theoretical explanation for improved generalization and extending analysis to stochastic gradient descent with momentum.
Findings
The implicit regularizer for (GD+M) is stronger than that of (GD) by a factor of (1+β)/(1−β), where β is the momentum parameter.
Heavy-ball momentum accelerates convergence and improves test accuracy.
Experiments validate the theoretical analysis of implicit regularization effects.
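To get a feel for the size of the factor (1+β)/(1−β) in the first finding, the short sketch below evaluates it at a few momentum values. The function name `regularization_factor` and the chosen values of β are illustrative assumptions, not taken from the paper.

```python
def regularization_factor(beta: float) -> float:
    """Return (1 + beta) / (1 - beta), the factor by which the findings above
    say the GD+M implicit regularizer exceeds the GD one.

    The name and the range check are illustrative choices, not the paper's.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return (1.0 + beta) / (1.0 - beta)


# Illustrative momentum values; beta = 0 recovers plain GD (factor 1).
for beta in (0.0, 0.5, 0.9, 0.99):
    print(f"beta={beta:<4}  factor={regularization_factor(beta):.1f}")
```

The factor grows quickly as β approaches 1: at β = 0.9 it is roughly 19, so under the stated finding the regularizer would be about 19 times stronger than for plain GD.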
Abstract
It is well known that the finite step-size in Gradient Descent (GD) implicitly regularizes solutions to flatter minima. A natural question to ask is "Does the momentum parameter β play a role in implicit regularization in Heavy-ball (H.B) momentum accelerated gradient descent (GD+M)?". To answer this question, first, we show that the discrete H.B momentum update (GD+M) follows a continuous trajectory induced by a modified loss, which consists of an original loss and an implicit regularizer. Then, we show that this implicit regularizer for (GD+M) is stronger than that of (GD) by a factor of (1+β)/(1−β), thus explaining why (GD+M) shows better generalization performance and higher test accuracy than (GD). Furthermore, we extend our analysis to the stochastic version of gradient descent with momentum (SGD+M) and characterize the continuous trajectory of the update…
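The discrete updates the abstract compares can be sketched as follows. This is a minimal illustration of the standard GD and Heavy-ball rules on a toy one-dimensional quadratic, not the paper's experimental setup. The symbols `h` (step-size) and `beta` (momentum), the toy loss, and all numeric values are assumptions chosen for the example.

```python
from typing import Callable


def gd(grad: Callable[[float], float], x0: float, h: float, steps: int) -> float:
    """Plain gradient descent: x_{k+1} = x_k - h * grad(x_k)."""
    x = x0
    for _ in range(steps):
        x = x - h * grad(x)
    return x


def heavy_ball(grad: Callable[[float], float], x0: float, h: float,
               beta: float, steps: int) -> float:
    """Heavy-ball momentum:
    x_{k+1} = x_k - h * grad(x_k) + beta * (x_k - x_{k-1}).
    Starts with zero velocity, i.e. x_{-1} = x_0.
    """
    x_prev, x = x0, x0
    for _ in range(steps):
        # The right-hand side is evaluated with the old x and x_prev.
        x_prev, x = x, x - h * grad(x) + beta * (x - x_prev)
    return x


def toy_grad(x: float) -> float:
    """Gradient of the toy loss E(x) = 0.5 * x**2 (hypothetical example)."""
    return x


x_gd = gd(toy_grad, x0=1.0, h=0.05, steps=200)
x_hb = heavy_ball(toy_grad, x0=1.0, h=0.05, beta=0.9, steps=200)
print(f"GD iterate:   {x_gd:.3e}")
print(f"GD+M iterate: {x_hb:.3e}")
```

Setting `beta=0` in `heavy_ball` recovers plain GD. A convex quadratic like this has a single minimum, so the sketch only shows the update rules themselves; it does not demonstrate the flatter-minima effect the abstract describes.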
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Statistical Mechanics and Entropy
Methods: Test
