The Implicit Regularization of Momentum Gradient Descent with Early Stopping
Li Wang (1), Yingcong Zhou (2), Zhiguo Fu (1) ((1) Northeast Normal University, (2) Beihua University)

TL;DR
This paper investigates how momentum gradient descent with early stopping implicitly regularizes solutions similarly to ridge regression, providing theoretical bounds and empirical validation.
Contribution
It characterizes the implicit regularization of momentum gradient descent with early stopping by comparing it to ridge regression and establishing risk bounds.
Findings
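The implicit regularization of momentum gradient descent (MGD) is closer to ridge than that of plain gradient descent.
Under a specific calibration between the time parameter and the ridge tuning parameter, the risk of momentum gradient flow (MGF) is at most 1.54 times the ridge risk.
Numerical experiments support the theoretical analysis.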
Abstract
The study of the implicit regularization induced by gradient-based optimization is a longstanding pursuit. In the present paper, we characterize the implicit regularization of momentum gradient descent (MGD) with early stopping by comparing it with explicit ℓ2-regularization (ridge). In detail, we study MGD in the continuous-time view, the so-called momentum gradient flow (MGF), and show that its tendency is closer to ridge than that of gradient descent (GD) [Ali et al., 2019] for least squares regression. Moreover, we prove that, under a calibration between the time parameter in MGF and the tuning parameter in ridge regression, the risk of MGF is no more than 1.54 times that of ridge. In particular, the relative Bayes risk of MGF to ridge is between 1 and 1.035 under the optimal tuning. The numerical experiments support our theoretical results…
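The comparison described in the abstract can be illustrated with a small numerical sketch. It is not the authors' code: the function names `ridge` and `momentum_gd`, the heavy-ball form of the update, the objective scaling, the step size, the momentum coefficient, and the synthetic data are all assumptions chosen for illustration, and it does not use the paper's calibration, which is not reproduced in this summary. It runs discrete momentum gradient descent on a least squares problem from a zero start and, for a few stopping times, reports which ridge solution on a grid lies closest to the stopped iterate.

```python
import numpy as np


def ridge(X, y, lam):
    """Closed-form minimizer of (1/2n)||y - Xb||^2 + (lam/2)||b||^2 (assumed scaling)."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def momentum_gd(X, y, step, beta, n_iters):
    """Heavy-ball momentum GD on (1/2n)||y - Xb||^2, started at zero.

    Returns an array of shape (n_iters + 1, p) holding every iterate,
    so that any row can be read as an early-stopped estimate.
    """
    n, p = X.shape
    b = np.zeros(p)
    b_prev = np.zeros(p)
    path = [b.copy()]
    for _ in range(n_iters):
        grad = X.T @ (X @ b - y) / n
        b_next = b - step * grad + beta * (b - b_prev)
        b_prev, b = b, b_next
        path.append(b.copy())
    return np.array(path)


# Synthetic least squares problem (illustrative values only).
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
b_true = rng.standard_normal(p) / np.sqrt(p)
y = X @ b_true + rng.standard_normal(n)

path = momentum_gd(X, y, step=0.01, beta=0.9, n_iters=500)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge solutions over a grid of tuning parameters.
lams = np.logspace(-3, 2, 200)
ridge_path = np.array([ridge(X, y, lam) for lam in lams])

# For each stopping time, find the closest ridge solution on the grid.
for k in (5, 20, 100, 500):
    dists = np.linalg.norm(ridge_path - path[k], axis=1)
    j = int(np.argmin(dists))
    print(f"iteration {k:4d}: nearest lambda = {lams[j]:.4g}, distance = {dists[j]:.4g}")
```

Stopping early generally pairs with a larger ridge parameter and running longer with a smaller one, which is the qualitative correspondence the abstract refers to. The risk bounds themselves are statements about MGF under the paper's calibration and are not checked by this sketch.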
Taxonomy
Topics: Medical Imaging Techniques and Applications · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
Methods: Early Stopping
