Robust Gradient Descent via Heavy-Ball Momentum with Predictive Extrapolation
Sarwan Ali

TL;DR
This paper introduces HB-SGE, a robust gradient descent method that combines heavy-ball momentum with predictive gradient extrapolation and converges reliably on ill-conditioned and non-convex problems where NAG and standard momentum diverge.
Contribution
The paper proposes HB-SGE, a novel first-order optimization algorithm that enhances stability and convergence in challenging landscapes by estimating future gradients through local Taylor approximations.
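This summary does not state the HB-SGE update rule, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes the "future gradient" is predicted by a first-order extrapolation from the last two gradients, g_pred = g_t + alpha * (g_t - g_prev), and that this prediction drives an ordinary heavy-ball step. The function name `hb_extrapolated_descent`, the extrapolation weight `alpha`, and the toy quadratic with curvatures 1 and 100 are all hypothetical choices made for the example.

```python
from typing import Callable, List

Vector = List[float]


def hb_extrapolated_descent(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    lr: float = 0.01,     # step size (assumed value)
    beta: float = 0.9,    # heavy-ball momentum coefficient (assumed value)
    alpha: float = 0.5,   # extrapolation weight (hypothetical parameter)
    steps: int = 500,
) -> Vector:
    """Heavy-ball descent driven by a linearly extrapolated gradient.

    Illustrative only: the extrapolation rule is an assumption, not the
    update rule from the paper.
    """
    x = list(x0)
    v = [0.0] * len(x)
    g_prev = grad(x)  # first step has no history, so the correction is zero
    for _ in range(steps):
        g = grad(x)
        # Predict the next gradient from the change between the last two.
        g_pred = [gi + alpha * (gi - gp) for gi, gp in zip(g, g_prev)]
        # Standard heavy-ball velocity update, using the predicted gradient.
        v = [beta * vi - lr * gp for vi, gp in zip(v, g_pred)]
        x = [xi + vi for xi, vi in zip(x, v)]
        g_prev = g
    return x


# Toy ill-conditioned quadratic f(x) = 0.5 * sum(h_i * x_i^2), condition number 100.
CURVATURES = [1.0, 100.0]


def quad_grad(x: Vector) -> Vector:
    return [h * xi for h, xi in zip(CURVATURES, x)]


x_final = hb_extrapolated_descent(quad_grad, [1.0, 1.0])
print(x_final)
```

With these settings the iterates contract toward the minimizer at the origin in both coordinates. The example says nothing about how NAG or plain momentum behave on the same problem; it only shows how an extrapolated gradient can be slotted into a heavy-ball step at the cost of storing one extra gradient vector.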
Findings
HB-SGE converges on ill-conditioned quadratics where NAG diverges.
On the Rosenbrock function, HB-SGE outperforms classical momentum methods.
HB-SGE maintains stability with minimal additional memory and hyperparameter tuning.
Abstract
Accelerated gradient methods like Nesterov's Accelerated Gradient (NAG) achieve faster convergence on well-conditioned problems but often diverge on ill-conditioned or non-convex landscapes due to aggressive momentum accumulation. We propose Heavy-Ball Synthetic Gradient Extrapolation (HB-SGE), a robust first-order method that combines heavy-ball momentum with predictive gradient extrapolation. Unlike classical momentum methods that accumulate historical gradients, HB-SGE estimates future gradient directions using local Taylor approximations, providing adaptive acceleration while maintaining stability. We prove convergence guarantees for strongly convex functions and demonstrate empirically that HB-SGE prevents divergence on problems where NAG and standard momentum fail. On ill-conditioned quadratics (condition number ), HB-SGE converges in 119 iterations while both SGD and…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Gaussian Processes and Bayesian Inference
