Overshoot: Taking advantage of future gradients in momentum-based stochastic optimization

Jakub Kopal; Michal Gregor; Santiago de Leon-Martinez; Jakub Simko

arXiv:2501.09556 · cs.LG · January 17, 2025

TL;DR

Overshoot is a new momentum-based stochastic optimization method that evaluates gradients at model weights shifted in the direction of the current momentum, which speeds up convergence and improves performance across various tasks.

Contribution

The paper introduces Overshoot, a gradient evaluation technique that enhances existing momentum optimizers (SGD with momentum and Adam) by computing the gradient at weights shifted in the momentum direction rather than at the current weights.

Findings

1. Overshoot saves, on average, at least 15% of optimization steps.
2. It outperforms both standard and Nesterov's momentum on multiple tasks.
3. The method adds minimal computational overhead.

Abstract

Overshoot is a novel, momentum-based stochastic gradient descent optimization method designed to enhance performance beyond standard and Nesterov's momentum. In conventional momentum methods, gradients from previous steps are aggregated with the gradient at the current model weights before taking a step and updating the model. Rather than calculating the gradient at the current model weights, Overshoot calculates the gradient at model weights shifted in the direction of the current momentum. This sacrifices the immediate benefit of using the gradient with respect to the exact current model weights in favor of evaluating it at a point that will likely be more relevant for future updates. We show that incorporating this principle into momentum-based optimizers (SGD with momentum and Adam) results in faster convergence (saving on average at least 15% of steps). Overshoot consistently outperforms both standard…
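
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `momentum_descent`, the `overshoot` factor, and the toy quadratic objective are all assumptions made for illustration, and the exact update rule in the paper may differ. The sketch only shows the core principle of evaluating the gradient at weights shifted along the momentum direction instead of at the current weights.

```python
from typing import Callable, List

Vector = List[float]


def momentum_descent(
    grad_fn: Callable[[Vector], Vector],
    w0: Vector,
    lr: float = 0.1,
    beta: float = 0.9,
    overshoot: float = 0.0,  # hypothetical knob; not a name taken from the paper
    steps: int = 100,
) -> Vector:
    """Gradient descent with momentum, with the gradient evaluated at shifted weights.

    overshoot = 0.0 evaluates the gradient at the current weights (classical momentum).
    Larger values evaluate it further along the current momentum direction.
    """
    w = list(w0)
    m = [0.0] * len(w)  # momentum buffer: running aggregate of past gradients
    for _ in range(steps):
        # The update moves the weights by -lr * m, so shifting "in the momentum
        # direction" means moving further along -m before taking the gradient.
        shifted = [wi - overshoot * lr * mi for wi, mi in zip(w, m)]
        g = grad_fn(shifted)
        # Aggregate the new gradient into the momentum buffer, then step.
        m = [beta * mi + gi for mi, gi in zip(m, g)]
        w = [wi - lr * mi for wi, mi in zip(w, m)]
    return w


def quadratic_grad(w: Vector) -> Vector:
    """Gradient of the toy objective f(w) = 0.5 * (1.0 * w[0]**2 + 10.0 * w[1]**2)."""
    curvature = [1.0, 10.0]
    return [c * wi for c, wi in zip(curvature, w)]


if __name__ == "__main__":
    for factor in (0.0, 0.9, 3.0):
        w = momentum_descent(quadratic_grad, [1.0, 1.0], lr=0.05, overshoot=factor, steps=50)
        print(f"overshoot={factor}: w={w}")
```

In this sketch, setting `overshoot` equal to `beta` evaluates the gradient at the look-ahead point used in one common formulation of Nesterov's momentum, so larger values look further ahead than that. Which factor works best on a real task is an empirical question that this toy example does not answer.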

Code & Models

Repositories

kinit-sk/overshoot (PyTorch, official)

Taxonomy

Topics: Simulation Techniques and Applications · Stochastic Gradient Optimization Techniques