Grams: Gradient Descent with Adaptive Momentum Scaling

Yang Cao; Xiaoyu Li; Zhao Song

arXiv:2412.17107·cs.LG·March 6, 2025

TL;DR

Grams is an optimizer for deep learning that takes the update direction from the current gradient and uses momentum only to scale the update magnitude. The authors report faster convergence and better generalization than the baseline optimizers they compare against.

Contribution

The paper introduces an optimizer that decouples update direction from magnitude scaling, supported by a theoretical convergence analysis and empirical comparisons against existing optimizers.

Findings

01  Faster convergence than Adam and Lion.

02  Better generalization in training large language models.

03  Theoretical proof of global convergence.

Abstract

We introduce Gradient Descent with Adaptive Momentum Scaling (Grams), a novel optimization algorithm that decouples the direction and magnitude of parameter updates in deep learning. Unlike traditional optimizers that directly integrate momentum into updates, Grams separates the update direction, derived from current gradients, from momentum, which is used solely for adaptive magnitude scaling. This approach enables Grams to achieve improved loss descent compared to state-of-the-art cautious and momentum-based optimizers. We theoretically demonstrate that Grams descends faster than other state-of-the-art optimizers and establish a global convergence guarantee for Grams. We also validate its effectiveness through extensive empirical evaluations. The results demonstrate Grams' superior performance, including faster convergence and…
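
The decoupling described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Adam-style first and second moment estimates with bias correction (the abstract does not spell out the exact update rule), and the function name `grams_like_step` and its hyperparameter defaults are hypothetical. The only point it is meant to show is that the sign of each update comes from the current gradient, while momentum contributes only an absolute magnitude.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def grams_like_step(params, grads, m, v, t,
                    lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative update on lists of floats.

    Assumption: Adam-style moments supply the magnitude; the current
    gradient supplies the direction. Hypothetical helper, not the
    official Grams API.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction for step t (t starts at 1).
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Momentum is used only for the size of the step ...
        magnitude = abs(m_hat) / (math.sqrt(v_hat) + eps)
        # ... while the direction is the sign of the current gradient.
        new_params.append(p - lr * sign(g) * magnitude)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

In this sketch, when the accumulated momentum and the current gradient disagree in sign, the parameter still moves against the current gradient. A plain Adam-style step would instead follow the sign of the momentum term.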

Code & Models

Repositories

Gunale0926/Grams (PyTorch, official)

Taxonomy

Topics: Slime Mold and Myxomycetes Research

Methods: Adam · Evolved Sign Momentum