Understanding Adam Requires Better Rotation Dependent Assumptions

Tianyue H. Zhang; Lucas Maes; Alan Milligan; Alexia Jolicoeur-Martineau; Ioannis Mitliagkas; Damien Scieur; Simon Lacoste-Julien; Charles Guille-Escuret

arXiv:2410.19964 · cs.LG · November 7, 2025

TL;DR

This paper examines the Adam optimizer's sensitivity to rotations of the parameter space. It shows that Adam's empirical success depends on the choice of basis and argues that theoretical models of Adam need rotation-dependent assumptions.

Contribution

The paper demonstrates Adam's rotation sensitivity, challenges existing rotation-invariant assumptions, and proposes orthogonality of updates as a key factor for future theories.

Findings

01. Adam's performance degrades under random rotations.

02. Structured rotations can preserve or improve Adam's performance.

03. Orthogonality of updates correlates with basis sensitivity.

Abstract

Despite its widespread adoption, Adam's advantage over Stochastic Gradient Descent (SGD) lacks a comprehensive theoretical explanation. This paper investigates Adam's sensitivity to rotations of the parameter space. We observe that Adam's performance in training transformers degrades under random rotations of the parameter space, indicating a crucial sensitivity to the choice of basis in practice. This reveals that conventional rotation-invariant assumptions are insufficient to capture Adam's advantages theoretically. To better understand the rotation-dependent properties that benefit Adam, we also identify structured rotations that preserve or even enhance its empirical performance. We then examine the rotation-dependent assumptions in the literature and find that they fall short in explaining Adam's behaviour across various rotation types. In contrast, we verify the orthogonality of…
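To make the notion of rotation sensitivity concrete, here is a minimal NumPy sketch. It is not the paper's experimental setup: the toy quadratic, the helper names (`random_rotation`, `sgd_step`, `adam_first_step`, `equivariance_gap`), and the hyperparameters are all assumptions chosen for illustration. It compares "take a step, then rotate" with "rotate, then take a step" for a plain gradient step and for the first step of a textbook Adam update. The gradient step commutes with the rotation up to floating-point error, while the Adam step does not, because its per-coordinate normalization depends on the basis.

```python
import numpy as np

def random_rotation(dim, rng):
    """Sample a rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))      # normalize column signs
    if np.linalg.det(q) < 0:         # force determinant +1 (proper rotation)
        q[:, 0] = -q[:, 0]
    return q

def sgd_step(w, grad, lr=0.1):
    """Plain gradient step (no momentum)."""
    return w - lr * grad

def adam_first_step(w, grad, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """First Adam step from zero-initialized moments, with bias correction."""
    m_hat = ((1 - beta1) * grad) / (1 - beta1)
    v_hat = ((1 - beta2) * grad**2) / (1 - beta2)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

def equivariance_gap(step, w, grad_fn, R):
    """Norm of the difference between 'step then rotate' and 'rotate then step'."""
    g = grad_fn(w)
    step_then_rotate = R @ step(w, g)
    # In rotated coordinates v = R w, the objective f(R^T v) has gradient R g.
    rotate_then_step = step(R @ w, R @ g)
    return np.linalg.norm(step_then_rotate - rotate_then_step)

rng = np.random.default_rng(0)
dim = 5
scales = np.array([1.0, 2.0, 5.0, 10.0, 50.0])  # hypothetical curvatures

def grad_fn(w):
    """Gradient of the toy objective 0.5 * sum(scales * w**2)."""
    return scales * w

w0 = rng.standard_normal(dim)
R = random_rotation(dim, rng)

sgd_gap = equivariance_gap(sgd_step, w0, grad_fn, R)
adam_gap = equivariance_gap(adam_first_step, w0, grad_fn, R)
print(f"gradient step gap: {sgd_gap:.2e}")
print(f"Adam step gap:     {adam_gap:.2e}")
```

A nonzero gap for Adam only shows that its trajectory changes with the basis. Whether a given rotation helps or hurts training is the empirical question the paper studies.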

Videos

Understanding Adam Requires Better Rotation Dependent Assumptions · SlidesLive

Taxonomy

Topics: Design Education and Practice

Methods: Adam