TL;DR
This paper introduces GONO, a new optimizer that leverages directional consistency in gradients to improve convergence, demonstrating theoretical guarantees and empirical effectiveness across multiple benchmarks.
Contribution
The paper formalizes directional alignment as an optimization signal and develops GONO, an adaptive optimizer that exploits this phenomenon for better training performance.
Findings
GONO matches Adam's convergence rate of O(1/√T).
The directional-consistency signal cc_t detects oscillations with F1 = 1.00.
GONO performs competitively with AdamW on various datasets.
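To make the oscillation-detection finding concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes cc_t is the cosine similarity between consecutive gradients, as the abstract describes. The toy objective f(x) = 0.5 * x**2, the two learning rates, and the function names are illustrative assumptions.

```python
import numpy as np

def cosine_consistency(g_prev, g_curr, eps=1e-12):
    """Cosine similarity between two consecutive gradients (cc_t in the text)."""
    denom = np.linalg.norm(g_prev) * np.linalg.norm(g_curr) + eps
    return float(np.dot(g_prev, g_curr) / denom)

def gd_consistency_trace(lr, steps=10, x0=1.0):
    """Run plain gradient descent on f(x) = 0.5 * x**2 and record cc_t per step."""
    x = np.array([x0])
    g_prev = x.copy()              # the gradient of 0.5 * x**2 is x itself
    trace = []
    for _ in range(steps):
        x = x - lr * g_prev        # gradient descent update
        g_curr = x.copy()
        trace.append(cosine_consistency(g_prev, g_curr))
        g_prev = g_curr
    return trace

# Small step: x shrinks without changing sign, so consecutive gradients align.
stable = gd_consistency_trace(lr=0.1)
# Large step: x overshoots and flips sign every step, so gradients oppose each other.
oscillating = gd_consistency_trace(lr=1.5)
```

In this toy setting cc_t stays close to +1 for the small step size and close to -1 for the overshooting one, which is the kind of separation a threshold-based oscillation detector could rely on. Real training gradients are noisy and high-dimensional, so the separation will generally be less clean.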
Abstract
We identify and formalize an underexplored phenomenon in deep learning optimization: directional alignment and loss convergence can be decoupled. An optimizer can exhibit near-perfect directional consistency (cc_t -> 1, measured via consecutive gradient cosine similarity) while the loss remains high or decreases slowly. This observation reveals that existing optimizers such as Adam, SGD, and RMSprop lack explicit mechanisms to exploit temporal consistency in gradient directions, relying instead on magnitude-based signals that fail to distinguish plateaus, saddle points, and genuine convergence. Motivated by this, we introduce GONO (Gradient-Oriented Norm-Adaptive Optimizer), which adapts Adam's momentum coefficient beta_1 based on cc_t: amplifying momentum under directional consistency and suppressing it during oscillation. We prove GONO matches Adam's O(1/sqrt(T)) convergence rate and…
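The abstract states that GONO adapts beta_1 from cc_t but does not give the rule here, so the following is only a hedged sketch of the general idea. The linear mapping in `adaptive_beta1`, its `base`, `gain`, `lo`, and `hi` parameters, the running-product bias correction, and all function names are hypothetical choices for illustration, not the authors' method.

```python
import numpy as np

def adaptive_beta1(cc_t, base=0.9, gain=0.09, lo=0.0, hi=0.999):
    """Hypothetical rule: raise beta_1 when cc_t > 0, lower it when cc_t < 0."""
    return float(np.clip(base + gain * cc_t, lo, hi))

def init_state(dim):
    """Optimizer state: Adam moments plus the previous gradient for cc_t."""
    return {"m": np.zeros(dim), "v": np.zeros(dim),
            "g_prev": np.zeros(dim), "beta1_prod": 1.0, "t": 0}

def consistency_adam_step(param, grad, state, lr=1e-3, beta2=0.999, eps=1e-8):
    """One Adam-style update whose beta_1 depends on cc_t (illustrative only)."""
    g_prev = state["g_prev"]
    norm_prod = np.linalg.norm(g_prev) * np.linalg.norm(grad)
    # cc_t: cosine similarity of consecutive gradients; 0 when undefined.
    cc_t = float(np.dot(g_prev, grad) / norm_prod) if norm_prod > 0 else 0.0
    beta1 = adaptive_beta1(cc_t)

    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    # With a time-varying beta_1, the first-moment weights sum to
    # 1 - prod(beta_1), so the running product is used for bias correction.
    state["beta1_prod"] *= beta1
    m_hat = state["m"] / (1 - state["beta1_prod"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    state["g_prev"] = grad.copy()
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

# Usage on the toy objective f(x) = 0.5 * ||x||^2, whose gradient is x.
x = np.array([1.0, -2.0])
state = init_state(2)
for _ in range(500):
    x = consistency_adam_step(x, x.copy(), state, lr=0.05)
```

Clipping beta_1 below 1 keeps the bias-correction denominator positive. Any convergence guarantee would depend on the actual rule and bounds used in the paper, which this sketch does not reproduce.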
