
TL;DR
The paper introduces Ano, an optimizer that decouples update direction from update magnitude to improve robustness in noisy and non-stationary environments. It comes with non-convex convergence guarantees and reports empirical gains in regimes such as reinforcement learning.
Contribution
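Ano separates directional smoothing from step-size determination: momentum sets the update direction, while instantaneous gradient magnitudes set the step size, which improves robustness in noisy settings. A variant, Anolog, removes sensitivity to the momentum coefficient by expanding the momentum window over time on a logarithmic schedule.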
Findings
Ano outperforms Adam and Adan in noisy environments.
Ano maintains competitive performance on low-noise tasks.
Non-convex convergence guarantees are established for Ano, with a rate similar to other sign-based methods.
Abstract
Stochastic optimizers are central to deep learning, yet widely used methods such as Adam and Adan can degrade in non-stationary or noisy environments, partly due to their reliance on momentum-based magnitude estimates. We introduce Ano, a novel optimizer that decouples direction and magnitude: momentum is used for directional smoothing, while instantaneous gradient magnitudes determine step size. This design improves robustness to gradient noise while retaining the simplicity and efficiency of first-order methods. We further propose Anolog, which removes sensitivity to the momentum coefficient by expanding its window over time via a logarithmic schedule. We establish non-convex convergence guarantees with a convergence rate similar to other sign-based methods, and empirically show that Ano provides substantial gains in noisy and non-stationary regimes such as reinforcement learning,…
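Illustrative sketch
The decoupling described in the abstract can be sketched in a few lines of plain Python. This is a simplified illustration built only from the abstract's wording, not the paper's exact update rule: the direction comes from the sign of a momentum average, and the step size from the magnitude of the current gradient. The function names `ano_like_step` and `log_window_beta`, the default hyperparameters, and the specific logarithmic schedule are assumptions made for this example. Any further normalization the full method may apply is omitted.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def ano_like_step(params, grads, momentum, lr=1e-3, beta=0.9):
    """One simplified decoupled step over lists of floats.

    Assumption: direction = sign(momentum), magnitude = |current gradient|.
    This mirrors the abstract's description only; it is not the paper's
    exact rule.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Exponential moving average of gradients, used only for direction.
        m = beta * m + (1.0 - beta) * g
        # Step size comes from the instantaneous gradient magnitude.
        step = lr * abs(g) * sign(m)
        new_params.append(p - step)
        new_momentum.append(m)
    return new_params, new_momentum


def log_window_beta(step):
    """Hypothetical logarithmic schedule for the momentum coefficient.

    Assumption: beta = 1 - 1 / log(step + 2), clamped at 0, for step >= 1.
    It grows slowly toward 1, so the averaging window widens over time.
    """
    return max(0.0, 1.0 - 1.0 / math.log(step + 2))
```

Passing `beta=log_window_beta(k)` at iteration `k` (counting from 1) gives an Anolog-style variant of this sketch, in which the momentum coefficient no longer needs to be tuned by hand.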
