Adaptive Optimization via Momentum on Variance-Normalized Gradients
Francisco Patitucci, Aryan Mokhtari

TL;DR
MVN-Grad is a new optimizer that combines variance normalization with momentum applied after normalization to improve stability, robustness, and convergence when training deep neural networks, matching or outperforming existing optimizers on benchmarks.
Contribution
Introduces MVN-Grad, an Adam-style optimizer with variance normalization and momentum, providing theoretical guarantees and empirical improvements over existing methods.
Findings
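MVN-Grad achieves strictly smaller one-step conditional update variance than momentum-then-normalize variance methods under standard noise assumptions.
It is robust to outliers: its response to a single gradient spike is uniformly bounded.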
It performs better than or comparably to Adam, AdaBelief, and LaProp on benchmarks.
Abstract
We introduce MVN-Grad (Momentum on Variance-Normalized Gradients), an Adam-style optimizer that improves stability and performance by combining two complementary ideas: variance-based normalization and momentum applied after normalization. MVN-Grad scales each coordinate by an exponential moving average of gradient uncertainty and applies momentum to the resulting normalized gradients, eliminating the cross-time coupling between stale momentum and a stochastic normalizer present in standard Adam-type updates. We prove that this decoupling yields strictly smaller one-step conditional update variance than momentum-then-normalize variance methods under standard noise assumptions, and that MVN-Grad is robust to outliers: it has a uniformly bounded response to single gradient spikes. In low-variance regimes, we further show variance normalization avoids sign-type collapse associated with…
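The abstract describes the update only in words (normalize each coordinate by a moving-average variance estimate, then apply momentum to the normalized gradient), and the exact rule is not given here. The following is a minimal NumPy sketch of that ordering under stated assumptions, not the authors' algorithm: the AdaBelief-style centered variance estimate, the Adam-style bias corrections, the `eps` term, and the names `init_state` and `mvn_grad_step` are all hypothetical choices made for illustration.

```python
import numpy as np


def init_state(shape):
    """Create optimizer state (hypothetical layout, for illustration only)."""
    return {
        "t": 0,
        "g_mean": np.zeros(shape),  # EMA of raw gradients, used to center the variance
        "var": np.zeros(shape),     # EMA of squared deviations (uncertainty proxy)
        "mom": np.zeros(shape),     # momentum buffer over NORMALIZED gradients
    }


def mvn_grad_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One normalize-then-momentum step (a sketch under the assumptions above)."""
    state["t"] += 1
    t = state["t"]

    # Assumption: estimate per-coordinate gradient uncertainty as an EMA of
    # squared deviations from an EMA of the gradient (AdaBelief-style).
    state["g_mean"] = beta1 * state["g_mean"] + (1 - beta1) * grad
    centered = grad - state["g_mean"]
    state["var"] = beta2 * state["var"] + (1 - beta2) * centered**2
    var_hat = state["var"] / (1 - beta2**t)  # assumed bias correction

    # Step 1: normalize the current gradient by the current uncertainty estimate.
    normalized = grad / (np.sqrt(var_hat) + eps)

    # Step 2: apply momentum to the already-normalized gradient, so old momentum
    # is never divided by a newer normalizer.
    state["mom"] = beta1 * state["mom"] + (1 - beta1) * normalized
    mom_hat = state["mom"] / (1 - beta1**t)  # assumed bias correction

    return theta - lr * mom_hat
```

A usage sketch on a noisy quadratic: initialize `theta = np.full(3, 5.0)` and `state = init_state(theta.shape)`, then repeatedly call `theta = mvn_grad_step(theta, theta + rng.normal(size=3), state, lr=0.01)` with `rng = np.random.default_rng(0)`. In this sketch the contrast with Adam-style updates is the order of operations: each gradient is divided by the normalizer of its own step before it enters the momentum buffer.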
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
