Muon is Provably Faster with Momentum Variance Reduction
Xun Qian, Hussein Rammal, Dmitry Kovalev, Peter Richtárik

TL;DR
This paper proves that incorporating momentum variance reduction into Non-Euclidean LMO-based optimizers like Muon improves convergence rates and performance in training neural networks.
Contribution
It introduces a unified framework with MVR for Non-Euclidean LMO-based optimizers, improving theoretical convergence rates and empirical performance.
Findings
In the non-convex case, the convergence rate improves from O(1/K^{1/4}) to O(1/K^{1/3}).
Unified MVR framework enhances Muon, Scion, and similar optimizers.
Numerical experiments confirm superior iteration complexity.
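To make the rate improvement concrete, the following worked arithmetic converts the two rates into iteration counts. It assumes K is the number of iterations, that the rate bounds the error measure by a target accuracy \varepsilon, and that constants hidden in the O(·) are ignored; none of these details are stated in the summary above.

```latex
\[
\frac{1}{K^{1/4}} \le \varepsilon \iff K \ge \varepsilon^{-4},
\qquad
\frac{1}{K^{1/3}} \le \varepsilon \iff K \ge \varepsilon^{-3}.
\]
% Example with \varepsilon = 10^{-2}:
\[
\varepsilon^{-4} = 10^{8}
\quad\text{versus}\quad
\varepsilon^{-3} = 10^{6},
\]
% i.e. a factor of 1/\varepsilon = 100 fewer iterations, up to constants.
```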
Abstract
Recent empirical research has demonstrated that deep learning optimizers based on the linear minimization oracle (LMO) over specifically chosen Non-Euclidean norm balls, such as Muon and Scion, outperform Adam-type methods in the training of large language models. In this work, we show that such optimizers can be provably improved by replacing their vanilla momentum by momentum variance reduction (MVR). Instead of proposing and analyzing MVR variants of Muon and Scion separately, we incorporate MVR into the recently proposed Gluon framework, which captures Muon, Scion and other specific Non-Euclidean LMO-based methods as special cases, and at the same time works with a more general smoothness assumption which better captures the layer-wise structure of neural networks. In the non-convex case, we incorporate MVR into Gluon in three different ways. All of them improve the convergence rate…
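The idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not spell out the three MVR variants, so the code below assumes a STORM-style MVR recursion, uses an exact SVD for the spectral-norm LMO (Muon in practice uses a Newton-Schulz approximation), and runs on a hypothetical toy quadratic with additive noise. The names `spectral_lmo`, `mvr_momentum`, and `run`, and all hyperparameter values, are illustrative assumptions.

```python
import numpy as np


def spectral_lmo(m):
    """LMO over the unit spectral-norm ball.

    Returns argmin over ||D||_2 <= 1 of <m, D>, which is -U V^T
    for the thin SVD m = U diag(s) V^T.
    """
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return -u @ vt


def mvr_momentum(m_prev, g_new, g_old, beta):
    """Assumed STORM-style MVR update (not taken from the paper).

    g_new and g_old are stochastic gradients at the current and previous
    iterates, evaluated on the SAME sample, so their difference cancels
    most of the sampling noise.
    """
    return g_new + (1.0 - beta) * (m_prev - g_old)


def run(steps=200, lr=0.05, beta=0.1, noise=0.1, seed=0):
    """Minimise the toy objective 0.5 * ||x - target||_F^2 with noisy gradients."""
    rng = np.random.default_rng(seed)
    target = rng.standard_normal((4, 3))
    x = np.zeros((4, 3))
    x_prev = x.copy()
    m = np.zeros_like(x)
    for k in range(steps):
        # One noise sample shared by both gradient evaluations.
        xi = noise * rng.standard_normal(x.shape)
        g_new = (x - target) + xi
        g_old = (x_prev - target) + xi
        # First step has no history, so fall back to the plain gradient.
        m = g_new if k == 0 else mvr_momentum(m, g_new, g_old, beta)
        x_prev = x
        # Step of spectral norm lr in the LMO direction.
        x = x + lr * spectral_lmo(m)
    return x, target


if __name__ == "__main__":
    x, target = run()
    print("initial distance:", np.linalg.norm(target))
    print("final distance:  ", np.linalg.norm(x - target))
```

The only change relative to vanilla momentum is the correction term `m_prev - g_old`, which requires a second gradient evaluation per step on the same sample. Swapping `spectral_lmo` for an LMO over a different norm ball would give other members of the same family of methods.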
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Computational Physics and Python Applications · Particle physics theoretical and experimental studies
