Beyond the Ideal: Analyzing the Inexact Muon Update
Egor Shulgin, Sultan AlRashed, Francesco Orabona, Peter Richtárik

TL;DR
This paper analyzes the inexact orthogonalization that the Muon optimizer uses in practice. It provides explicit bounds on performance degradation, shows how approximation errors influence the optimal step size and momentum, and validates the predictions experimentally on NanoGPT.
Contribution
First theoretical analysis of inexact Muon updates within the Linear Minimization Oracle (LMO) framework, linking approximation errors to the optimization hyperparameters (step size and momentum) and demonstrating their impact through experiments.
Findings
Inexact orthogonalization degrades Muon's performance, and the step size and momentum need to be co-tuned with the approximation error.
Explicit bounds quantify performance loss due to approximation errors.
Experimental results on NanoGPT confirm the theoretical predictions.
Abstract
The Muon optimizer has rapidly emerged as a powerful, geometry-aware alternative to AdamW, demonstrating strong performance in large-scale training of neural networks. However, a critical theory-practice disconnect exists: Muon's efficiency relies on fast, approximate orthogonalization, yet all prior theoretical work analyzes an idealized, computationally intractable version assuming exact SVD-based updates. This work moves beyond the ideal by providing the first analysis of the inexact orthogonalized update at Muon's core. We develop our analysis within the general framework of Linear Minimization Oracle (LMO)-based optimization, introducing a realistic additive error model to capture the inexactness of practical approximation schemes. Our analysis yields explicit bounds that quantify performance degradation as a function of the LMO inexactness/error. We reveal a fundamental coupling…
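To make the gap between exact and approximate orthogonalization concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's method: it uses the classical cubic Newton-Schulz iteration as a stand-in for whatever fast scheme a given Muon implementation uses, the function names are hypothetical, and the spectral-norm gap it reports is only one possible way to measure inexactness, not necessarily the paper's additive error model.

```python
import numpy as np

def exact_orthogonalize(g):
    """Idealized update direction: the orthogonal polar factor U @ V^T from a thin SVD."""
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return u @ vt

def newton_schulz_orthogonalize(g, steps):
    """Approximate polar factor via the classical cubic Newton-Schulz iteration.

    Assumption: this textbook iteration stands in for the tuned polynomial
    iterations used in practice; it is not taken from the paper.
    """
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which is inside the convergence region of the iteration.
    x = g / (np.linalg.norm(g) + 1e-12)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T @ x)
    return x

rng = np.random.default_rng(0)
grad = rng.standard_normal((16, 8))  # hypothetical gradient (or momentum) matrix
exact = exact_orthogonalize(grad)

# Spectral-norm distance between the approximate and exact directions.
errors = {
    k: np.linalg.norm(newton_schulz_orthogonalize(grad, k) - exact, 2)
    for k in (1, 5, 20, 50)
}
for k, err in errors.items():
    print(f"steps={k:2d}  spectral-norm error={err:.3e}")
```

Running fewer iterations is cheaper but leaves a larger gap to the exact direction; that residual is the kind of inexactness the analysis accounts for.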
Taxonomy
Topics: Computational Physics and Python Applications · Particle physics theoretical and experimental studies · Muon and positron interactions and applications
