Improved Convergence Rates of Muon Optimizer for Nonconvex Optimization
Shuntaro Nagashima, Hideaki Iiduka

TL;DR
This paper provides sharper convergence guarantees for the Muon optimizer in nonconvex optimization, improving theoretical bounds and broadening applicability beyond previous restrictive assumptions.
Contribution
We establish improved convergence rates for the Muon optimizer using a direct analysis that relaxes previous restrictive assumptions, enhancing theoretical understanding.
Findings
Faster convergence rates achieved
Broader class of problem settings covered
More accurate theoretical characterization of Muon
Abstract
The Muon optimizer has recently attracted attention for its orthogonalized first-order updates, and a deeper theoretical understanding of its convergence behavior is essential for guiding practical applications. However, existing convergence guarantees are either coarse or obtained under restrictive analytical settings. In this work, we establish sharper convergence guarantees for the Muon optimizer through a direct and simplified analysis that does not rely on restrictive assumptions on the update rule. Our results improve upon existing bounds by achieving faster convergence rates while covering a broader class of problem settings. These findings provide a more accurate theoretical characterization of Muon and offer insights applicable to a broader class of orthogonalized first-order methods.
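To make "orthogonalized first-order updates" concrete, the following is a minimal pure-Python sketch of one Muon-style step. It is an illustration under stated assumptions, not the formulation analyzed in the paper: the helper names (`matmul`, `transpose`, `orthogonalize`, `muon_step`), the momentum convention `M = beta * M + G`, the hyperparameter defaults, and the use of the plain cubic Newton-Schulz iteration (practical Muon implementations typically use a tuned higher-order polynomial) are all choices made here for readability.

```python
# Illustrative sketch only: helper names, defaults, and the momentum
# convention are assumptions, not taken from the paper.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def orthogonalize(m, steps=30):
    """Approximate the polar factor U V^T of m (from its SVD m = U S V^T)
    with the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X."""
    fro = sum(v * v for row in m for v in row) ** 0.5
    if fro == 0.0:
        return [row[:] for row in m]
    # Dividing by the Frobenius norm puts all singular values in (0, 1],
    # which lies inside the iteration's convergence region.
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(rx, rq)]
             for rx, rq in zip(x, xxtx)]
    return x

def muon_step(w, grad, momentum, lr=0.1, beta=0.9):
    """One step: accumulate momentum, orthogonalize it, move the weights."""
    new_m = [[beta * mv + gv for mv, gv in zip(rm, rg)]
             for rm, rg in zip(momentum, grad)]
    o = orthogonalize(new_m)
    new_w = [[wv - lr * ov for wv, ov in zip(rw, ro)]
             for rw, ro in zip(w, o)]
    return new_w, new_m

if __name__ == "__main__":
    w = [[1.0, 1.0], [1.0, 1.0]]
    g = [[0.0, 2.0], [-1.0, 0.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(muon_step(w, g, zeros))
```

Because the update direction is (approximately) an orthogonal factor rather than the raw gradient, its singular values are all close to one, so the step size is governed by `lr` alone and not by the gradient's scale. Very small singular values of the momentum matrix need more iterations to converge, so `steps` may need to be raised for ill-conditioned inputs.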
Taxonomy
Topics: Muon and positron interactions and applications · Stochastic Gradient Optimization Techniques · Neutrino Physics Research
