Muon Optimizes Under Spectral Norm Constraints
Lizhang Chen, Jonathan Li, Qiang Liu

TL;DR
This paper provides a theoretical foundation for the Muon optimizer by linking it to the Lion-$\mathcal{K}$ family and showing it enforces spectral norm constraints, enhancing understanding of its regularization effects.
Contribution
It establishes a theoretical connection between Muon and Lion-$\mathcal{K}$ optimizers, revealing Muon's implicit spectral norm regularization and proposing generalizations.
Findings
Muon corresponds to Lion-$\mathcal{K}$ with $\mathcal{K}$ chosen as the nuclear norm
Muon's implicit regularization enforces spectral norm constraints
Generalizations via different convex maps expand optimizer design
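The first two findings can be made concrete with a short worked derivation. The notation below (momentum $M_t$, step size $\epsilon$, coefficients $\beta_1, \beta_2$, weight decay $\lambda$) follows the Lion-$\mathcal{K}$ formulation of [CLLL24] as we understand it. It does not appear on this page, so treat it as a sketch rather than the paper's exact statement.

$$
\begin{aligned}
X_{t+1} &= X_t + \epsilon\left(\nabla\mathcal{K}\big(\beta_1 M_t - (1-\beta_1)\nabla f(X_t)\big) - \lambda X_t\right),\\
M_{t+1} &= \beta_2 M_t - (1-\beta_2)\nabla f(X_t).
\end{aligned}
$$

Take $\mathcal{K}(M) = \|M\|_*$ (the nuclear norm). For a matrix with SVD $M = U \Sigma V^\top$, a subgradient is $\nabla\mathcal{K}(M) = U V^\top$, which is the orthogonalized update direction associated with Muon. The convex conjugate of the nuclear norm is the indicator of the unit spectral-norm ball:

$$
\mathcal{K}^*(X) =
\begin{cases}
0 & \text{if } \|X\|_{\mathrm{op}} \le 1,\\
+\infty & \text{otherwise.}
\end{cases}
$$

Lion-$\mathcal{K}$ with decoupled weight decay $\lambda$ is associated with the problem $\min_X f(X) + \tfrac{1}{\lambda}\mathcal{K}^*(\lambda X)$, so under this choice of $\mathcal{K}$ it becomes

$$
\min_X f(X) \quad \text{subject to} \quad \|X\|_{\mathrm{op}} \le \frac{1}{\lambda}.
$$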
Abstract
The pursuit of faster optimization algorithms remains an active and important research direction in deep learning. Recently, the Muon optimizer [JJB+24] has demonstrated promising empirical performance, but its theoretical foundation remains less understood. In this paper, we bridge this gap and provide a theoretical analysis of Muon by placing it within the Lion-$\mathcal{K}$ family of optimizers [CLLL24]. Specifically, we show that Muon corresponds to Lion-$\mathcal{K}$ when equipped with the nuclear norm, and we leverage the theoretical results of Lion-$\mathcal{K}$ to establish that Muon (with decoupled weight decay) implicitly solves an optimization problem that enforces a constraint on the spectral norm of weight matrices. This perspective not only demystifies the implicit regularization effects of Muon but also leads to natural generalizations through varying the choice of convex…
