Delving into Muon and Beyond: Deep Analysis and Extensions
Xianbiao Qi, Marco Chen, Jiaquan Ye, Yelin He, Rong Xiao

TL;DR
This paper provides a spectral perspective on the Muon optimizer, analyzing its mechanisms, comparing it with Adam, and exploring variants to understand its stability and performance in optimization tasks.
Contribution
It introduces a unified spectral framework for Muon and related optimizers, and evaluates their stability and effectiveness through controlled experiments.
Findings
RMS-normalized updates are more stable than first-moment updates.
Spectral compression stabilizes first-moment updates but does not always outperform Adam.
Muon acts as a form of spectral normalization rather than a universally superior optimizer.
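The findings above contrast two kinds of update. As a rough illustration only, the sketch below writes both out in NumPy for a single parameter matrix. The function names, the exponential-moving-average form of the momentum, and the default hyperparameters (`beta1`, `beta2`, `eps`) are assumptions chosen for the example and are not taken from the paper.

```python
import numpy as np

def first_moment_update(grad, m, beta1=0.9):
    """Momentum-style update: an exponential moving average of gradients.

    Assumption: EMA form of momentum; other momentum conventions exist.
    Returns (update, new_m).
    """
    m = beta1 * m + (1.0 - beta1) * grad
    return m, m

def rms_normalized_update(grad, m, v, step, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-style update: first moment divided by the RMS of past gradients.

    `step` is the 1-based iteration count used for bias correction.
    Returns (update, new_m, new_v).
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** step)   # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** step)   # bias-corrected second moment
    return m_hat / (np.sqrt(v_hat) + eps), m, v
```

On the first step the RMS-normalized update has entries of magnitude close to 1 regardless of the gradient scale, whereas the first-moment update scales linearly with the gradient. That difference in scale sensitivity is one plausible way to read the stability contrast reported above.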
Abstract
The Muon optimizer has recently attracted considerable attention for its strong empirical performance and use of orthogonalized updates on matrix-shaped parameters, yet its underlying mechanisms and relationship to adaptive optimizers such as Adam remain insufficiently understood. In this work, we aim to address these questions through a unified spectral perspective. Specifically, we view Muon as the p = 0 endpoint of a family of spectral transformations of the form U \boldsymbol{\Sigma}^{p} V', and consider additional variants with p = 1/2, p = 1/4, and p = 1. These transformations are applied both to first-moment updates, as in momentum SGD, and to root-mean-square (RMS) normalized gradient updates, as in Adam. To enable efficient computation, we develop a coupled Newton iteration that avoids explicit singular value decomposition. Across controlled experiments, we find that…
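To make the family U \boldsymbol{\Sigma}^{p} V' concrete, here is a minimal NumPy sketch. It computes the transformation with an explicit SVD purely for readability. It is not the coupled Newton iteration the abstract mentions, which is designed to avoid the SVD. The function name `spectral_power` is hypothetical.

```python
import numpy as np

def spectral_power(M, p):
    """Return U @ diag(s**p) @ Vt, where M = U @ diag(s) @ Vt is the thin SVD.

    p = 1 leaves M unchanged; p = 0 sets every singular value to 1
    (the orthogonalized update associated with Muon); values in between
    compress the spectrum. Illustration only: uses an explicit SVD.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scale the columns of U by s**p, then recombine with Vt.
    return (U * s ** p) @ Vt

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))             # stand-in for a matrix-shaped update
for p in (0.0, 0.25, 0.5, 1.0):
    out = spectral_power(M, p)
    print(p, np.linalg.svd(out, compute_uv=False))  # singular values after the transform
```

For a full-rank input, the printed singular values equal the original ones raised to the power p, so p = 1/2 and p = 1/4 sit between the untouched update (p = 1) and the fully flattened spectrum (p = 0).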
Taxonomy
Topics: Particle physics theoretical and experimental studies · Muon and positron interactions and applications · Computational Physics and Python Applications
