DynMuon: A Dynamic Spectral Shaping View of Muon

Fangzhou Wu; Rikhav Shah; Sandeep Silwal; Qiuyi Zhang

arXiv:2605.17109 · cs.LG · May 19, 2026


TL;DR

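DynMuon introduces a dynamic spectral-shaping approach to large language model training. Instead of Muon's fixed polar-factor update, it varies the exponent applied to the update's singular values over the course of training, which lowers validation loss and reduces the number of training steps required.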

Contribution

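The paper develops a theory of spectral-shaping updates, in which the choice of the shaping exponent depends on local curvature, gradient and label noise, and training stage. It then proposes DynMuon, a method that schedules this spectral-shaping parameter dynamically during training.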
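The idea of scheduling the exponent can be illustrated with a small helper. Here $p$ is the exponent in the spectral-shaping update $U \Sigma^{p} V^{\top}$ described in the abstract, which reports that positive $p$ helps early in training and mildly negative $p$ helps later. This summary does not give DynMuon's actual schedule, so the function name `p_schedule`, the linear ramp, and the default endpoints `p_start=0.5` and `p_end=-0.1` are hypothetical choices for illustration only.

```python
def p_schedule(step: int, total_steps: int,
               p_start: float = 0.5, p_end: float = -0.1) -> float:
    """Hypothetical linear schedule for the spectral-shaping exponent p.

    Moves p from p_start (positive, early training) to p_end (mildly
    negative, late training). The endpoints and the linear shape are
    illustrative assumptions, not values taken from the paper.
    """
    if total_steps <= 1:
        return p_end
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return p_start + frac * (p_end - p_start)


# Example: over a 5-step run, p decreases from 0.5 to -0.1,
# crossing p = 0 (Muon's polar-factor update) late in the run.
schedule = [p_schedule(t, total_steps=5) for t in range(5)]
print([round(p, 2) for p in schedule])
```

A linear ramp is only the simplest monotone choice consistent with "positive early, mildly negative later"; a schedule driven by the curvature and noise quantities the theory describes could look quite different.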

Findings

1. DynMuon achieves lower validation loss than Muon.
2. It requires 10.6-26.5% fewer training steps.
3. It is effective across various models and training settings.

Abstract

In recent years, Muon has emerged as the dominant method for training large language models, and transformers more broadly. The essential difference from standard gradient descent methods is that Muon replaces the usual update matrix $M = U \Sigma V^{\top}$ with its polar factor $U V^{\top}$. In this work, we consider a class of Muon-like updates in which the update $M$ is replaced with $U \Sigma^{p} V^{\top}$ for some parameter $p$. We call this a "spectral-shaping" operation and develop a theory of how the choice of $p$ should depend on (a) the local curvature of the loss function, (b) noise stemming from stochastic gradients and label noise, and (c) the training stage. Our theory and experiments reveal a previously overlooked behavior: positive $p$ helps early by emphasizing high-curvature directions and accelerating signal contraction, while mildly negative $p$ helps later by reallocating update…
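The spectral-shaping operation can be written directly in terms of the SVD. The sketch below is a NumPy illustration that assumes an exact SVD is affordable; it is not the authors' implementation. Muon implementations typically approximate the polar factor with a Newton-Schulz iteration rather than an SVD, and this summary does not say how DynMuon computes $\Sigma^{p}$ in practice. The helper name `spectral_shape` and the `eps` cutoff for tiny singular values are assumptions added here. For a full-rank $M$, $p = 0$ recovers the polar factor $U V^{\top}$ and $p = 1$ returns $M$ unchanged.

```python
import numpy as np


def spectral_shape(M: np.ndarray, p: float, eps: float = 1e-12) -> np.ndarray:
    """Return U @ diag(sigma**p) @ V^T, where M = U @ diag(sigma) @ V^T (thin SVD).

    For full-rank M, p = 0 gives the polar factor U V^T (Muon's orthogonalized
    update) and p = 1 returns M itself. Singular values at or below `eps` are
    mapped to 0, so a negative p does not blow up on rank-deficient inputs.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Raise only the non-negligible singular values to the power p; the rest stay 0.
    shaped = np.power(s, p, out=np.zeros_like(s), where=s > eps)
    # U * shaped scales column j of U by shaped[j], i.e. U @ diag(shaped).
    return (U * shaped) @ Vt


rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))

Q = spectral_shape(M, p=0.0)
print(np.allclose(Q.T @ Q, np.eye(3)))           # True: orthonormal columns
print(np.allclose(spectral_shape(M, p=1.0), M))  # True: p = 1 leaves M unchanged
```

Values of $p$ between 0 and 1 interpolate between Muon's orthogonalized update and the raw update, while $p < 0$ amplifies the directions with smaller singular values relative to those with larger ones.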
