Towards a Principled Muon under $\mu\mathsf{P}$: Ensuring Spectral Conditions throughout Training
John Zhao

TL;DR
This paper introduces Muon++, a variant of the Muon optimizer that guarantees spectral conditions throughout training for large language models, enabling predictable scaling and reducing computational overhead.
Contribution
We propose a practical method that ensures the spectral conditions hold for Muon throughout training, eliminating the need for repeated spectral normalization and enabling scalable, predictable LLM training.
Findings
Muon++ maintains spectral conditions throughout training.
Eliminates the need for explicit spectral normalization of weights.
Improves practical deployment of matrix-based optimizers in long-horizon training.
Abstract
The $\mu$-parameterization ($\mu\mathsf{P}$) provides a principled foundation for large language model (LLM) training by prescribing width-independent learning dynamics, which in turn enables predictable scaling behavior and robust hyperparameter transfer across model sizes. A central requirement of $\mu\mathsf{P}$ is the satisfaction of certain spectral conditions on weight matrices, which ensure consistent feature learning and optimization behavior as model width grows. While these conditions are well understood in theory, guaranteeing that they hold in practical training with matrix-based optimizers such as Muon remains understudied. Existing works that study Muon under $\mu\mathsf{P}$ exhibit important limitations: they either do not ensure that the spectral conditions hold throughout the entire training horizon, or require repeated spectral normalization (or Newton-Schulz iterations) applied to both weights…
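For concreteness, the spectral condition as it is commonly stated in the $\mu\mathsf{P}$ literature asks that a weight matrix $W \in \mathbb{R}^{n_\text{out} \times n_\text{in}}$ (and its per-step update) have spectral norm on the order of $\sqrt{n_\text{out}/n_\text{in}}$. The NumPy sketch below only illustrates how such a condition could be checked, together with the kind of explicit rescaling that the abstract says existing approaches rely on. It is not the paper's method; the function names and the example shape are assumptions made for illustration.

```python
import numpy as np


def spectral_target(n_out: int, n_in: int) -> float:
    # Target scale sqrt(n_out / n_in) from the commonly stated spectral condition.
    return float(np.sqrt(n_out / n_in))


def spectral_ratio(W: np.ndarray) -> float:
    # Ratio of the actual spectral norm (largest singular value) to the target.
    # A value that stays O(1) as width grows is what the condition asks for.
    n_out, n_in = W.shape
    return float(np.linalg.norm(W, ord=2) / spectral_target(n_out, n_in))


def rescale_to_target(W: np.ndarray) -> np.ndarray:
    # Explicit spectral normalization (illustrative only): rescale W so that
    # its spectral norm equals the target exactly.
    return W / spectral_ratio(W)


# Hypothetical example: a 256 x 64 Gaussian matrix (n_out = 256, n_in = 64).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 64))
W_scaled = rescale_to_target(W)
ratio_before = spectral_ratio(W)
ratio_after = spectral_ratio(W_scaled)
```

Computing the spectral norm exactly, as done here, costs a singular value decomposition per matrix, which is one reason repeated normalization at every step adds overhead in long training runs.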
Taxonomy
Topics: Muon and positron interactions and applications · Machine Learning in Materials Science · Computational Physics and Python Applications
