Muon Dynamics as a Spectral Wasserstein Flow

Gabriel Peyré

arXiv:2604.04891 · math.OC · May 11, 2026

TL;DR

This paper introduces a spectral Wasserstein framework for analyzing mean-field normalized training dynamics in deep learning, connecting matrix norms with gradient flow interpretations.

Contribution

It develops a unified spectral Wasserstein distance framework, extending classical optimal transport to matrix norms and linking it to mean-field normalized training dynamics.
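As an illustration of what a spectrally normalized step on a matrix-shaped parameter block can look like, here is a minimal sketch. It assumes an exact SVD (practical Muon implementations approximate this step iteratively) and is not taken from the paper; the function name `spectral_normalize` and the test matrix `G` are hypothetical. It contrasts the polar factor $UV^\top$, the steepest-descent direction under the operator norm, with plain Frobenius normalization.

```python
import numpy as np

def spectral_normalize(G):
    """Polar factor U @ Vt of G: all nonzero singular values are set to 1.

    Hypothetical helper for illustration; assumes an exact SVD.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))      # stand-in for a gradient block

D = spectral_normalize(G)            # operator-norm (Muon-style) direction
F = G / np.linalg.norm(G)            # Frobenius-normalized direction, for contrast

# Singular values of the spectrally normalized direction
sing = np.linalg.svd(D, compute_uv=False)
# Inner product <G, D> equals the nuclear norm of G (sum of singular values)
alignment = float(np.sum(G * D))
nuclear = float(np.linalg.svd(G, compute_uv=False).sum())
```

The Frobenius-normalized direction keeps the relative sizes of the singular values of `G`, while the polar factor equalizes them; this is the sense in which the normalization is spectral.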

Findings

01

Spectral Wasserstein distances indexed by Schatten norms interpolate between classical $W_2$ (trace norm) and the Muon geometry (operator norm).

02

The framework yields a gradient-flow interpretation of normalized training dynamics.

03

Numerical experiments demonstrate the framework's applicability to various models.
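The interpolation in the first finding rests on the family of Schatten norms on positive semidefinite matrices. The sketch below only illustrates that family: the helper `schatten_norm` and the example matrix `S` are hypothetical, and the code does not reproduce the paper's definition of the distance itself. For a PSD matrix, the Schatten $p$-norm is the $\ell^p$ norm of the eigenvalues, so $p = 1$ gives the trace norm and $p = \infty$ gives the operator norm.

```python
import numpy as np

def schatten_norm(S, p):
    """Schatten p-norm of a symmetric PSD matrix S, from its eigenvalues.

    Hypothetical helper for illustration only.
    """
    eig = np.clip(np.linalg.eigvalsh(S), 0.0, None)  # guard tiny negative round-off
    if np.isinf(p):
        return float(eig.max())                      # operator norm
    return float((eig ** p).sum() ** (1.0 / p))

S = np.diag([3.0, 1.0, 0.0])

trace_norm = schatten_norm(S, 1)        # sum of eigenvalues
mid_norm = schatten_norm(S, 2)          # Frobenius norm
op_norm = schatten_norm(S, np.inf)      # largest eigenvalue
```

Schatten norms are non-increasing in $p$, so intermediate values of $p$ sit between the operator norm and the trace norm.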

Abstract

Gradient normalization stabilizes deep-learning optimization, and spectral normalizations are especially natural for matrix-shaped parameter blocks; Muon is the motivating example. We study an idealized deterministic, continuous-time, vanishing-momentum version of this idea in the mean-field regime, where wide models are represented by probability measures on parameter space. Starting from normalized matrix flows, we introduce Spectral Wasserstein distances indexed by norms $\gamma$ on positive semidefinite matrices: the trace norm gives classical $W_{2}$, the operator norm gives the Muon geometry, and Schatten norms interpolate between them. We develop the static Kantorovich formulation, a max-min robust-cost representation, Gaussian reductions extending the Bures formula, and, for monotone norms, prove equivalence with a Benamou–Brenier formulation. This yields a gradient-flow…
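For reference, the classical Bures formula that the Gaussian reductions are said to extend can be sketched as follows. This covers only the standard $W_2$ case between centered Gaussians $N(0, A)$ and $N(0, B)$, namely $W_2^2 = \operatorname{tr} A + \operatorname{tr} B - 2 \operatorname{tr}\big((A^{1/2} B A^{1/2})^{1/2}\big)$; it is not the paper's spectral generalization, and the helper names `psd_sqrt` and `bures_w2_squared` are hypothetical.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)          # guard tiny negative round-off
    return (V * np.sqrt(w)) @ V.T

def bures_w2_squared(A, B):
    """Squared W_2 distance between N(0, A) and N(0, B) (classical Bures formula)."""
    rA = psd_sqrt(A)
    cross = psd_sqrt(rA @ B @ rA)
    return float(np.trace(A) + np.trace(B) - 2.0 * np.trace(cross))

A = np.diag([4.0, 1.0])
B = np.diag([1.0, 1.0])

d2 = bures_w2_squared(A, B)
d2_self = bures_w2_squared(A, A)
```

When the covariances commute, as in this diagonal example, the formula reduces to the sum of squared differences of the standard deviations, here $(2 - 1)^2 + (1 - 1)^2 = 1$.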
