Uniform Spectral Growth and Convergence of Muon in LoRA-Style Matrix Factorization
Changmin Kang, Jihun Yun, Baekrok Shin, Yeseul Cho, Chulhee Yun

TL;DR
This paper investigates the spectral dynamics of Muon in LoRA-style matrix factorization, revealing uniform singular value growth and proving convergence properties of the spectral gradient flow in simplified models.
Contribution
It uncovers the uniform spectral growth phenomenon in Muon for LoRA fine-tuning and provides theoretical analysis of spectral gradient flow convergence and dynamics.
Findings
Singular values grow uniformly across the spectrum under Muon in LoRA.
Spectral gradient flow converges to global minima from almost all initializations.
Smaller singular values reach their targets earlier than larger ones, in contrast to standard gradient flow.
Abstract
Spectral gradient descent (SpecGD) orthogonalizes the matrix parameter updates and has inspired practical optimizers such as Muon. These optimizers often perform well in large language model (LLM) training, but their dynamics remain poorly understood. In the low-rank adaptation (LoRA) setting, where weight updates are parameterized as a product of two low-rank factors, we find a distinctive spectral phenomenon under Muon in LoRA fine-tuning of LLMs: singular values of the LoRA product show near-uniform growth across the spectrum, despite orthogonalization being performed on the two factors separately. Motivated by this observation, we analyze spectral gradient flow (SpecGF), a continuous-time analogue of SpecGD, in a simplified LoRA-style matrix factorization setting and prove "equal-rate" dynamics: all singular values grow at equal rates up to small deviations. Consequently, smaller singular values…
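To make the setup in the abstract concrete, the following is a minimal NumPy sketch of spectral gradient descent applied to a two-factor (LoRA-style) product. It is an illustration under stated assumptions, not the paper's experimental code: the squared Frobenius loss, the target matrix `M`, the step size, and the function names `orthogonalize` and `specgd_lora` are all hypothetical choices made here. It also uses an exact SVD to orthogonalize each factor's gradient, whereas Muon approximates this step with Newton-Schulz iterations and applies it to a momentum buffer.

```python
import numpy as np


def orthogonalize(G):
    """Replace every nonzero singular value of G with 1 (U @ V^T from the thin SVD)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


def specgd_lora(M, r, lr=1e-2, steps=200, init_scale=1e-3, seed=0):
    """Spectral GD on the assumed loss 0.5 * ||B @ A - M||_F^2.

    B (m x r) and A (r x n) are the two low-rank factors. Each factor's
    gradient is orthogonalized separately, mirroring the per-factor
    orthogonalization described in the abstract.
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    B = init_scale * rng.standard_normal((m, r))
    A = init_scale * rng.standard_normal((r, n))
    history = []
    for _ in range(steps):
        R = B @ A - M          # residual of the assumed loss
        grad_B = R @ A.T       # gradient with respect to B
        grad_A = B.T @ R       # gradient with respect to A
        # Both gradients are computed before either factor is updated.
        B = B - lr * orthogonalize(grad_B)
        A = A - lr * orthogonalize(grad_A)
        # Record the top-r singular values of the product after this step.
        history.append(np.linalg.svd(B @ A, compute_uv=False)[:r])
    return B, A, np.array(history)


if __name__ == "__main__":
    # Hypothetical diagonal target with well-separated singular values.
    M = np.diag([3.0, 2.0, 1.0])
    B, A, hist = specgd_lora(M, r=3, steps=300)
    # Each row of `hist` holds the singular values of B @ A at one step.
    print(hist[::50])
```

Printing or plotting the rows of `hist` over time is one way to inspect how the singular values of the product evolve under per-factor orthogonalized updates in this toy setting.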
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced NMR Techniques and Applications · Machine Learning in Materials Science
