HTMuon: Improving Muon via Heavy-Tailed Spectral Correction
Tianyu Pang, Yujie Fang, Zihang Liu, Shenyang Deng, Lei Hsiung, Shuhua Yu, Yaoqing Yang

TL;DR
HTMuon is a variant of the Muon optimizer motivated by Heavy-Tailed Self-Regularization (HT-SR) theory. It produces heavier-tailed updates and weight spectra, and it improves on Muon in large language model pretraining and image classification.
Contribution
This work introduces HTMuon, a new variant of Muon that incorporates heavy-tailed spectral correction, improving training stability and performance in LLMs and vision tasks.
Findings
HTMuon reduces perplexity by up to 0.98 compared to Muon on LLaMA pretraining on the C4 dataset.
HTMuon consistently improves on state-of-the-art baselines in LLM pretraining and image classification experiments.
HTMuon can be integrated as a plug-in with existing Muon variants.
Abstract
Muon has recently shown promising results in LLM training. In this work, we study how to further improve Muon. We argue that Muon's orthogonalized update rule suppresses the emergence of heavy-tailed weight spectra and over-emphasizes training along noise-dominated directions. Motivated by the Heavy-Tailed Self-Regularization (HT-SR) theory, we propose HTMuon. HTMuon preserves Muon's ability to capture parameter interdependencies while producing heavier-tailed updates and inducing heavier-tailed weight spectra. Experiments on LLM pretraining and image classification show that HTMuon consistently improves performance over state-of-the-art baselines and can also serve as a plug-in on top of existing Muon variants. For example, on LLaMA pretraining on the C4 dataset, HTMuon reduces perplexity by up to 0.98 compared to Muon. We further theoretically show that HTMuon corresponds to…
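The contrast the abstract draws, between an orthogonalized update and a heavier-tailed one, can be illustrated with a small NumPy sketch. Muon's orthogonalized direction keeps the singular vectors of the momentum matrix and sets every singular value to 1 (in practice via a Newton-Schulz iteration; an exact SVD is used here only for clarity). The second function, `power_spectrum_update`, and its exponent `p` are hypothetical names introduced for illustration: the abstract does not state HTMuon's exact update rule, so this is one possible way to retain spread in the update spectrum, not the authors' method.

```python
import numpy as np

def orthogonalized_update(momentum):
    """Muon-style direction via exact SVD: every singular value becomes 1."""
    u, _, vt = np.linalg.svd(momentum, full_matrices=False)
    return u @ vt

def power_spectrum_update(momentum, p=0.5):
    """Hypothetical variant (assumption, not from the paper's abstract):
    keep the singular vectors but use s**p as the singular values.
    p = 0 recovers the orthogonalized update, p = 1 returns the momentum."""
    u, s, vt = np.linalg.svd(momentum, full_matrices=False)
    return (u * s**p) @ vt  # scales column i of u by s[i]**p

rng = np.random.default_rng(0)
m = rng.standard_normal((8, 4))  # stand-in for a momentum matrix

# Singular values of each update direction.
flat = np.linalg.svd(orthogonalized_update(m), compute_uv=False)
spread = np.linalg.svd(power_spectrum_update(m, p=0.5), compute_uv=False)

print("orthogonalized spectrum:", np.round(flat, 3))
print("power-scaled spectrum:  ", np.round(spread, 3))
```

The orthogonalized spectrum is flat, so every direction, including ones dominated by noise, receives equal weight. The power-scaled spectrum keeps larger singular values larger, which is the qualitative property the abstract attributes to heavier-tailed updates.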
Taxonomy
Topics: Muon and positron interactions and applications · Computational Physics and Python Applications · Particle physics theoretical and experimental studies
