TrasMuon: Trust-Region Adaptive Scaling for Orthogonalized Momentum Optimizers
Peng Cheng, Jiucheng Zang, Qingnan Li, Liheng Ma, Yufei Cui, Yingxue Zhang, Boxing Chen, Ming Jian, Wen Tong

TL;DR
TrasMuon introduces a trust-region adaptive scaling method for orthogonalized momentum optimizers, improving stability and convergence speed in training vision and language models by mitigating high-energy outliers.
Contribution
It proposes a novel trust-region based adaptive scaling technique that stabilizes Muon-style optimizers, enhancing their robustness and efficiency in deep learning training.
Findings
Faster convergence on vision and language models.
Enhanced stability without warmup stages.
Effective mitigation of high-energy outliers.
Abstract
Muon-style optimizers leverage Newton-Schulz (NS) iterations to orthogonalize updates, yielding update geometries that often outperform Adam-series methods. However, this orthogonalization discards magnitude information, rendering training sensitive to step-size hyperparameters and vulnerable to high-energy bursts. To mitigate this, we introduce TrasMuon (Trust Region Adaptive Scaling Muon). TrasMuon preserves the near-isometric geometry of Muon while stabilizing magnitudes through (i) global RMS calibration and (ii) energy-based trust-region clipping. We demonstrate that while reintroducing adaptive scaling improves optimization efficiency, it typically exacerbates instability due to high-energy outliers. TrasMuon addresses this by defining a trust region based on relative energy ratios, confining updates to a stable zone. Empirical…
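The three ingredients named in the abstract (NS orthogonalization, global RMS calibration, clipping by relative energy ratio) can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the abstract does not give the formulas, so everything below is an assumption made for illustration. The NS step uses the classical cubic iteration rather than whatever polynomial the paper uses, and the function names, the per-row energy definition, the median reference, and the parameters `target_rms` and `max_ratio` are all hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(m, steps=30):
    """Approximate the semi-orthogonal polar factor of m.

    Uses the classical cubic Newton-Schulz iteration (an assumption;
    Muon-style optimizers typically use a tuned polynomial instead).
    """
    # Frobenius normalization keeps every singular value <= 1,
    # which lies inside the convergence region of the cubic iteration.
    x = m / (np.linalg.norm(m) + 1e-12)
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the wide orientation so x @ x.T is small
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x


def rms_calibrate(update, target_rms=0.2):
    """Rescale the whole update so its global RMS equals target_rms (hypothetical)."""
    rms = np.sqrt(np.mean(update ** 2))
    return update * (target_rms / (rms + 1e-12))


def clip_by_relative_energy(update, max_ratio=4.0):
    """Shrink rows whose energy exceeds max_ratio times the median row energy.

    Hypothetical stand-in for an energy-based trust region: rows inside the
    region are left untouched, rows outside are scaled back to its boundary.
    """
    energy = np.mean(update ** 2, axis=1)
    reference = np.median(energy) + 1e-12
    ratio = energy / reference
    scale = np.sqrt(np.minimum(1.0, max_ratio / np.maximum(ratio, 1e-12)))
    return update * scale[:, None]


rng = np.random.default_rng(0)
momentum = rng.standard_normal((4, 16))   # stand-in for a momentum buffer
direction = newton_schulz_orthogonalize(momentum)
update = rms_calibrate(direction, target_rms=0.2)
update[0] *= 10.0                          # simulate a high-energy burst in one row
stable_update = clip_by_relative_energy(update, max_ratio=4.0)
```

In this sketch the orthogonalized direction has equal-energy rows, so the clipping step only acts once magnitudes are reintroduced (here, by the simulated burst). That mirrors the abstract's point that adaptive scaling is what exposes the update to high-energy outliers.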
Taxonomy
Topics: Particle physics theoretical and experimental studies · Stochastic Gradient Optimization Techniques · Computational Physics and Python Applications
