MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration

Da Chang; Qiankun Shi; Lvgang Zhang; Yu Li; Ruijie Zhang; Yao Lu; Yongxiang Liu; Ganzhao Yuan

arXiv:2603.28254·cs.LG·May 12, 2026

TL;DR

MuonEq adds lightweight equilibration schemes before the orthogonalization step of the Muon optimizer, yielding faster convergence and lower validation perplexity in large language model pretraining.

Contribution

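The paper proposes a computationally light equilibration step (row and/or column normalization of the momentum matrix) that improves the geometry seen by finite-step Newton-Schulz orthogonalization. It retains standard nonconvex stationarity guarantees and shows empirical gains in LLaMA2 pretraining.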

Findings

01

MuonEq (R) outperforms Muon in LLaMA2 pretraining across multiple model sizes.

02

MuonEq (R) converges faster and reaches lower validation perplexity than Muon.

03

Theoretical analysis shows retention of standard nonconvex stationarity guarantees.

Abstract

Orthogonalized-update optimizers such as Muon improve training of matrix-valued parameters, but existing extensions typically either rescale updates after orthogonalization or use heavier whitening-based preconditioners before it. We introduce MuonEq, a lightweight family of pre-orthogonalization equilibration schemes for Muon with three forms: two-sided row/column normalization (RC), row normalization (R), and column normalization (C). By rebalancing the momentum matrix before finite-step Newton-Schulz orthogonalization, MuonEq improves the geometry seen by orthogonalization. We show that finite-step orthogonalization is governed by the input spectrum, especially stable rank and condition number, and that row/column normalization acts as a zeroth-order surrogate for whitening. For hidden matrix weights, R is the default variant. Theoretically, MuonEq (R) retains the standard…
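The idea described in the abstract, equilibrating the momentum matrix and then orthogonalizing it with a fixed number of Newton-Schulz steps, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`equilibrate`, `newton_schulz`, `stable_rank`, `muoneq_direction`), the `eps` constant, and the choice of unit Euclidean norm for rows and columns are assumptions made for illustration. The RC mode here is a single row pass followed by a column pass, which may differ from the paper's two-sided scheme, and the orthogonalization uses the classic cubic Newton-Schulz iteration rather than the tuned polynomial that Muon implementations typically use.

```python
import numpy as np


def equilibrate(m, mode="R", eps=1e-8):
    """Rescale rows and/or columns of m to (approximately) unit Euclidean norm.

    Assumption: unit L2 norm is the normalization target; the paper's exact
    scaling may differ.
    """
    if mode not in ("R", "C", "RC"):
        raise ValueError(f"unknown mode: {mode!r}")
    m = np.asarray(m, dtype=np.float64)
    if mode in ("R", "RC"):
        m = m / (np.linalg.norm(m, axis=1, keepdims=True) + eps)
    if mode in ("C", "RC"):
        # In RC mode this runs after the row pass, so rows are no longer
        # exactly unit norm afterwards (one pass only, by assumption).
        m = m / (np.linalg.norm(m, axis=0, keepdims=True) + eps)
    return m


def newton_schulz(m, steps=5, eps=1e-8):
    """Approximate the orthogonal polar factor of m with cubic Newton-Schulz.

    Dividing by the Frobenius norm puts every singular value in (0, 1],
    where the map s -> 1.5*s - 0.5*s**3 pushes each one towards 1.
    """
    x = m / (np.linalg.norm(m) + eps)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x


def stable_rank(m):
    """Squared Frobenius norm divided by squared spectral norm."""
    s = np.linalg.svd(m, compute_uv=False)
    return float((s ** 2).sum() / s[0] ** 2)


def muoneq_direction(momentum, mode="R", steps=5):
    """Equilibrate first, then orthogonalize with a finite number of steps."""
    return newton_schulz(equilibrate(momentum, mode), steps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical momentum matrix whose rows differ widely in scale.
    row_scales = np.array([[100.0], [1.0], [1.0], [0.01]])
    momentum = rng.standard_normal((4, 6)) * row_scales

    print("stable rank, raw:           ", stable_rank(momentum))
    print("stable rank, row-normalized:", stable_rank(equilibrate(momentum, "R")))

    for label, direction in [
        ("plain", newton_schulz(momentum)),
        ("R", muoneq_direction(momentum, "R")),
    ]:
        err = np.linalg.norm(direction @ direction.T - np.eye(4))
        print(f"orthogonality error after 5 steps ({label}): {err:.3e}")
```

Running the script prints the stable rank and the residual orthogonality error with and without row normalization, which gives a rough feel for how the input spectrum affects a fixed step budget. The numbers depend on the random matrix and are not a reproduction of any result in the paper.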

Code & Models

Repositories

MaeChd/muon-eq
github
