SignMuon: Communication-Efficient Distributed Muon Optimization

Neel Mishra; Kushagara Trivedi; Pawan Kumar

arXiv:2605.16311 · cs.LG · May 19, 2026

TL;DR

SignMuon is a 1-bit, matrix-aware optimizer for distributed neural network training. Workers exchange only the signs of orthogonalized momentum, which cuts communication 32x relative to float32 gradients while maintaining high accuracy.

Contribution

The paper introduces SignMuon, which combines majority-vote sign aggregation from signSGD with Muon's orthogonalized (polar-factor) updates. The result is a matrix-aware optimizer that communicates only 1-bit signs and outperforms sign-based baselines.

Findings

01. Achieves 92.15% validation accuracy with ResNet-50 on CIFAR-10.

02. Reduces communication bandwidth by 32x relative to float32 gradients, since each entry is sent as 1 bit instead of 32.

03. Outperforms sign-based baselines on nanoGPT, reaching lower perplexity.
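The 32x figure in the second finding is per-entry arithmetic: a float32 entry occupies 32 bits, while a sign occupies 1 bit once signs are bit-packed. A minimal Python sketch of that packing follows; `pack_signs`, `unpack_signs`, the 64 x 64 example block, and the convention sign(0) = +1 are illustrative assumptions, not the paper's wire format.

```python
import struct

def pack_signs(values):
    """Pack one sign bit per entry (bit set = non-negative; sign(0) = +1 is assumed)."""
    packed = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v >= 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)

def unpack_signs(packed, n):
    """Recover the first n signs as +1 / -1 integers."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(n)]

# A hypothetical 64 x 64 gradient block, flattened (4096 entries).
grad = [((i * 37) % 11) - 5.0 for i in range(64 * 64)]

float32_bytes = len(grad) * struct.calcsize("f")  # 4 bytes per float32 entry
sign_bytes = len(pack_signs(grad))                 # 1 bit per entry, rounded up to whole bytes

print(float32_bytes, sign_bytes, float32_bytes / sign_bytes)  # prints: 16384 512 32.0
```

When the entry count is not a multiple of 8, the last byte is padded and the ratio falls slightly below 32; the receiver also needs the entry count (or the matrix shape) to unpack.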

Abstract

Distributed training of large neural networks is bottlenecked by full-precision gradient communication and by coordinatewise optimizers that ignore the matrix structure of weight tensors. We propose SignMuon, a 1-bit, matrix-aware optimizer that combines majority-vote sign aggregation from signSGD with the polar-step framework of Muon. Each worker forms a Muon-style direction by taking the polar factor of its momentum via a Newton–Schulz iteration, transmits only the entrywise signs, and aggregates by majority vote; an optional local polar step further enforces orthogonality at no extra communication cost. Under spectral-norm smoothness and bounded-variance stochastic gradients, the spectral-norm normalized sign step yields an $O(1/T)$ nonconvex rate for an $\ell_1$-based stationarity measure. With unimodal symmetric noise, majority vote across $M$ workers cuts the…
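The per-worker pipeline described in the abstract (momentum, polar factor via Newton–Schulz, entrywise sign, majority vote) can be sketched in plain Python for one weight matrix and M simulated workers. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the classical cubic Newton–Schulz iteration (Muon's reference code uses a tuned quintic variant), assumes heavy-ball momentum m ← βm + g, takes sign(0) = +1 and breaks vote ties toward +1, and reads the optional local polar step as re-orthogonalizing the voted sign matrix on each worker. All function names and the `lr`, `beta`, and `local_polar` parameters are hypothetical.

```python
import math

def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def newton_schulz_polar(g, steps=30):
    """Approximate the polar factor U V^T of g (from g = U S V^T).

    Classical cubic iteration X <- 1.5 X - 0.5 X X^T X. Dividing by the Frobenius
    norm first puts every singular value in [0, 1]; nonzero ones converge to 1.
    """
    fro = math.sqrt(sum(v * v for row in g for v in row)) or 1.0
    x = [[v / fro for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return x

def sign_matrix(x):
    """Entrywise sign as +1/-1; sign(0) = +1 is an assumed convention."""
    return [[1 if v >= 0 else -1 for v in row] for row in x]

def majority_vote(sign_mats):
    """Entrywise majority vote over the workers' +1/-1 matrices (ties go to +1)."""
    rows, cols = len(sign_mats[0]), len(sign_mats[0][0])
    return [[1 if sum(s[i][j] for s in sign_mats) >= 0 else -1 for j in range(cols)]
            for i in range(rows)]

def signmuon_step(weights, momenta, worker_grads, lr=0.02, beta=0.95, local_polar=False):
    """One hypothetical SignMuon step for a single weight matrix shared by M workers.

    momenta[k] is worker k's momentum buffer and is updated in place.
    """
    votes = []
    for m, g in zip(momenta, worker_grads):
        for i, row in enumerate(g):
            for j, v in enumerate(row):
                m[i][j] = beta * m[i][j] + v                 # local heavy-ball momentum
        votes.append(sign_matrix(newton_schulz_polar(m)))     # only these 1-bit signs are sent
    direction = majority_vote(votes)                          # aggregated by majority vote
    if local_polar:
        direction = newton_schulz_polar(direction)            # optional, no extra communication
    return [[w - lr * d for w, d in zip(rw, rd)] for rw, rd in zip(weights, direction)]

# Usage: three simulated workers, each holding a 2 x 2 gradient.
grads = [[[1.0, -2.0], [0.5, 3.0]], [[0.8, -1.5], [0.2, 2.5]], [[-1.0, 2.0], [0.4, 1.0]]]
momenta = [[[0.0, 0.0], [0.0, 0.0]] for _ in grads]
weights = signmuon_step([[0.0, 0.0], [0.0, 0.0]], momenta, grads)
print(weights)  # prints [[-0.02, 0.02], [-0.02, -0.02]]
```

Because every worker receives the same voted matrix, the optional polar step yields identical results everywhere without another communication round. The raw ±1 direction has a spectral norm that grows with the matrix dimensions; the abstract refers to a spectral-norm normalized sign step, which this sketch does not attempt to reproduce, leaving `lr` fixed.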
