SignMuon: Communication-Efficient Distributed Muon Optimization
Neel Mishra, Kushagara Trivedi, Pawan Kumar

TL;DR
SignMuon is a 1-bit, matrix-aware optimizer for distributed neural network training. Workers transmit only entrywise signs, which cuts communication by 32x relative to float32 while maintaining accuracy.
Contribution
Introduces SignMuon, which combines the majority-vote sign aggregation of signSGD with the orthogonalized, matrix-aware polar step of Muon. Each worker communicates one bit per entry, and the method outperforms sign-based baselines.
Findings
Achieves 92.15% validation accuracy on CIFAR-10/ResNet-50.
Reduces bandwidth by 32x compared to float32.
Outperforms sign-based baselines on nanoGPT with lower perplexity.
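The 32x figure follows from sending one sign bit per entry instead of a 32-bit float. A minimal NumPy sketch of that arithmetic, assuming signs are packed eight per byte; the helper names `pack_signs` and `unpack_signs` and the packing scheme are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def pack_signs(x):
    """Pack the entrywise signs of x into bytes, one bit per entry (1 = non-negative)."""
    bits = (x.ravel() >= 0).astype(np.uint8)
    return np.packbits(bits)

def unpack_signs(packed, shape):
    """Recover a +1/-1 array of the given shape from packed sign bits."""
    n = int(np.prod(shape))
    bits = np.unpackbits(packed)[:n]
    return (bits.astype(np.float32) * 2.0 - 1.0).reshape(shape)

# Hypothetical float32 gradient for a single weight matrix.
rng = np.random.default_rng(0)
grad = rng.standard_normal((256, 128)).astype(np.float32)

packed = pack_signs(grad)
# Bytes sent in float32 divided by bytes sent as packed sign bits.
compression_ratio = grad.nbytes / packed.nbytes
```

Here `compression_ratio` is the float32 payload size divided by the packed payload size, so it depends only on the 32-bit versus 1-bit width per entry, not on the gradient values.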
Abstract
Distributed training of large neural networks is bottlenecked by full-precision gradient communication and by coordinatewise optimizers that ignore the matrix structure of weight tensors. We propose Sign-Muon, a 1-bit, matrix-aware optimizer that combines majority-vote sign aggregation from signSGD with the polar-step framework of Muon. Each worker forms a Muon-style direction by taking the polar factor of its momentum via a Newton-Schulz iteration, transmits only the entrywise signs, and aggregates by majority vote; an optional local polar step further enforces orthogonality at no extra communication cost. Under spectral-norm smoothness and bounded-variance stochastic gradients, the spectral-norm normalized sign step yields an nonconvex rate for an -based stationarity measure. With unimodal symmetric noise, majority vote across workers cuts the…
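The per-step pipeline described in the abstract (polar factor of the momentum, entrywise signs, majority vote, optional local polar step) might be sketched as follows. This is a simplified single-process illustration under stated assumptions: it uses the classic cubic Newton-Schulz iteration rather than whatever variant the paper uses, omits the momentum update and the spectral-norm step normalization, and all function names and the `lr` default are hypothetical.

```python
import numpy as np

def newton_schulz_polar(m, steps=10, eps=1e-8):
    """Approximate the polar factor of m with the classic cubic Newton-Schulz iteration."""
    # Dividing by the Frobenius norm keeps all singular values at most 1,
    # which lies inside the convergence region of the iteration.
    x = m / (np.linalg.norm(m) + eps)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

def worker_message(momentum):
    """What one worker transmits: the entrywise signs of its polar direction."""
    return np.sign(newton_schulz_polar(momentum)).astype(np.int8)

def majority_vote(messages):
    """Aggregate worker sign matrices by entrywise majority vote (0 on exact ties)."""
    return np.sign(np.sum(messages, axis=0))

def signmuon_step(weight, momenta, lr=0.01, local_polar=False):
    """One illustrative update given each worker's momentum matrix (assumed API)."""
    vote = majority_vote([worker_message(m) for m in momenta])
    # Optional local polar step: re-orthogonalize the voted direction.
    # It runs after aggregation, so it adds no communication.
    direction = newton_schulz_polar(vote) if local_polar else vote
    return weight - lr * direction
```

For example, `signmuon_step(np.zeros((4, 3)), momenta)` with a list of three random 4x3 momentum matrices moves every entry by exactly `lr` in magnitude, because the voted direction is a sign matrix when the number of workers is odd.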
