Dion2: A Simple Method to Shrink Matrix in Muon
Kwangjun Ahn, Noah Amsel, John Langford

TL;DR
Dion2 is a simple, scalable method that reduces the computational overhead of Muon by sampling a subset of the matrix's rows or columns and orthonormalizing only those, improving efficiency without sacrificing performance.
Contribution
Dion2 introduces a novel sampling-based matrix shrinking technique that simplifies and accelerates Muon's orthonormalization process.
Findings
Reduces computation and communication costs.
Enhances scalability of Muon.
Maintains empirical performance.
Abstract
The Muon optimizer enjoys strong empirical performance and theoretical grounding. However, the super-linear cost of its orthonormalization step introduces increasing overhead with scale. To alleviate this cost, several works have attempted to reduce the size of the matrix entering the orthonormalization step. We introduce Dion2, a much simpler method for shrinking the matrix involved in Muon's computation compared to prior approaches. At a high level, Dion2 selects a fraction of rows or columns at each iteration and orthonormalizes only those. This sampling procedure makes the update sparse, reducing both computation and communication costs which in turn improves the scalability of Muon.
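The row/column sampling idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that rows (or columns) are chosen uniformly at random, and it computes the orthonormalization exactly with an SVD (the polar factor U Vᵀ) where Muon would use Newton-Schulz iterations. The names `orthonormalize` and `sampled_update` and the parameter `fraction` are hypothetical. The sketch shows that only a k × n submatrix reaches the orthonormalization step and that every unselected row of the update is zero.

```python
import numpy as np


def orthonormalize(X: np.ndarray) -> np.ndarray:
    """Exact polar factor U @ V^T of X via the SVD.

    Muon approximates this map with Newton-Schulz iterations; an SVD is used
    here only so the sketch is exact and easy to check.
    """
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt


def sampled_update(M: np.ndarray, fraction: float,
                   rng: np.random.Generator, axis: int = 0):
    """Hypothetical Dion2-style step: orthonormalize only sampled rows (axis=0)
    or columns (axis=1) of M and leave every other entry of the update at zero.

    Assumption: indices are drawn uniformly at random without replacement;
    the paper's actual selection rule may differ.
    """
    if axis == 1:
        # Sampling columns of M is the same as sampling rows of M^T.
        update_t, idx = sampled_update(M.T, fraction, rng, axis=0)
        return update_t.T, idx
    m = M.shape[0]
    k = max(1, int(np.ceil(fraction * m)))      # number of rows to keep
    rows = rng.choice(m, size=k, replace=False)  # distinct row indices
    update = np.zeros_like(M)
    # Only the k x n submatrix enters the (expensive) orthonormalization.
    update[rows] = orthonormalize(M[rows])
    return update, rows


rng = np.random.default_rng(0)
M = rng.standard_normal((8, 16))  # stand-in for a momentum matrix
update, rows = sampled_update(M, fraction=0.25, rng=rng)
# Two of the eight rows are nonzero, and those two rows are orthonormal.
```

With `fraction=0.25`, only 2 of the 8 rows are orthonormalized. When the sampled block has no more rows than columns (k ≤ n), its rows come out orthonormal, and the remaining rows of the update are exactly zero. This sparsity is what the abstract credits for the savings in computation and communication.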
