PolarAdamW: Disentangling Spectral Control and Schur Gauge-Equivariance in Matrix Optimisation
Haozhou Zhang

TL;DR
PolarAdamW is a matrix optimiser that separates Muon's spectral control from its Schur gauge-equivariance. It outperforms Muon on DeiT-Tiny image classification, where multiplicity-basis freedom is trivial, while Muon outperforms it on 3D point-cloud regression, where that freedom is non-trivial.
Contribution
The paper introduces PolarAdamW, a hybrid algorithm that preserves Muon's polar spectral control while breaking its gauge-equivariance, and analyses its properties and performance in settings where multiplicity-basis freedom is trivial and where it is not.
Findings
PolarAdamW outperforms Muon and AdamW on DeiT-Tiny image classification.
Muon's polar step is Schur gauge-equivariant; AdamW's coordinatewise step is not.
In 3D point-cloud regression, Muon outperforms PolarAdamW when multiplicity-basis freedom is non-trivial.
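The equivariance finding above can be illustrated numerically. The sketch below is not the paper's proof or code. It assumes that a change of multiplicity-space basis can be modelled as left-multiplication by an orthogonal matrix `q`, uses an exact SVD-based polar factor in place of Muon's Newton-Schulz approximation, and uses a single-step elementwise normalisation `g / (|g| + eps)` as a simplified stand-in for AdamW's coordinatewise preconditioner. The function names are hypothetical.

```python
import numpy as np

def polar_factor(m):
    # Exact polar factor U V^T from the thin SVD (assumption: stands in
    # for the Newton-Schulz approximation used in practice).
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def elementwise_normalise(g, eps=1e-8):
    # Simplified stand-in for a coordinatewise preconditioner:
    # each entry is scaled by its own magnitude.
    return g / (np.abs(g) + eps)

rng = np.random.default_rng(0)
g = rng.standard_normal((4, 4))                    # toy "gradient" matrix
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal basis change

# Equivariance gap: || f(Q G) - Q f(G) ||_F for each map.
polar_gap = np.linalg.norm(polar_factor(q @ g) - q @ polar_factor(g))
adam_gap = np.linalg.norm(elementwise_normalise(q @ g) - q @ elementwise_normalise(g))

print(f"polar map gap:       {polar_gap:.2e}")
print(f"coordinatewise gap:  {adam_gap:.2e}")
```

Under these assumptions the polar-map gap should be at the level of floating-point error, while the coordinatewise gap should be of order one, since elementwise scaling depends on the basis in which the entries are read.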
Abstract
Muon's matrix-level update couples two distinct effects: spectral control via a polar map, and equivariance under orthogonal changes of multiplicity-space basis (Schur gauge-equivariance). We separate them with PolarAdamW, a controlled hybrid that preserves Muon's polar spectral-norm control but breaks the gauge-equivariance, since AdamW's coordinatewise preconditioner is basis-dependent. Algorithmically, PolarAdamW applies Muon's Newton-Schulz polar map to AdamW's preconditioned direction rather than to raw momentum, at per-iteration wall-time comparable to Muon. We prove that Muon's polar step is Schur gauge-equivariant on multiplicity matrices while AdamW's coordinatewise step is not. On DeiT-Tiny trained from scratch on four independently sampled 100-class subsets of ImageNet-1k, where multiplicity-basis freedom is trivial, PolarAdamW outperforms Muon by +1.93 pp in test accuracy…
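The algorithmic description in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it uses the classical cubic Newton-Schulz iteration rather than whatever tuned variant Muon or the paper uses, standard AdamW moment estimates with bias correction, and decoupled weight decay. The names `newton_schulz_polar`, `init_state`, and `polar_adamw_step`, and all hyperparameter defaults, are hypothetical.

```python
import numpy as np

def newton_schulz_polar(m, steps=30, eps=1e-7):
    # Classical cubic Newton-Schulz iteration towards the polar factor.
    # Frobenius normalisation puts all singular values in [0, 1],
    # inside the iteration's convergence region.
    x = m / (np.linalg.norm(m) + eps)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

def init_state(w):
    # First and second moment buffers plus a step counter.
    return {"m": np.zeros_like(w), "v": np.zeros_like(w), "t": 0}

def polar_adamw_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                     eps=1e-8, weight_decay=0.01):
    # 1) Standard AdamW moment estimates with bias correction.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    # 2) Coordinatewise preconditioned direction (basis-dependent).
    direction = m_hat / (np.sqrt(v_hat) + eps)
    # 3) Polar map applied to the preconditioned direction,
    #    rather than to raw momentum as in Muon.
    update = newton_schulz_polar(direction)
    # 4) Decoupled weight decay (assumption: AdamW-style).
    w_new = w - lr * (update + weight_decay * w)
    return w_new, update

rng = np.random.default_rng(0)
w = rng.standard_normal((6, 4))
grad = rng.standard_normal((6, 4))
state = init_state(w)
w_new, update = polar_adamw_step(w, grad, state)
print("spectral norm of update:", np.linalg.norm(update, 2))
```

In this sketch the spectral norm of `update` cannot exceed one (up to rounding), because the cubic map sends singular values in [0, 1] back into [0, 1]. That is the spectral-control property the abstract attributes to the polar step, while step 2 is where the basis dependence enters.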
