Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
Chen Fan, Mark Schmidt, Christos Thrampoulidis

TL;DR
This paper characterizes the implicit bias of Spectral Descent and Muon in multi-class linear classification, showing that they converge to max-margin solutions with respect to the spectral norm, within a broader analysis that covers other matrix norms such as the max-norm.
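The norms named above each define a steepest-descent direction: the unit-norm matrix that maximizes its inner product with the gradient. A minimal NumPy sketch, assuming the standard closed forms: for the entry-wise max-norm the direction is sign(G) (sign descent), and for the spectral norm (Schatten-infinity) it is U V^T from the SVD of G. Entry-wise and Schatten p-norms for finite p follow from Hölder's inequality. The function names are hypothetical helpers, not code from the paper.

```python
import numpy as np


def vector_direction(g, p):
    """argmax of <g, d> over ||d||_p <= 1, treating the array g as a flat vector."""
    if np.isinf(p):
        return np.sign(g)                      # max-norm ball: sign of each entry
    if p == 1:
        d = np.zeros_like(g)                   # l1 ball: all mass on one largest entry
        i = np.argmax(np.abs(g))
        d.flat[i] = np.sign(g.flat[i])
        return d
    q = p / (p - 1.0)                          # Hölder conjugate, 1/p + 1/q = 1
    scale = np.linalg.norm(g.ravel(), q) ** (q - 1.0)
    return np.sign(g) * np.abs(g) ** (q - 1.0) / scale


def steepest_direction(G, p, kind="entrywise"):
    """Unit-norm steepest direction for an entry-wise or Schatten p-norm of matrix G."""
    if kind == "entrywise":
        return vector_direction(G, p)
    if kind == "schatten":
        # apply the vector rule to the singular values, keep the singular vectors
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        return U @ np.diag(vector_direction(s, p)) @ Vt
    raise ValueError(f"unknown kind: {kind}")


rng = np.random.default_rng(0)
G = rng.standard_normal((3, 4))
D_max = steepest_direction(G, np.inf, "entrywise")    # sign descent direction
D_spec = steepest_direction(G, np.inf, "schatten")    # spectral descent direction U V^T
print(np.sum(G * D_max), np.abs(G).sum())             # both: entry-wise l1 norm of G
print(np.sum(G * D_spec), np.linalg.norm(G, "nuc"))   # both: nuclear norm of G
```

In each case the inner product with G equals the dual norm of G (entry-wise l1 for the max-norm, nuclear for the spectral norm), which is what makes these directions steepest for their respective norm balls.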
Contribution
It provides the first complete characterization of the implicit bias of p-norm normalized steepest descent (NSD) and its momentum variant (NMD) in multi-class linear classification with cross-entropy loss, covering several known algorithms as special cases.
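For reference, the setting can be written as below. This is a sketch under assumed standard notation, not the paper's exact statement: W is the k-by-d classifier matrix mapping features to class logits, e_c is the c-th standard basis vector, the inner product is the trace inner product, and the norm is the chosen entry-wise or Schatten p-norm. The paper's step-size schedule and exact momentum form may differ.

```latex
\begin{align*}
\mathcal{L}(W) &= -\frac{1}{n}\sum_{i=1}^{n} \log \frac{\exp(e_{y_i}^\top W x_i)}{\sum_{c=1}^{k}\exp(e_c^\top W x_i)} \\
\text{NSD:}\quad W_{t+1} &= W_t - \eta_t \Delta_t, \qquad \Delta_t \in \arg\max_{\|\Delta\|\le 1} \langle \nabla\mathcal{L}(W_t), \Delta\rangle \\
\text{NMD:}\quad M_t &= \beta M_{t-1} + (1-\beta)\,\nabla\mathcal{L}(W_t), \qquad W_{t+1} = W_t - \eta_t \Delta_t, \qquad \Delta_t \in \arg\max_{\|\Delta\|\le 1} \langle M_t, \Delta\rangle \\
\gamma &= \max_{\|W\|\le 1}\ \min_{i}\ \min_{c\neq y_i}\ (e_{y_i}-e_c)^\top W x_i
\end{align*}
```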
Findings
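NSD and NMD converge to max-margin solutions with respect to the classifier matrix's p-norm (entry-wise or Schatten), with established convergence rates.
Spectral Descent and Muon, in particular, converge to max-margin solutions with respect to the spectral norm.
Extending the max-norm analysis to include preconditioning shows that Adam converges to the max-norm max-margin solution.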
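Adam rescales each entry of its update separately, which connects it to the entry-wise max-norm. A minimal check of that link, not the paper's analysis: with beta1 = beta2 = 0 and eps = 0, one standard Adam step reduces exactly to sign descent, the NSD step for the entry-wise max-norm. The paper's result concerns Adam with preconditioning in general; the degenerate hyperparameters here are only an illustration, and `adam_step` is a hypothetical helper, not a library API.

```python
import numpy as np


def adam_step(W, G, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update on a matrix parameter W with gradient G (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * G            # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * G ** 2       # entry-wise second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)               # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    # entry-wise (diagonal) preconditioning by 1 / (sqrt(v_hat) + eps)
    return W - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
G = rng.standard_normal((3, 4))
zeros = np.zeros_like(W)

# With beta1 = beta2 = 0 and eps = 0, the update is G / |G| = sign(G):
# normalized steepest descent for the entry-wise max-norm.
W_adam, _, _ = adam_step(W, G, zeros, zeros, t=1, lr=0.1, beta1=0.0, beta2=0.0, eps=0.0)
print(np.allclose(W_adam, W - 0.1 * np.sign(G)))   # → True
```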
Abstract
Different gradient-based methods for optimizing overparameterized models can all achieve zero training error yet converge to distinctly different solutions inducing different generalization properties. We provide the first complete characterization of implicit optimization bias for p-norm normalized steepest descent (NSD) and momentum steepest descent (NMD) algorithms in multi-class linear classification with cross-entropy loss. Our key theoretical contribution is proving that these algorithms converge to solutions maximizing the margin with respect to the classifier matrix's p-norm, with established convergence rates. These results encompass important special cases including Spectral Descent and Muon, which we show converge to max-margin solutions with respect to the spectral norm. A key insight of our contribution is that the analysis of general entry-wise and Schatten p-norms can be…
