The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization
Alexey Kravatskiy, Ivan Kozyrev, Nikolai Kozlov, Alexander Vinogradov, Daniil Merkulov, Ivan Oseledets

TL;DR
This paper introduces Fanions, a family of Muon-like matrix optimization algorithms built on duals of the Ky Fan k-norms, and backs them with theoretical analysis and experiments in which the F-Muon and S-Muon variants match Muon's performance.
Contribution
It develops a family of Muon-like algorithms using duals of Ky Fan norms, expanding the toolkit for matrix optimization in machine learning.
Findings
F-Muon and S-Muon match Muon's performance across tasks.
F-Muon and S-Muon outperform vanilla Muon on a synthetic linear least squares problem.
Theoretical analysis supports the effectiveness of dual Ky Fan norm-based algorithms.
Abstract
In this article, we explore the use of various matrix norms for optimizing functions of weight matrices, a crucial problem in training large language models. Moving beyond the spectral norm underlying the Muon update, we leverage duals of the Ky Fan $k$-norms to introduce a family of Muon-like algorithms we name Fanions, which are closely related to Dion. By working with duals of convex combinations of the Ky Fan $k$-norms with either the Frobenius norm or the $\ell_\infty$ norm, we construct the families of F-Fanions and S-Fanions, respectively. Their most prominent members are F-Muon and S-Muon. We complement our theoretical analysis with an extensive empirical study of these algorithms across a wide range of tasks and settings, demonstrating that F-Muon and S-Muon consistently match Muon's performance, while outperforming vanilla Muon on a synthetic linear least squares problem.
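For readers unfamiliar with the norms named in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It uses the standard definitions: the Ky Fan k-norm is the sum of the k largest singular values, and its dual is the larger of the spectral norm and the nuclear norm divided by k. The function names (`ky_fan_norm`, `dual_ky_fan_norm`, `dual_ky_fan_lmo`) are hypothetical, and an exact SVD is assumed purely for illustration; practical Muon-style optimizers typically avoid a full SVD.

```python
import numpy as np

def ky_fan_norm(A, k):
    """Ky Fan k-norm: sum of the k largest singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)  # returned in descending order
    return float(s[:k].sum())

def dual_ky_fan_norm(A, k):
    """Dual of the Ky Fan k-norm: max(spectral norm, nuclear norm / k)."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(max(s[0], s.sum() / k))

def dual_ky_fan_lmo(G, k):
    """Maximizer of <G, X> over the unit ball of the dual Ky Fan k-norm.

    Assumption for this sketch: the direction is U_k V_k^T, built from the
    top-k singular vectors of G. With k = min(G.shape) this is the
    orthogonalized direction U V^T associated with Muon.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U[:, :k] @ Vt[:k, :]

# Illustrative usage on a random "gradient" matrix (made-up data).
rng = np.random.default_rng(0)
G = rng.standard_normal((6, 4))
k = 2
D = dual_ky_fan_lmo(G, k)

# The inner product <G, D> recovers the Ky Fan k-norm of G,
# and D sits on the boundary of the dual-norm unit ball.
inner = float(np.sum(G * D))
print(inner, ky_fan_norm(G, k), dual_ky_fan_norm(D, k))
```

Under these assumptions, a descent step would subtract a multiple of `D` from the weight matrix. Choosing k = 1 yields a rank-one direction, while k = min(m, n) yields the full orthogonalized direction, which is one way to see the low-rank connection the abstract draws to Dion.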
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Big Data and Digital Economy · Machine Learning in Materials Science
