LionMuon: Alternating Spectral and Sign Descent for Efficient Training

Arman Bolatov; Artem Riabinin; Nikita Kornilov; Andrey Veprikov; Samuel Horváth; Martin Takáč; Aleksandr Beznosikov

arXiv:2605.19811 · cs.LG · May 20, 2026


TL;DR

LionMuon is an optimizer that alternates between spectral (Muon) and sign-based (Lion) descent steps, keeping the strength of Muon's update directions at a much lower average per-step cost and outperforming the baselines on every dataset and model tested.

Contribution

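It introduces LionMuon, an optimizer that alternates Lion's sign updates with Muon's spectral updates on a fixed period while sharing one momentum buffer. Its optimizer-state memory therefore matches Lion's (half of AdamW's), its average per-step cost is considerably lower than Muon's, and it outperforms the existing optimizers it is compared against.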
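The memory saving can be checked by counting full-size state buffers per parameter: AdamW keeps two moment estimates, while Lion and LionMuon keep a single momentum buffer. The sketch below is a rough estimate that assumes fp32 state and ignores any auxiliary buffers a real implementation might hold; the dictionary and helper names are hypothetical, and the 124M parameter count is only an illustrative size.

```python
BYTES_PER_FP32 = 4

# Full-size optimizer-state buffers kept per parameter (assumed fp32).
STATE_BUFFERS = {
    "AdamW": 2,     # first- and second-moment EMAs
    "Lion": 1,      # one momentum buffer
    "LionMuon": 1,  # one momentum buffer shared by the Lion and Muon steps
}


def state_bytes(optimizer: str, n_params: int, bytes_per_value: int = BYTES_PER_FP32) -> int:
    """Bytes of optimizer state for n_params parameters (hypothetical helper)."""
    return STATE_BUFFERS[optimizer] * n_params * bytes_per_value


n_params = 124_000_000  # illustrative model size
for name in STATE_BUFFERS:
    print(f"{name:<9} {state_bytes(name, n_params) / 1e6:.0f} MB")
```

For this size the loop reports 992 MB of state for AdamW and 496 MB for both Lion and LionMuon, which is the "exactly half" relation in buffer-count form.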

Findings

1. LionMuon outperforms Muon, Lion, Signum, and AdamW on all tested datasets and architectures.
2. LionMuon reaches lower validation loss than these baselines at reduced compute cost.
3. Theoretical analysis provides sharp complexity bounds that predict the optimal period P and the conditions under which LionMuon is superior.
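The compute saving comes from amortizing Muon's expensive step over the period. As a simplifying assumption not spelled out in this summary, suppose each period of P steps contains one Muon step of cost C_M and P - 1 Lion steps of cost C_L (the symbols are hypothetical). The averaged per-iteration cost is then:

```latex
\bar{C}(P) = \frac{C_M + (P-1)\,C_L}{P} = C_L + \frac{C_M - C_L}{P},
\qquad
\bar{C}(2) = \frac{C_M + C_L}{2},
\qquad
\lim_{P \to \infty} \bar{C}(P) = C_L .
```

Under this model the overhead above a pure sign method shrinks as 1/P; at P = 2 it is half of Muon's extra cost over Lion.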

Abstract

In large-scale optimization, the cost and the effectiveness of each update step are the most crucial factors for a successful optimizer. Sign-based optimizers like Lion or Signum produce cheap per-step updates, whereas Muon's spectral matrix-sign update gives a much stronger direction at a substantially higher per-step cost. In this work, we propose LionMuon, which retains the effectiveness of Muon steps while cutting the average per-iteration cost considerably, bringing it close to that of sign-based methods. It alternates between Lion's and Muon's updates on a fixed period P, sharing a single dual-EMA momentum buffer between them. Its optimizer-state memory therefore matches Lion's and is exactly half of AdamW's. A simpler single-EMA variant, SignMuon, already outperforms pure Muon on its own. At P = 2, LionMuon Pareto-dominates Muon, Lion, Signum, and AdamW on every dataset and architecture we tested at 124M…
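The alternation described in the abstract can be sketched in NumPy for a single weight matrix. This is a minimal illustration under stated assumptions, not the authors' implementation (that lives in the brain-lab-research/lion-muon repository): the function names, hyperparameter defaults, which step of each period is spectral, the RMS rescaling of the spectral step, and the use of one plain EMA as a stand-in for SignMuon are all hypothetical. The matrix sign is computed exactly via SVD for readability, whereas Muon approximates it with a few Newton-Schulz iterations.

```python
import numpy as np


def matrix_sign(x: np.ndarray) -> np.ndarray:
    """Polar factor U @ V^T of x, the matrix sign used by spectral descent.

    Computed exactly with an SVD for clarity; Muon itself approximates it
    with Newton-Schulz iterations instead.
    """
    u, _, vt = np.linalg.svd(x, full_matrices=False)
    return u @ vt


def lionmuon_step(w, g, m, t, lr=1e-3, beta1=0.9, beta2=0.99,
                  period=2, weight_decay=0.0, single_ema=False):
    """One hypothetical LionMuon step for a 2-D weight matrix w.

    Assumptions (not confirmed by the paper): the spectral step runs when
    t % period == 0, both branches read the same momentum direction c,
    and single_ema=True stands in for the SignMuon variant.
    Returns the updated weights and the single momentum buffer m.
    """
    if single_ema:
        # One EMA serves as both the update direction and the stored state.
        m = beta1 * m + (1 - beta1) * g
        c = m
    else:
        # Lion-style dual EMA: interpolate with beta1 for the direction,
        # then update the stored buffer with beta2.
        c = beta1 * m + (1 - beta1) * g
        m = beta2 * m + (1 - beta2) * g

    if t % period == 0:
        # Spectral (Muon-style) step, rescaled so its entrywise RMS is 1
        # for full-rank c, like a sign step (hypothetical scaling choice).
        direction = matrix_sign(c) * np.sqrt(max(c.shape))
    else:
        # Elementwise sign (Lion-style) step.
        direction = np.sign(c)

    # Decoupled weight decay, as in Lion.
    w = w - lr * (direction + weight_decay * w)
    return w, m


# Usage: a few steps on random data; m is the only optimizer state.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3))
m = np.zeros_like(w)
for t in range(6):
    g = rng.standard_normal(w.shape)  # stand-in for a real gradient
    w, m = lionmuon_step(w, g, m, t)
```

The only state carried between calls is m, which is why the memory footprint matches Lion's; with period=2 the spectral and sign branches simply alternate every step.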


Code & Models

Repositories

brain-lab-research/lion-muon (GitHub)
