Mano: Restriking Manifold Optimization for LLM Training
Yufei Gu, Zeke Xie

TL;DR
Mano is a manifold-optimization-based optimizer for training large language models. It outperforms AdamW and Muon in both efficiency and effectiveness by using a new tangent-space projection and rotational manifold constraints.
Contribution
This paper introduces Mano, the first manifold optimizer that effectively bridges the performance gap with modern optimizers for large-scale LLM training.
Findings
Mano outperforms AdamW and Muon on LLaMA and Qwen3 models.
Mano reduces memory and computational costs compared to existing optimizers.
Experimental results show Mano expands the efficiency Pareto frontier.
Abstract
While large language models (LLMs) have emerged as a significant advancement in artificial intelligence, the hardware and computational costs of training them are a heavy burden. Among the state-of-the-art optimizers, AdamW relies on diagonal curvature estimates and ignores structural properties, while Muon applies global spectral normalization at the expense of losing curvature information. In this study, we revisit manifold optimization methods for training LLMs, which may address the limitations of both optimizers; conventional manifold optimization methods have been largely overlooked because of their poor performance in large-scale model optimization. By projecting the momentum onto the tangent space of the model parameters and constraining it to a rotational Oblique manifold, we propose a novel, powerful, and efficient optimizer **Mano** that is the first to…
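To make the two geometric ingredients named in the abstract more concrete, here is a minimal sketch of a generic tangent-space projection and retraction on the Oblique manifold (matrices whose columns each have unit Euclidean norm). This is the textbook operation, not the Mano update itself: the abstract excerpt does not spell out the algorithm or what "rotational" means, so nothing here should be read as the authors' method. The function names `project_to_tangent`, `normalize_columns`, and `oblique_step`, and the plain gradient-style step, are assumptions made for illustration only.

```python
import math

def project_to_tangent(x_cols, g_cols):
    """Project each column g_j onto the tangent space at the unit-norm column x_j.

    On the Oblique manifold the tangent space at x_j is the set of vectors
    orthogonal to x_j, so we subtract the component of g_j along x_j.
    """
    projected = []
    for x, g in zip(x_cols, g_cols):
        radial = sum(xi * gi for xi, gi in zip(x, g))  # <x_j, g_j>
        projected.append([gi - radial * xi for xi, gi in zip(x, g)])
    return projected

def normalize_columns(cols, eps=1e-12):
    """Retraction: rescale every column back to unit norm."""
    out = []
    for c in cols:
        norm = math.sqrt(sum(ci * ci for ci in c))
        out.append([ci / max(norm, eps) for ci in c])
    return out

def oblique_step(x_cols, momentum_cols, lr):
    """One illustrative step (hypothetical, not Mano): project, move, retract."""
    tangent = project_to_tangent(x_cols, momentum_cols)
    moved = [[xi - lr * ti for xi, ti in zip(x, t)]
             for x, t in zip(x_cols, tangent)]
    return normalize_columns(moved)

# Usage: a 2x2 parameter matrix stored as a list of unit-norm columns.
x = [[1.0, 0.0], [0.0, 1.0]]
m = [[0.5, 2.0], [3.0, -1.0]]   # stand-in for a momentum buffer
t = project_to_tangent(x, m)
x_new = oblique_step(x, m, lr=0.1)
```

The projected columns in `t` are orthogonal to the matching columns of `x`, and every column of `x_new` has unit norm again, which is what keeps the iterate on the manifold.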
Taxonomy
Topics: Machine Learning and Data Classification · Natural Language Processing Techniques · Big Data and Digital Economy
