Mano: Restriking Manifold Optimization for LLM Training

Yufei Gu; Zeke Xie

arXiv:2601.23000 · cs.LG · February 2, 2026

TL;DR

Mano is a manifold-optimization-based optimizer for training large language models. It projects the momentum onto the tangent space of the model parameters and constrains it to a rotational Oblique manifold, and it is reported to outperform AdamW and Muon in both efficiency and effectiveness.

Contribution

This paper introduces Mano, the first manifold optimizer that effectively bridges the performance gap with modern optimizers for large-scale LLM training.

Findings

01. Mano outperforms AdamW and Muon on LLaMA and Qwen3 models.

02. Mano reduces memory and computational costs compared to existing optimizers.

03. Experimental results show Mano expands the efficiency Pareto frontier.

Abstract

Large language models (LLMs) are a significant advance in artificial intelligence, but the hardware and computational costs of training them are heavy. Among state-of-the-art optimizers, AdamW relies on diagonal curvature estimates and ignores structural properties, while Muon applies global spectral normalization at the expense of curvature information. In this study, we restrike manifold optimization methods for training LLMs, which may address the limitations of both optimizers; conventional manifold optimization methods have been largely overlooked because of their poor performance in large-scale model optimization. By projecting the momentum onto the tangent space of the model parameters and constraining it to a rotational Oblique manifold, we propose a novel, powerful, and efficient optimizer **Mano** that is the first to…
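
The abstract is truncated and does not spell out Mano's update rule, so the following is only a generic sketch of the two ingredients it names, written for the textbook Oblique manifold (matrices whose columns have unit L2 norm). It is not the authors' algorithm, and it does not model the "rotational" variant. The function names, the momentum form, and the values of `lr` and `beta` are all assumptions chosen for illustration.

```python
import numpy as np

def normalize_columns(w, eps=1e-12):
    """Retract onto the Oblique manifold: rescale every column to unit L2 norm."""
    norms = np.linalg.norm(w, axis=0, keepdims=True)
    return w / np.maximum(norms, eps)

def project_to_tangent(w, m):
    """Project m onto the tangent space at w (w must have unit-norm columns).

    Each column of m loses its component along the matching column of w.
    """
    radial = np.sum(w * m, axis=0, keepdims=True)  # <w_j, m_j> for each column j
    return m - w * radial

def oblique_momentum_step(w, grad, momentum, lr=0.1, beta=0.9):
    """One illustrative step (hypothetical, not Mano itself):
    momentum update, tangent-space projection, then retraction."""
    momentum = beta * momentum + (1.0 - beta) * grad
    direction = project_to_tangent(w, momentum)
    w_new = normalize_columns(w - lr * direction)
    return w_new, momentum

# Toy usage on a random 4x3 weight matrix (assumed shapes, random data).
rng = np.random.default_rng(0)
w = normalize_columns(rng.standard_normal((4, 3)))
grad = rng.standard_normal((4, 3))
w_new, momentum = oblique_momentum_step(w, grad, np.zeros_like(w))
```

After the step, every column of `w_new` still has unit norm, and the projected direction is orthogonal, column by column, to the point it was projected at. These are the two properties that define movement along this manifold.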

Taxonomy

Topics: Machine Learning and Data Classification · Natural Language Processing Techniques · Big Data and Digital Economy