Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement
Shuchen Zhu, Rizhen Hu, Mingze Wang, Mou Sun, Xue Wang, Kun Yuan, Zaiwen Wen

TL;DR
This paper introduces LITE, a novel acceleration method for large language model pre-training that leverages flat-direction dynamics and Riemannian geometry to improve optimizer efficiency and convergence speed.
Contribution
The paper develops a Riemannian ODE framework for optimizer analysis and proposes LITE, a generalized acceleration strategy that enhances training dynamics along flat directions.
Findings
LITE significantly accelerates training with the Muon and SOAP optimizers.
LITE improves convergence across diverse architectures and datasets.
Theoretical analysis confirms faster convergence along flat directions.
Abstract
Pre-training Large Language Models requires immense computational resources, making optimizer efficiency essential. The optimization landscape is highly anisotropic, with loss reduction driven predominantly by progress along flat directions. While matrix-based optimizers such as Muon and SOAP leverage fine-grained curvature information to outperform AdamW, their updates tend toward isotropy -- relatively conservative along flat directions yet potentially aggressive along sharp ones. To address this limitation, we first establish a unified Riemannian Ordinary Differential Equation (ODE) framework that elucidates how common adaptive algorithms operate synergistically: the preconditioner induces a Riemannian geometry that mitigates ill-conditioning, while momentum serves as a Riemannian damping term that promotes convergence. Guided by these insights, we propose LITE, a generalized…
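The anisotropy claim in the abstract can be made concrete with a toy example. The sketch below is not the LITE algorithm (the abstract is truncated before describing it). It only illustrates, on an assumed 2-D quadratic, why a step size limited by the sharp direction leaves the flat direction nearly untouched, and how a larger effective step along the flat direction speeds up loss reduction. The curvatures, the learning rate, and the `flat_boost` parameter are hypothetical values chosen for illustration.

```python
# Toy illustration only: a 2-D quadratic with one sharp and one flat direction.
# All constants below are assumed values, not taken from the paper.

H_SHARP = 100.0  # assumed curvature along the sharp direction
H_FLAT = 0.01    # assumed curvature along the flat direction


def loss(x_s, x_f):
    """Quadratic loss L = 0.5 * (H_SHARP * x_s^2 + H_FLAT * x_f^2)."""
    return 0.5 * (H_SHARP * x_s**2 + H_FLAT * x_f**2)


def run_gd(steps, lr, flat_boost=1.0):
    """Plain gradient descent; flat_boost scales the step along the flat direction only."""
    x_s, x_f = 1.0, 1.0  # start at distance 1 along both directions
    for _ in range(steps):
        x_s -= lr * H_SHARP * x_s              # stability requires lr < 2 / H_SHARP
        x_f -= lr * flat_boost * H_FLAT * x_f  # hypothetical larger step on the flat axis
    return x_s, x_f


if __name__ == "__main__":
    for boost in (1.0, 50.0):
        x_s, x_f = run_gd(steps=1000, lr=0.015, flat_boost=boost)
        print(f"flat_boost={boost}: x_sharp={x_s:.2e}, x_flat={x_f:.4f}, "
              f"loss={loss(x_s, x_f):.2e}")
```

With the unboosted update, the sharp coordinate collapses almost immediately while the flat coordinate, which holds nearly all of the remaining loss, barely moves. This is the regime in which the abstract argues progress along flat directions dominates loss reduction.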
Taxonomy
Topics: Computational Physics and Python Applications · Machine Learning in Materials Science · Muon and positron interactions and applications
