Demystifying Manifold Constraints in LLM Pre-training

Kang An; Jiaxiang Li; Donald Goldfarb; Shiqian Ma

arXiv:2605.04418 · cs.LG · May 7, 2026

TL;DR

This paper clarifies how explicit manifold constraints stabilize LLM pre-training. It uses MACRO, a provably convergent Riemannian optimizer, to separate the effect of the constraints from that of normalization layers and weight decay.

Contribution

The paper introduces the Msign-Aligned Constrained Riemannian Optimizer (MACRO), a provably convergent, single-loop Riemannian optimizer that disentangles the effect of manifold constraints from heuristic normalization techniques in LLM training.

Findings

01. Manifold constraints independently stabilize activation scales.

02. MACRO achieves competitive performance with theoretical guarantees.

03. Constraints enforce stable rotational equilibrium in weights.

Abstract

The empirical success of large language model (LLM) pre-training relies heavily on heuristic stabilization techniques, such as explicit normalization layers and weight decay. While recent constrained optimization approaches that explicitly restrict weights may improve numerical stability and performance, the mechanism and motivation for adding constraints remain elusive. This paper systematically demystifies the role of explicit manifold constraints in LLM pre-training. By introducing the Msign-Aligned Constrained Riemannian Optimizer (MACRO), a provably convergent, single-loop optimization framework, our study disentangles weight regularization heuristics from interacting mechanisms like RMS normalization and decoupled weight decay. Theoretical analyses and comprehensive empirical evaluations reveal that manifold constraints independently bound forward activation scales and enforce…
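
The claim that manifold constraints bound forward activation scales can be illustrated with a toy example. The sketch below is not MACRO, whose update rule is not given in this summary. It assumes, purely for illustration, that each weight row is constrained to the unit sphere and updated with a textbook Riemannian gradient step (tangent-space projection followed by renormalization). The names `sphere_step` and `run_demo`, the sizes, and the random stand-in gradients are all hypothetical.

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector stored as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def sphere_step(w, grad, lr):
    """One generic Riemannian gradient step for a unit-norm row w.

    Assumption: the manifold is the unit sphere (not taken from the paper).
    """
    # 1. Project the Euclidean gradient onto the tangent space at w.
    radial = dot(w, grad)
    tangent = [g - radial * wi for g, wi in zip(grad, w)]
    # 2. Step along the negative projected gradient.
    moved = [wi - lr * t for wi, t in zip(w, tangent)]
    # 3. Retract to the sphere by renormalizing. Since tangent is
    #    orthogonal to w, the norm of `moved` is at least 1, so this is safe.
    n = norm(moved)
    return [m / n for m in moved]


def forward(W, x):
    """Linear layer without bias: one activation per weight row."""
    return [dot(row, x) for row in W]


def run_demo(steps=200, rows=4, dim=8, lr=0.5, seed=0):
    """Apply many constrained updates and track the largest activation seen."""
    rng = random.Random(seed)
    W = []
    for _ in range(rows):
        row = [rng.gauss(0, 1) for _ in range(dim)]
        n = norm(row)
        W.append([r / n for r in row])
    x = [rng.gauss(0, 1) for _ in range(dim)]
    worst = 0.0
    for _ in range(steps):
        # Large random vectors stand in for real loss gradients.
        W = [
            sphere_step(row, [rng.gauss(0, 10) for _ in range(dim)], lr)
            for row in W
        ]
        worst = max(worst, max(abs(a) for a in forward(W, x)))
    return W, x, worst


if __name__ == "__main__":
    W, x, worst = run_demo()
    print("largest |activation| seen:", worst)
    print("bound given by ||x||:     ", norm(x))
```

However large the stand-in gradients are, every row stays on the unit sphere, so by the Cauchy-Schwarz inequality each activation is at most the norm of the input. This bound comes from the constraint alone, with no normalization layer or weight decay involved, which is the kind of separation the abstract describes.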
