Nora: Normalized Orthogonal Row Alignment for Scalable Matrix Optimizer

Jinghui Yuan; Jiaxuan Zou; Shuo Wang; Yong Liu; Feiping Nie

arXiv:2605.03769 · cs.LG · May 6, 2026


TL;DR

Nora is a scalable optimizer for training large language models that combines stability, speed, and Muon-like preconditioning through orthogonal row alignment and norm stabilization.

Contribution

Nora introduces a novel optimizer that combines stability, efficiency, and structured preconditioning with a simple implementation and theoretical scalability guarantees.

Findings

01

Nora achieves stability by stabilizing weight norms and angular velocities.

02

Nora approximates structured preconditioning with linear computational complexity.

03

Preliminary experiments show Nora is effective for large-scale training.

Abstract

Matrix-based optimizers have demonstrated immense potential in training Large Language Models (LLMs); however, designing an ideal optimizer remains a formidable challenge. A superior optimizer must satisfy three core desiderata: efficiency, achieving Muon-like preconditioning to accelerate optimization; stability, strictly adhering to the scale-invariance inherent in neural networks; and speed, minimizing computational overhead. While existing methods address these aspects to varying degrees, they often fail to unify them, either incurring prohibitive computational costs, as Muon does, or allowing radial jitters that compromise stability, as RMNP does. To bridge this gap, we propose Nora, an optimizer that rigorously satisfies all three requirements. Nora achieves training stability by explicitly stabilizing weight norms and angular velocities through row-wise momentum projection onto the…
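To make the terms "radial jitter" and "stabilizing weight norms" concrete, here is a minimal Python sketch of the general idea of splitting a momentum vector into a radial part (along a weight row) and a tangential part (orthogonal to it). It is not Nora's update rule: the abstract is cut off before it states the projection target, and the helper names `project_out_radial` and `step_keep_norm`, the learning rate, and the final rescaling step are all assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def project_out_radial(w, m):
    """Remove from m its component along the weight row w (hypothetical helper)."""
    coeff = dot(m, w) / dot(w, w)
    return [mi - coeff * wi for mi, wi in zip(m, w)]

def step_keep_norm(w, m, lr):
    """Move w along the tangential part of m, then rescale to the original norm.

    The rescaling is an illustrative assumption, not taken from the paper.
    """
    t = project_out_radial(w, m)
    moved = [wi - lr * ti for wi, ti in zip(w, t)]
    scale = norm(w) / norm(moved)
    return [scale * x for x in moved]

# One weight row and one momentum row (made-up numbers).
w = [3.0, 4.0, 0.0]
m = [1.0, 2.0, 2.0]

t = project_out_radial(w, m)
w_new = step_keep_norm(w, m, lr=0.1)

radial_component = dot(t, w)               # ~0: no radial part remains
norm_before, norm_after = norm(w), norm(w_new)
print(radial_component, norm_before, norm_after)
```

Each row costs a constant number of dot products, so the work is linear in the number of matrix entries. The sketch only shows why a purely tangential update leaves the row norm free of radial drift; how Nora itself controls norms and angular velocities is described in the full paper.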
