Reparameterized LLM Training via Orthogonal Equivalence Transformation

Zeju Qiu; Simon Buchholz; Tim Z. Xiao; Maximilian Dax; Bernhard Schölkopf; Weiyang Liu

arXiv:2506.08001 · cs.LG · December 12, 2025

TL;DR

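This paper introduces POET, a reparameterized training algorithm for large language models. POET expresses each neuron through two learnable orthogonal matrices and a fixed random weight matrix, which provably preserves the spectral properties of the weight matrices. The authors report more stable optimization, improved generalization, and efficient approximations that let the method scale to large networks.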

Contribution

The paper proposes POET, a new reparameterization algorithm using orthogonal equivalence transformation for stable and scalable LLM training.

Findings

01. POET improves training stability and generalization in LLMs.

02. POET is scalable to large neural networks.

03. Experimental results validate POET's effectiveness.

Abstract

While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
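
The spectral-preservation claim in the abstract can be checked numerically with a small sketch. The abstract states only that each neuron is reparameterized with two learnable orthogonal matrices and a fixed random weight matrix; the sketch below assumes the orthogonal matrices multiply the fixed matrix from the left and the right, and it samples them at random rather than learning them. The names `random_orthogonal`, `W0`, `R`, and `P` are illustrative and do not come from the page.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs so the result does not depend on the QR sign convention.
    return q * np.sign(np.diag(r))


rng = np.random.default_rng(0)
d_out, d_in = 6, 4

# Fixed random weight matrix: never updated in this sketch.
W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)

# Stand-ins for the two learnable orthogonal matrices (assumed left/right placement).
R = random_orthogonal(d_out, rng)
P = random_orthogonal(d_in, rng)

# Effective weight under the assumed form W = R @ W0 @ P.
W = R @ W0 @ P

# Orthogonal factors on either side leave the singular values unchanged.
sv_fixed = np.linalg.svd(W0, compute_uv=False)
sv_effective = np.linalg.svd(W, compute_uv=False)
spectrum_preserved = np.allclose(sv_fixed, sv_effective)
```

Under these assumptions, `spectrum_preserved` evaluates to true for any orthogonal `R` and `P`, because multiplying by an orthogonal matrix changes only the singular vectors of `W0` and leaves its singular values untouched. How the orthogonal matrices are parameterized and updated during training is not described on this page and is left out of the sketch.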

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Advanced Graph Neural Networks