Reparameterized LLM Training via Orthogonal Equivalence Transformation
Zeju Qiu, Simon Buchholz, Tim Z. Xiao, Maximilian Dax, Bernhard Sch\"olkopf, Weiyang Liu

TL;DR
This paper introduces POET, a novel reparameterization method for training large language models that uses orthogonal transformations to improve stability and generalization, demonstrating effectiveness and scalability in experiments.
Contribution
The paper proposes POET, a new reparameterization algorithm using orthogonal equivalence transformation for stable and scalable LLM training.
Findings
POET improves training stability and generalization in LLMs.
POET is scalable to large neural networks.
Experimental results validate POET's effectiveness.
Abstract
While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Generative Adversarial Networks and Image Synthesis · Advanced Graph Neural Networks
