ProcrustesGPT: Compressing LLMs with Structured Matrices and Orthogonal Transformations

Ekaterina Grishina; Mikhail Gorbunov; Maxim Rakhuba

arXiv:2506.02818·cs.CL·June 4, 2025

TL;DR

ProcrustesGPT compresses large language models by finding orthogonal transformations of the weight matrices that leave the model output unchanged but make the weights easier to approximate with structured matrices. This reduces the parameter count without extensive fine-tuning.

Contribution

The paper exploits the invariance of LLM output under certain orthogonal transformations of the weight matrices, and uses that freedom to choose transformations that make the weights more compressible within structured matrix classes.

Findings

01 Suitable orthogonal transformations significantly improve how accurately structured matrices can represent pretrained weights.

02 The approach reduces model size while maintaining performance.

03 The method applies to any structured matrix class that supports an efficient projection operation.

Abstract

Large language models (LLMs) demonstrate impressive results in natural language processing tasks but require a significant amount of computational and memory resources. Structured matrix representations are a promising way for reducing the number of parameters of these models. However, it seems unrealistic to expect that weight matrices of pretrained models can be accurately represented by structured matrices without any fine-tuning. To overcome this issue, we utilize the fact that LLM output is invariant under certain orthogonal transformations of weight matrices. This insight can be leveraged to identify transformations that significantly improve the compressibility of weights within structured classes. The proposed approach is applicable to various types of structured matrices that support efficient projection operations. Code is available at https://github.com/GrishKate/ProcrustesGPT
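
The two ideas in the abstract, output invariance under an orthogonal transformation and projection onto a structured class, can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: the "model" is just two linear maps W1 and W2 with nothing between them, the structured class is block-diagonal matrices (chosen here only because its projection is trivial), and the alternating loop and the helper names project_block_diagonal and procrustes_rotation are hypothetical. The rotation update uses the closed-form solution of the classical orthogonal Procrustes problem.

```python
import numpy as np

def project_block_diagonal(M, block):
    """Frobenius-nearest block-diagonal matrix: keep diagonal blocks, zero the rest."""
    S = np.zeros_like(M)
    for i in range(0, M.shape[0], block):
        S[i:i + block, i:i + block] = M[i:i + block, i:i + block]
    return S

def procrustes_rotation(W, S):
    """Orthogonal Q minimizing ||Q @ W - S||_F (orthogonal Procrustes problem)."""
    U, _, Vt = np.linalg.svd(S @ W.T)
    return U @ Vt

rng = np.random.default_rng(0)
d, block = 8, 4                      # toy sizes (assumption)
W1 = rng.standard_normal((d, d))     # first linear map
W2 = rng.standard_normal((d, d))     # second linear map, applied after W1

# Alternate between projecting Q @ W1 onto the structured class
# and re-fitting the rotation Q to the current structured target.
Q = np.eye(d)
errors = []
for _ in range(20):
    S = project_block_diagonal(Q @ W1, block)
    errors.append(np.linalg.norm(Q @ W1 - S))
    Q = procrustes_rotation(W1, S)

S = project_block_diagonal(Q @ W1, block)
initial_error = errors[0]            # error with no rotation (Q = I)
final_error = np.linalg.norm(Q @ W1 - S)

# Invariance: rotating W1 by Q and compensating in W2 leaves the product unchanged.
product_rotated = (W2 @ Q.T) @ (Q @ W1)

print(f"structured approximation error: {initial_error:.3f} -> {final_error:.3f}")
print("product unchanged:", np.allclose(product_rotated, W2 @ W1))
```

Each step of the loop can only keep or lower the approximation error, because the identity rotation and the previous structured target both remain feasible choices. In this toy only the rotated W1 is replaced by a structured matrix, while the compensated W2 @ Q.T stays dense.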
