Projected Compression: Trainable Projection for Efficient Transformer Compression
Maciej Stefaniak, Michał Krutul, Jan Małaśnicki, Maciej Pióro, Jakub Krajewski, Sebastian Jaszczur, Marek Cygan, Kamil Adamczewski, Jan Ludziejewski

TL;DR
Projected Compression is a novel trainable projection method that reduces transformer model size while maintaining computational efficiency and outperforming traditional pruning techniques.
Contribution
Introduces a trainable projection-based compression technique that preserves model performance without extra computational overhead.
Findings
Outperforms hard pruning and retraining methods on high-quality models.
Maintains per-token FLOPs comparable to the original model.
Performance scales positively with token count.
Abstract
Large language models have steadily increased in size to achieve improved performance; however, this growth has also led to greater inference time and computational demands. Consequently, there is rising interest in model size reduction methods. To address this issue, we propose Projected Compression, a novel model compression technique that reduces model weights by utilizing projection modules. Specifically, we first train additional projection weights while preserving access to all the original model parameters. Subsequently, these projections are merged into a lower-dimensional product matrix, resulting in a reduced-size standard Transformer-based model. Unlike alternative approaches that require additional computational overhead, our method matches the base model's per-token computation step in FLOPs. Experimental results show that Projected Compression outperforms the…
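The merge step described in the abstract can be illustrated with a minimal sketch. It assumes a single linear layer with a frozen weight `W` and two projection matrices, `P_in` and `P_out`; these names, the shapes, and the two-sided form of the projection are assumptions made for illustration and are not taken from the paper. Training of the projections is omitted; the sketch only shows that, once the product is formed, the result is an ordinary smaller weight matrix that gives the same output as applying the projections around the original weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the original layer maps d_in -> d_out,
# the compressed layer maps d_in_c -> d_out_c.
d_in, d_out = 8, 6
d_in_c, d_out_c = 4, 3

# Original layer weight, kept frozen and fully accessible (y = W @ x).
W = rng.standard_normal((d_out, d_in))

# Projection matrices (assumed form; these would be the trained parameters).
P_in = rng.standard_normal((d_in, d_in_c))     # compressed input -> original input space
P_out = rng.standard_normal((d_out_c, d_out))  # original output -> compressed output space


def merged_weight(W, P_in, P_out):
    """Collapse the projections and the original weight into one small matrix."""
    return P_out @ W @ P_in


W_c = merged_weight(W, P_in, P_out)  # shape (d_out_c, d_in_c)

# A compressed-width input gives the same result either way.
x_c = rng.standard_normal(d_in_c)
y_via_projections = P_out @ (W @ (P_in @ x_c))
y_via_merged = W_c @ x_c
```

After merging, only `W_c` needs to be stored and applied, so the layer behaves like a standard linear layer of the reduced size.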
Taxonomy
Topics: Natural Language Processing Techniques · Generative Adversarial Networks and Image Synthesis · Topic Modeling
