Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models
Mohammadreza Banaei, Klaudia Bałazy, Artur Kasymov, Rémi Lebret, Jacek Tabor, Karl Aberer

TL;DR
This paper introduces a novel autoencoder-based offline compression method for transformer language models, outperforming traditional factorization techniques without requiring additional fine-tuning.
Contribution
Proposes a new autoencoder-based framework for offline model compression that surpasses classical matrix factorization methods in performance.
Findings
Autoencoder-based compression outperforms factorization methods
Collaborative module compression improves accuracy
Significant gains across multiple NLP tasks
Abstract
Recent transformer language models achieve outstanding results in many natural language processing (NLP) tasks. However, their enormous size often makes them impractical on memory-constrained devices, requiring practitioners to compress them to smaller networks. In this paper, we explore offline compression methods, meaning computationally-cheap approaches that do not require further fine-tuning of the compressed model. We challenge the classical matrix factorization methods by proposing a novel, better-performing autoencoder-based framework. We perform a comprehensive ablation study of our approach, examining its different aspects over a diverse set of evaluation settings. Moreover, we show that enabling collaboration between modules across layers by compressing certain modules together positively impacts the final model performance. Experiments on various NLP tasks demonstrate that…
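For orientation, the classical factorization baseline that the abstract argues against can be sketched with a truncated SVD. This is a minimal illustration, not the authors' code: the helper name `low_rank_factorize`, the matrix shape, and the chosen rank are all assumptions made for the example, and the paper's autoencoder-based method is not reproduced here.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) by A @ B with A (m x rank) and B (rank x n).

    Hypothetical helper for illustration: uses a truncated SVD, which gives
    the best rank-`rank` approximation of W in the Frobenius norm.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # scale each kept left singular vector
    B = Vt[:rank, :]             # kept right singular vectors
    return A, B

# Toy stand-in for one weight matrix of a transformer module (assumed shape).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))

A, B = low_rank_factorize(W, rank=8)

# "Offline" here means no fine-tuning: the factors are computed once from W.
params_before = W.size
params_after = A.size + B.size
rel_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

print(f"parameters: {params_before} -> {params_after}")
print(f"relative reconstruction error: {rel_error:.3f}")
```

Storing `A` and `B` instead of `W` reduces the parameter count whenever `rank * (m + n) < m * n`; the reconstruction error is the price paid for that reduction, and it is this trade-off that the paper claims an autoencoder-based approach improves on.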
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
