Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models

Mohammadreza Banaei; Klaudia Bałazy; Artur Kasymov; Rémi Lebret; Jacek Tabor; Karl Aberer

arXiv:2302.04045·cs.CL·February 9, 2023

TL;DR

This paper introduces a novel autoencoder-based offline compression method for transformer language models, outperforming traditional factorization techniques without requiring additional fine-tuning.

Contribution

Proposes a new autoencoder-based framework for offline model compression that surpasses classical matrix factorization methods in performance.

Findings

01 Autoencoder-based compression outperforms factorization methods

02 Collaborative module compression improves accuracy

03 Significant gains across multiple NLP tasks

Abstract

Recent transformer language models achieve outstanding results in many natural language processing (NLP) tasks. However, their enormous size often makes them impractical on memory-constrained devices, requiring practitioners to compress them into smaller networks. In this paper, we explore offline compression methods, meaning computationally cheap approaches that do not require further fine-tuning of the compressed model. We challenge the classical matrix factorization methods by proposing a novel, better-performing autoencoder-based framework. We perform a comprehensive ablation study of our approach, examining its different aspects over a diverse set of evaluation settings. Moreover, we show that enabling collaboration between modules across layers by compressing certain modules together positively impacts the final model performance. Experiments on various NLP tasks demonstrate that…
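
For readers unfamiliar with the classical baseline the abstract refers to, the following is a minimal sketch of offline matrix factorization using truncated SVD in NumPy. It is not the authors' autoencoder method or their code; the function name `low_rank_factorize`, the matrix shape, and the rank are hypothetical choices made only for illustration, and the random matrix stands in for a real transformer weight.

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Return factors (a, b) such that a @ b approximates weight (truncated SVD)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # shape (out_dim, rank), singular values folded in
    b = vt[:rank, :]            # shape (rank, in_dim)
    return a, b


# Hypothetical stand-in for one weight matrix of a transformer module.
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 48))

a, b = low_rank_factorize(weight, rank=8)

# Offline compression: no fine-tuning, only the stored parameters change.
params_before = weight.size
params_after = a.size + b.size
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)

print(f"parameters: {params_before} -> {params_after}")
print(f"relative reconstruction error: {rel_error:.3f}")
```

The two factors replace the original matrix at inference time, so the saving comes from storing `a` and `b` instead of `weight`. A random Gaussian matrix has a nearly flat spectrum, so the reconstruction error here is large; trained weights may behave differently.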

Code & Models

Repositories

mohammadrezabanaei/auto-encoder-based-transformer-compression
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis