Compression of Deep Learning Models for Text: A Survey

Manish Gupta; Puneet Agrawal

arXiv:2008.05221·cs.CL·June 15, 2021

TL;DR

This survey reviews various methods for compressing large deep learning models in NLP, such as pruning and quantization, to facilitate their deployment in real-world applications with limited resources.

Contribution

It systematically categorizes and summarizes recent advances in NLP model compression techniques, providing a coherent overview for researchers and practitioners.

Findings

01. Six main compression methods identified and explained.

02. Comprehensive organization of recent NLP model compression research.

03. Highlights the importance of model efficiency for industry deployment.

Abstract

In recent years, the fields of natural language processing (NLP) and information retrieval (IR) have made tremendous progress thanks to deep learning models like Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTMs) networks, and Transformer [120] based models like Bidirectional Encoder Representations from Transformers (BERT) [24], Generative Pre-training Transformer (GPT-2) [94], Multi-task Deep Neural Network (MT-DNN) [73], Extra-Long Network (XLNet) [134], Text-to-text transfer transformer (T5) [95], T-NLG [98] and GShard [63]. But these models are humongous in size. On the other hand, real world applications demand small model size, low response times and low computational power wattage. In this survey, we discuss six different types of methods (Pruning, Quantization, Knowledge Distillation, Parameter Sharing, Tensor Decomposition,…

Taxonomy

Methods: Linear Layer · Knowledge Distillation · Absolute Position Encodings · Position-Wise Feed-Forward Layer · GShard · Layer Normalization · Adam · Attention Is All You Need · Multi-Head Attention · Byte Pair Encoding