On the Compression of Natural Language Models

Saeed Damadi

arXiv:2112.11480·cs.CL·December 23, 2021

TL;DR

This paper investigates the existence of sparse, trainable subnetworks within large natural language models, reviewing current compression techniques like quantization, distillation, and pruning to improve interpretability and efficiency.

Contribution

It assesses the applicability of the lottery ticket hypothesis to natural language models, exploring whether sparse subnetworks can be trained to match full model performance.

Findings

01. Sparse subnetworks can potentially be found in NLMs

02. Compression techniques improve model efficiency

03. Interpretability of NLMs can be enhanced

Abstract

Deep neural networks are effective feature extractors, but they are prohibitively large for deployment scenarios. Due to the huge number of parameters, interpretability of parameters in different layers is not straightforward. This is why neural networks are sometimes considered black boxes. Although simpler models are easier to explain, finding them is not easy. If found, a sparse network that can fit data from scratch would help to interpret the parameters of a neural network. To this end, the lottery ticket hypothesis states that typical dense neural networks contain a small sparse subnetwork that can be trained to reach a similar test accuracy in an equal number of steps. The goal of this work is to assess whether such a trainable subnetwork exists for natural language models (NLMs). To achieve this goal we will review state-of-the-art compression techniques such as quantization,…
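To make the lottery ticket idea in the abstract concrete, here is a minimal sketch of one common way to obtain a candidate subnetwork: prune the smallest-magnitude weights of a trained network, then reset the surviving weights to their initial values. This is an illustration under stated assumptions, not necessarily the procedure used in the paper. The function names (`magnitude_mask`, `apply_mask`, `lottery_ticket_candidate`) are hypothetical, the network is reduced to a flat list of weights, and the "training" step is a random perturbation standing in for a real optimizer.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    sparsity is the fraction of weights to remove (0.0 to 1.0).
    """
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Zero out every weight whose mask entry is 0."""
    return [w * m for w, m in zip(weights, mask)]


def lottery_ticket_candidate(init_weights, trained_weights, sparsity):
    """Prune by trained magnitude, then rewind survivors to their initial values."""
    mask = magnitude_mask(trained_weights, sparsity)
    return apply_mask(init_weights, mask), mask


rng = random.Random(0)
init_weights = [rng.gauss(0.0, 1.0) for _ in range(10)]
# Stand-in for training (assumption): a random perturbation, not a real optimizer.
trained_weights = [w + rng.gauss(0.0, 0.5) for w in init_weights]

ticket, mask = lottery_ticket_candidate(init_weights, trained_weights, sparsity=0.8)
print(sum(mask), "of", len(mask), "weights kept")
```

In this framing, the hypothesis is the claim that retraining `ticket` with the mask held fixed can reach a test accuracy similar to the dense network in a comparable number of steps. Checking that claim requires a real model and training loop, which this sketch deliberately leaves out.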

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Machine Learning and Data Classification