Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm

Dongkuan Xu; Ian E.H. Yen; Jinxi Zhao; Zhibin Xiao

arXiv:2104.08682·cs.CL·January 19, 2022·6 cites

TL;DR

This paper introduces a knowledge-aware sparse pruning method for BERT that significantly improves compression rates while maintaining accuracy, addressing the gap between CNN and transformer pruning results.

Contribution

It proposes a novel knowledge-aware pruning technique that surpasses existing methods in compressing BERT models without accuracy loss.

Findings

01  Achieves 20x compression in both weights and FLOPs.

02  Outperforms existing pruning methods on the GLUE benchmark.

03  Maintains prediction accuracy despite the high compression rate.

Abstract

Transformer-based pre-trained language models have significantly improved performance on a wide range of natural language processing (NLP) tasks in recent years. While effective and prevalent, these models are usually prohibitively large for resource-limited deployment scenarios. A thread of research has thus applied network pruning techniques under the pretrain-then-finetune paradigm widely adopted in NLP. However, the existing pruning results on benchmark transformers such as BERT are not as remarkable as the pruning results in the literature on convolutional neural networks (CNNs). In particular, common wisdom in CNN pruning holds that sparse pruning compresses a model more than reducing the number of channels and layers does (Elsen et al., 2020; Zhu and Gupta, 2017), while existing work on sparse pruning of BERT yields results inferior to its…
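To make the term "sparse pruning" concrete, here is a minimal sketch of generic unstructured magnitude pruning on a toy weight matrix. It is not the knowledge-aware method proposed in the paper; the function names `magnitude_prune` and `compression_ratio` and the 50 x 40 random matrix are assumptions chosen purely for illustration. It also shows the arithmetic behind a 20x weight compression figure: keeping 5% of the weights (95% sparsity) gives a ratio of 1 / 0.05 = 20.

```python
import random


def magnitude_prune(weights, sparsity):
    """Generic unstructured pruning: zero the smallest-magnitude entries.

    Illustrative only; this is NOT the paper's knowledge-aware method.
    """
    rows, cols = len(weights), len(weights[0])
    # Rank every (row, col) position by the absolute value of its weight.
    order = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: abs(weights[rc[0]][rc[1]]),
    )
    k = round(rows * cols * sparsity)  # number of weights to zero out
    pruned = [row[:] for row in weights]  # copy so the input is untouched
    for r, c in order[:k]:
        pruned[r][c] = 0.0
    return pruned


def compression_ratio(weights):
    """Total weights divided by nonzero weights (ignores index storage)."""
    total = sum(len(row) for row in weights)
    nonzero = sum(1 for row in weights for w in row if w != 0.0)
    return total / nonzero


random.seed(0)
# Hypothetical dense layer with 50 * 40 = 2000 weights.
dense = [[random.gauss(0.0, 1.0) for _ in range(40)] for _ in range(50)]
sparse = magnitude_prune(dense, sparsity=0.95)
ratio = compression_ratio(sparse)
print(ratio)  # 2000 weights / 100 kept
```

Unlike removing whole channels or layers, this leaves the matrix shape unchanged and scatters the zeros individually, so realizing the savings in practice depends on sparse storage and kernels that the sketch does not model.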

Code & Models

Repositories

derronxu/sparsebert
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Pruning · Linear Layer · Dropout · Adam · Dense Connections · Softmax · Linear Warmup With Linear Decay · WordPiece