Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm
Dongkuan Xu, Ian E.H. Yen, Jinxi Zhao, Zhibin Xiao

TL;DR
This paper introduces a knowledge-aware sparse pruning method for BERT that significantly improves compression rates while maintaining accuracy, addressing the gap between CNN and transformer pruning results.
Contribution
It proposes a novel knowledge-aware pruning technique that surpasses existing methods in compressing BERT models without accuracy loss.
Findings
Achieves 20x compression in weights and FLOPs.
Outperforms existing pruning methods on GLUE benchmark.
Maintains prediction accuracy despite high compression.
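To make the 20x figure concrete: if compression is counted in nonzero weights only (an assumption here, ignoring any storage cost for sparse indices), 20x corresponds to keeping roughly 5% of the weights, or about 95% sparsity. The sketch below uses plain magnitude pruning, a generic baseline and not the paper's knowledge-aware method, to show that arithmetic. The function name `magnitude_prune` and the toy weight vector are hypothetical.

```python
import random


def magnitude_prune(weights, compression):
    """Zero all but the largest-magnitude 1/compression fraction of weights.

    Generic magnitude pruning for illustration only; it is NOT the
    knowledge-aware method described in the paper.
    """
    keep = max(1, round(len(weights) / compression))
    # Rank indices by absolute value, largest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(order[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


random.seed(0)
# Toy stand-in for a flattened weight matrix (assumption: Gaussian values).
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

pruned = magnitude_prune(weights, compression=20.0)
nonzero = sum(1 for w in pruned if w != 0.0)
sparsity = 1.0 - nonzero / len(pruned)
print(f"kept {nonzero} of {len(pruned)} weights, sparsity = {sparsity:.3f}")
```

Whether a given sparsity level preserves accuracy depends on the pruning criterion and the fine-tuning procedure; this sketch only illustrates the bookkeeping behind the compression ratio.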
Abstract
Transformer-based pre-trained language models have significantly improved the performance of various natural language processing (NLP) tasks in recent years. While effective and prevalent, these models are usually prohibitively large for resource-limited deployment scenarios. A thread of research has thus been working on applying network pruning techniques under the pretrain-then-finetune paradigm widely adopted in NLP. However, the existing pruning results on benchmark transformers, such as BERT, are not as remarkable as the pruning results in the literature on convolutional neural networks (CNNs). In particular, common wisdom in CNN pruning holds that sparse pruning compresses a model more than reducing the number of channels and layers does (Elsen et al., 2020; Zhu and Gupta, 2017), while existing work on sparse pruning of BERT yields inferior results than its…
Taxonomy
Topics
Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
Methods
Attention Is All You Need · Pruning · Linear Layer · Dropout · Adam · Dense Connections · Softmax · Linear Warmup With Linear Decay · WordPiece
