Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads

Zhengyan Zhang; Fanchao Qi; Zhiyuan Liu; Qun Liu; Maosong Sun

arXiv:2011.03770 · cs.CL · November 10, 2020 · 5 cites

TL;DR

This paper introduces Single-Shot Meta-Pruning, a method that prunes attention heads in pre-trained Transformer models before fine-tuning, reducing fine-tuning and inference cost with minimal loss in task performance.

Contribution

The paper presents a meta-learning-based approach that prunes attention heads in a single shot, before fine-tuning, and adapts the choice of heads to each downstream task.

Findings

01 Prunes 50% of attention heads with minimal performance loss

02 Reduces both fine-tuning and inference overheads

03 Improves quality of text representations

Abstract

Deep pre-trained Transformer models have achieved state-of-the-art results over a variety of natural language processing (NLP) tasks. By learning rich language knowledge with millions of parameters, these models are usually overparameterized and significantly increase the computational overhead in applications. It is intuitive to address this issue by model compression. In this work, we propose a method, called Single-Shot Meta-Pruning, to compress deep pre-trained Transformers before fine-tuning. Specifically, we focus on pruning unnecessary attention heads adaptively for different downstream tasks. To measure the informativeness of attention heads, we train our Single-Shot Meta-Pruner (SMP) with a meta-learning paradigm aiming to maintain the distribution of text representations after pruning. Compared with existing compression methods for pre-trained models, our method can reduce the…
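
To make "pruning attention heads" concrete, the following is a minimal sketch of head selection and masking in plain Python. It is not the authors' Single-Shot Meta-Pruner: the per-head scores are made-up numbers standing in for whatever informativeness measure a pruner would produce, and the function names `select_heads` and `apply_head_mask` are hypothetical. It only shows how a 50% keep ratio turns scores into a binary mask over heads.

```python
from typing import List


def select_heads(scores: List[float], keep_ratio: float = 0.5) -> List[int]:
    """Return a 0/1 mask that keeps the highest-scoring heads.

    `scores` are hypothetical per-head importance values (an assumption of
    this sketch); the paper's trained pruner is not reproduced here.
    """
    n_keep = max(1, round(len(scores) * keep_ratio))
    # Rank head indices from most to least important.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [1 if i in kept else 0 for i in range(len(scores))]


def apply_head_mask(head_outputs: List[List[float]], mask: List[int]) -> List[float]:
    """Zero out the pruned heads, then concatenate the per-head vectors."""
    merged: List[float] = []
    for vec, keep in zip(head_outputs, mask):
        merged.extend(v * keep for v in vec)
    return merged


# Four heads with illustrative scores; keep half of them.
scores = [0.9, 0.1, 0.4, 0.7]
mask = select_heads(scores, keep_ratio=0.5)

# One 2-dimensional output vector per head for a single token.
head_outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pruned = apply_head_mask(head_outputs, mask)
```

Multiplying by a mask, as done here, only simulates pruning. An actual reduction in fine-tuning and inference cost requires removing the pruned heads' parameters from the model so that their computation is skipped.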

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Pruning · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Attention Is All You Need · Byte Pair Encoding · Dropout · Softmax