From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Runxin Xu; Fuli Luo; Chengyu Wang; Baobao Chang; Jun Huang; Songfang Huang; Fei Huang

arXiv:2112.07198 · cs.CL · December 15, 2021

TL;DR

This paper introduces ContrAstive Pruning (CAP), a novel framework for compressing pre-trained language models by preserving both task-agnostic and task-specific knowledge through contrastive learning, leading to high sparsity with minimal performance loss.

Contribution

CAP is a general pruning framework that effectively maintains knowledge during compression by leveraging contrastive learning and model snapshots, outperforming prior methods especially at high sparsity levels.

Findings

01. CAP achieves 99.2% of BERT's performance with only 3% of the parameters on QQP.

02. CAP outperforms existing pruning methods at high sparsity levels.

03. Models pruned with CAP show improved generalization ability.

Abstract

Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With large quantities of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches consider only task-specific knowledge for downstream tasks and ignore the essential task-agnostic knowledge during pruning, which may cause catastrophic forgetting and lead to poor generalization ability. To maintain both task-agnostic and task-specific knowledge in our pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model…
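The abstract is truncated here, so the exact objective used by CAP is not shown. As a rough illustration of what "unified in contrastive learning" can look like, the sketch below computes an InfoNCE-style loss, a common contrastive objective, in plain Python. It is an assumption, not the authors' formulation: the function names `cosine` and `info_nce`, the temperature value, and the choice of anchor (a pruned-model representation), positive (a reference-model representation of the same example), and negatives (representations of other examples) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative, not the paper's exact loss).

    anchor:    representation from the pruned model (assumed role)
    positive:  representation of the same example from a reference model,
               e.g. the pre-trained or fine-tuned model (assumed role)
    negatives: representations of other examples (assumed role)
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - pos_logit


if __name__ == "__main__":
    pruned = [1.0, 0.0]
    reference_same_example = [0.9, 0.1]
    reference_other_examples = [[0.0, 1.0], [-1.0, 0.2]]
    print(info_nce(pruned, reference_same_example, reference_other_examples))
```

The loss is small when the anchor is closer to its positive than to any negative, and grows when a negative is the nearer neighbour. Under the assumptions above, minimising it would pull the pruned model's representations toward those of the reference model for the same input.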

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Pruning · Linear Layer · Adam · Multi-Head Attention · Residual Connection · Layer Normalization · Attention Dropout · Linear Warmup With Linear Decay · Dense Connections