Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh, Thomas Wolf, Alexander M. Rush

TL;DR
This paper introduces movement pruning, a first-order weight pruning method applied while fine-tuning large pretrained models. It outperforms magnitude pruning in high-sparsity regimes and, combined with distillation, incurs minimal accuracy loss with as little as 3% of the model parameters.
Contribution
The paper presents movement pruning, a simple, deterministic first-order weight pruning method suited to transfer learning. It gives the method's mathematical foundations and compares it to existing zeroth- and first-order pruning methods.
Findings
Movement pruning outperforms magnitude pruning at high sparsity levels.
Combined with distillation, movement pruning incurs minimal accuracy loss with as little as 3% of the model parameters.
The method is mathematically grounded and effective for large pretrained language models.
Abstract
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations to the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss with down to only 3% of the model parameters.
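The contrast between zeroth-order (magnitude) and first-order (movement) selection can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names `topk_mask`, `magnitude_scores`, and `movement_scores` are hypothetical, and the movement score is approximated here by accumulating the product of gradient and weight over steps, whereas the paper learns its scores jointly with the weights during fine-tuning.

```python
import numpy as np

def topk_mask(scores, keep_fraction):
    """Return a boolean mask keeping the `keep_fraction` highest-scoring entries."""
    k = max(1, int(round(keep_fraction * scores.size)))
    # Indices of the k largest scores in the flattened array.
    keep = np.argpartition(scores.ravel(), -k)[-k:]
    mask = np.zeros(scores.size, dtype=bool)
    mask[keep] = True
    return mask.reshape(scores.shape)

def magnitude_scores(weights):
    # Zeroth-order criterion: importance is the absolute value of the weight.
    return np.abs(weights)

def movement_scores(weights_per_step, grads_per_step, lr=1.0):
    # First-order criterion (assumed simplification): accumulate -lr * grad * weight.
    # The score grows when gradient descent pushes a weight away from zero.
    scores = np.zeros_like(weights_per_step[0], dtype=float)
    for w, g in zip(weights_per_step, grads_per_step):
        scores -= lr * g * w
    return scores

# Toy example: two large weights being pushed toward zero,
# two small weights being pushed away from zero.
w = np.array([0.9, -0.8, 0.1, -0.05])
g = np.array([0.5, -0.5, -0.5, 0.5])

print(topk_mask(magnitude_scores(w), 0.5).astype(int))        # [1 1 0 0]
print(topk_mask(movement_scores([w], [g]), 0.5).astype(int))  # [0 0 1 1]
```

In this toy case magnitude pruning keeps the two largest weights, while the movement criterion keeps the two small weights that fine-tuning is moving away from zero, which is the sense in which a first-order method can adapt to the fine-tuning signal.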
Code & Models
- 🤗 madlag/bert-base-uncased-squad-v1-sparse0.25 (model, 3 downloads)
- 🤗 madlag/bert-base-uncased-squad1.1-block-sparse-0.07-v1 (model, 640 downloads)
- 🤗 madlag/bert-base-uncased-squad1.1-block-sparse-0.13-v1 (model, 2 downloads)
- 🤗 madlag/bert-base-uncased-squad1.1-block-sparse-0.20-v1 (model, 5 downloads)
- 🤗 madlag/bert-base-uncased-squad1.1-block-sparse-0.32-v1 (model, 5 downloads)
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis
Methods: Pruning · Movement Pruning
