Movement Pruning: Adaptive Sparsity by Fine-Tuning

Victor Sanh; Thomas Wolf; Alexander M. Rush

arXiv:2005.07683 · cs.CL · October 26, 2020 · 33 cites

TL;DR

This paper introduces movement pruning, a first-order pruning method applied while fine-tuning large pretrained models. It outperforms magnitude pruning in high-sparsity regimes and, combined with distillation, incurs minimal accuracy loss with as little as 3% of the model parameters.

Contribution

The paper presents movement pruning, a simple, deterministic first-order weight pruning technique suited to transfer learning, gives its mathematical foundations, and compares it with existing zeroth- and first-order pruning methods.

Findings

01

Movement pruning outperforms magnitude pruning at high sparsity levels.

02

Combined with distillation, movement pruning achieves minimal accuracy loss with as little as 3% of the model parameters.

03

The paper gives mathematical foundations for the method and evaluates it on large pretrained language models.

Abstract

Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations to the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss with down to only 3% of the model parameters.
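The contrast the abstract draws between zeroth-order (magnitude) and first-order (movement) pruning can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy weights and gradients, and the simplification of accumulating the score directly as the negative product of gradient and weight are assumptions made for illustration. In the paper the importance scores are learned alongside the weights during fine-tuning; the sketch only captures the intuition that weights moving away from zero gain importance while weights shrinking toward zero lose it.

```python
def top_k_mask(scores, keep_fraction):
    """Return a 0/1 mask that keeps the highest-scoring fraction of entries."""
    n_keep = max(1, round(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [1 if i in kept else 0 for i in range(len(scores))]


def magnitude_scores(weights):
    """Zeroth-order importance: the absolute value of each weight."""
    return [abs(w) for w in weights]


def movement_scores(weight_history, grad_history, lr=1.0):
    """First-order importance (illustrative): accumulate -lr * grad * weight.

    Under gradient descent, a weight moving away from zero has a gradient of
    the opposite sign to the weight, so its score increases; a weight
    shrinking toward zero has a gradient of the same sign, so its score drops.
    """
    scores = [0.0] * len(weight_history[0])
    for weights, grads in zip(weight_history, grad_history):
        for i, (w, g) in enumerate(zip(weights, grads)):
            scores[i] -= lr * g * w
    return scores


# Hypothetical toy trajectory: two fine-tuning steps with step size 0.1.
# Weights 0 and 2 are large but shrinking; weights 1 and 3 are small but growing.
weight_history = [[0.9, 0.1, -0.5, -0.05],
                  [0.8, 0.2, -0.4, -0.15]]
grad_history = [[1.0, -1.0, -1.0, 1.0],
                [1.0, -1.0, -1.0, 1.0]]

final_weights = weight_history[-1]
magnitude_mask = top_k_mask(magnitude_scores(final_weights), keep_fraction=0.5)
movement_mask = top_k_mask(movement_scores(weight_history, grad_history), keep_fraction=0.5)

print(magnitude_mask)  # [1, 0, 1, 0]
print(movement_mask)   # [0, 1, 0, 1]
```

On this toy data, magnitude pruning keeps the two largest weights even though fine-tuning is driving them toward zero, while the movement-style score keeps the two small weights that fine-tuning is actively growing.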

Code & Models

Videos

Movement Pruning: Adaptive Sparsity by Fine-Tuning (Paper Explained) · YouTube

Movement Pruning: Adaptive Sparsity by Fine-Tuning · SlidesLive

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis

Methods: Pruning · Movement Pruning