Magnitude Pruning of Large Pretrained Transformer Models with a Mixture Gaussian Prior
Mingxuan Zhang, Yan Sun, and Faming Liang

TL;DR
This paper introduces a novel magnitude-based pruning method called MGPP that uses a mixture Gaussian prior to effectively prune large pretrained transformer models, maintaining performance across diverse NLP tasks.
Contribution
The paper proposes MGPP, a new pruning algorithm employing a mixture Gaussian prior, which improves model sparsity while preserving expressive power in large NLP transformers.
Findings
MGPP outperforms existing pruning methods at high sparsity levels.
In extensive evaluations across NLP tasks, MGPP performs better than the pruning methods it is compared against.
A theoretical analysis supports the consistency of the sparse transformers that MGPP produces.
Abstract
Large pretrained transformer models have revolutionized modern AI applications with their state-of-the-art performance in natural language processing (NLP). However, their substantial parameter count poses challenges for real-world deployment. To address this, researchers often reduce model size by pruning parameters based on their magnitude or sensitivity. Previous research has demonstrated the limitations of magnitude pruning, especially in the context of transfer learning for modern NLP tasks. In this paper, we introduce a new magnitude-based pruning algorithm called mixture Gaussian prior pruning (MGPP), which employs a mixture Gaussian prior for regularization. MGPP prunes non-expressive weights under the guidance of the mixture Gaussian prior, aiming to retain the model's expressive capability. Extensive evaluations across various NLP tasks, including natural language…
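The abstract names two ingredients: a mixture Gaussian prior used as a regularizer, and pruning by weight magnitude. The sketch below illustrates how the two could fit together. It is not the authors' implementation. It assumes a two-component, zero-mean mixture with a narrow component (standard deviation `sigma0`) and a wide component (standard deviation `sigma1`), mixed with weight `lam`. That form, the function names, and the hyperparameter values are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def normal_pdf(w, sigma):
    """Density of a zero-mean normal with standard deviation sigma at w."""
    return math.exp(-0.5 * (w / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mgp_log_prior_grad(w, lam, sigma0, sigma1):
    """Gradient w.r.t. w of log[(1 - lam) N(w; 0, sigma0^2) + lam N(w; 0, sigma1^2)].

    Assumed prior form (not taken from the abstract): sigma0 < sigma1, so the
    narrow component pulls small weights toward zero while the wide component
    leaves large weights almost unpenalized.
    """
    narrow = (1.0 - lam) * normal_pdf(w, sigma0)
    wide = lam * normal_pdf(w, sigma1)
    total = narrow + wide
    if total == 0.0:
        # Both densities underflowed; the wide component dominates the tails.
        return -w / sigma1 ** 2
    p_wide = wide / total  # posterior probability that w came from the wide component
    return -w * (p_wide / sigma1 ** 2 + (1.0 - p_wide) / sigma0 ** 2)


def magnitude_prune(weights, sparsity):
    """Return a copy of weights with the smallest-|w| fraction `sparsity` set to zero."""
    k = int(sparsity * len(weights))
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


# Hypothetical hyperparameters, chosen only to make the effect visible.
LAM, SIGMA0, SIGMA1 = 0.1, 0.01, 1.0

# Shrinkage per unit of weight is far stronger for a small weight than a large one.
small_rate = abs(mgp_log_prior_grad(0.01, LAM, SIGMA0, SIGMA1)) / 0.01
large_rate = abs(mgp_log_prior_grad(1.0, LAM, SIGMA0, SIGMA1)) / 1.0

pruned_example = magnitude_prune([0.5, -0.01, 0.02, -1.0], 0.5)
```

In a training loop, the negative of `mgp_log_prior_grad` would be added to the loss gradient for each weight, after which `magnitude_prune` removes the weights that the prior has driven close to zero. Under these assumptions the prior separates weights into a near-zero group and a retained group before any threshold is applied, which is one plausible reading of pruning "under the guidance of" the prior.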
Taxonomy
Topics: Magnetic Properties and Applications · Image and Signal Denoising Methods · Seismic Imaging and Inversion Techniques
Methods: Pruning
