Magnitude Pruning of Large Pretrained Transformer Models with a Mixture Gaussian Prior

Mingxuan Zhang, Yan Sun, and Faming Liang

arXiv:2411.00969·stat.ML·November 5, 2024

Open Access

TL;DR

This paper introduces mixture Gaussian prior pruning (MGPP), a magnitude-based pruning method that uses a mixture Gaussian prior to prune large pretrained transformer models while maintaining performance across diverse NLP tasks.

Contribution

The paper proposes MGPP, a new pruning algorithm employing a mixture Gaussian prior, which improves model sparsity while preserving expressive power in large NLP transformers.

Findings

1. MGPP outperforms existing pruning methods at high sparsity levels.

2. Extensive NLP task evaluations demonstrate MGPP's superior performance.

3. Theoretical analysis supports the consistency of sparse transformers with MGPP.

Abstract

Large pretrained transformer models have revolutionized modern AI applications with their state-of-the-art performance in natural language processing (NLP). However, their substantial parameter count poses challenges for real-world deployment. To address this, researchers often reduce model size by pruning parameters based on their magnitude or sensitivity. Previous research has demonstrated the limitations of magnitude pruning, especially in the context of transfer learning for modern NLP tasks. In this paper, we introduce a new magnitude-based pruning algorithm called mixture Gaussian prior pruning (MGPP), which employs a mixture Gaussian prior for regularization. MGPP prunes non-expressive weights under the guidance of the mixture Gaussian prior, aiming to retain the model's expressive capability. Extensive evaluations across various NLP tasks, including natural language…
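
To make the two ingredients named in the abstract concrete, here is a minimal sketch of magnitude pruning combined with a mixture Gaussian log-prior. The excerpt above does not give the prior's exact form, its hyperparameters, or the pruning schedule, so everything specific here is an assumption made for illustration: a two-component zero-mean mixture with a narrow component (`sigma0`) and a wide component (`sigma1`), a mixing weight `lam`, and one-shot thresholding by a target `sparsity`. The function names are hypothetical and this is not the authors' implementation.

```python
import math


def log_gaussian(x, sigma):
    """Log density of a zero-mean Gaussian N(0, sigma^2) at x."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def mixture_log_prior(w, lam=0.1, sigma0=0.01, sigma1=1.0):
    """Log density of (1 - lam) * N(0, sigma0^2) + lam * N(0, sigma1^2) at w.

    Assumed form: the narrow component (sigma0) pulls small weights toward
    zero, while the wide component (sigma1) leaves large weights nearly
    unpenalized. Computed with log-sum-exp for numerical stability.
    """
    a = math.log(1.0 - lam) + log_gaussian(w, sigma0)
    b = math.log(lam) + log_gaussian(w, sigma1)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def magnitude_prune(weights, sparsity):
    """Return a copy of weights with the smallest-magnitude fraction zeroed."""
    n_prune = int(sparsity * len(weights))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.8, -0.02, 0.005, -1.3, 0.04, 0.6]

# Negative log-prior acts as a regularization penalty added to the task loss.
penalty = -sum(mixture_log_prior(w) for w in weights)

# One-shot pruning of half the weights by magnitude.
pruned = magnitude_prune(weights, sparsity=0.5)
```

In this sketch the penalty would be added to the training loss so that uninformative weights shrink toward zero before thresholding; how the paper actually schedules regularization and pruning is not stated in the excerpt.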

Taxonomy

Topics: Magnetic Properties and Applications · Image and Signal Denoising Methods · Seismic Imaging and Inversion Techniques

Methods: Pruning