Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models

Seungcheol Park; Hojun Choi; U Kang

arXiv:2308.03449·cs.CL·March 18, 2024

TL;DR

This paper introduces K-prune, a novel retraining-free structured pruning method for pretrained language models that preserves knowledge to significantly improve accuracy at high compression rates.

Contribution

K-prune is a new retraining-free pruning algorithm that maintains model knowledge, reducing accuracy loss during compression of pretrained language models.

Findings

01

Achieves up to 58.02 percentage points higher F1 score than existing retraining-free pruning algorithms.

02

Effectively compresses models by 80% without retraining.

03

Significantly improves accuracy at high compression rates.

Abstract

Given a pretrained encoder-based language model, how can we accurately compress it without retraining? Retraining-free structured pruning algorithms are crucial in pretrained language model compression due to their significantly reduced pruning cost and capability to prune large language models. However, existing retraining-free algorithms encounter severe accuracy degradation, as they fail to handle pruning errors, especially at high compression rates. In this paper, we propose K-prune (Knowledge-preserving pruning), an accurate retraining-free structured pruning algorithm for pretrained encoder-based language models. K-prune focuses on preserving the useful knowledge of the pretrained model to minimize pruning errors through a carefully designed iterative pruning process composed of knowledge measurement, knowledge-preserving mask search, and knowledge-preserving weight-tuning. As a…
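The three stages named in the abstract (knowledge measurement, mask search, weight-tuning) can be illustrated with a small toy. The following is a minimal sketch under stated assumptions, not the K-prune algorithm itself: it prunes the hidden neurons of one two-layer ReLU block in a single pass, uses the block's dense output as a stand-in for "knowledge", scores neurons by their individual contribution to that output, and re-fits the remaining output weights by ordinary least squares. The names `prune_block` and `reconstruction_error`, the scoring rule, and the random data are all hypothetical choices made for illustration.

```python
import numpy as np


def reconstruction_error(X, W_in, W_out, Y_ref):
    """Squared error between a (possibly pruned) block's output and the dense output."""
    H = np.maximum(X @ W_in, 0.0)
    return float(((H @ W_out - Y_ref) ** 2).sum())


def prune_block(W_in, W_out, X, keep_ratio):
    """Prune hidden neurons of Y = relu(X @ W_in) @ W_out on calibration inputs X."""
    H = np.maximum(X @ W_in, 0.0)   # hidden activations, shape (n, d_hidden)
    Y = H @ W_out                   # dense output the pruned block should reproduce

    # 1) "Knowledge measurement" (assumed proxy): neuron j contributes the rank-1
    #    term outer(H[:, j], W_out[j]) to Y; its squared norm is the score.
    scores = (H ** 2).sum(axis=0) * (W_out ** 2).sum(axis=1)

    # 2) Mask search (assumed rule): keep the top-k neurons by score.
    k = max(1, int(round(keep_ratio * W_in.shape[1])))
    keep = np.sort(np.argsort(scores)[-k:])

    # 3) Weight tuning: least-squares re-fit of the kept output weights so the
    #    pruned block matches the dense output as closely as possible.
    W_out_tuned, *_ = np.linalg.lstsq(H[:, keep], Y, rcond=None)
    return keep, W_in[:, keep], W_out_tuned


# Toy usage with random data; an 80% compression rate means keeping 20% of neurons.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))
W_in = rng.normal(size=(16, 64))
W_out = rng.normal(size=(64, 16))
Y_dense = np.maximum(X @ W_in, 0.0) @ W_out

keep, W_in_p, W_out_p = prune_block(W_in, W_out, X, keep_ratio=0.2)
err_masked = reconstruction_error(X, W_in_p, W_out[keep], Y_dense)  # mask only
err_tuned = reconstruction_error(X, W_in_p, W_out_p, Y_dense)       # mask + tuning
print(f"kept {len(keep)} of {W_in.shape[1]} neurons")
print(f"error without tuning: {err_masked:.1f}, with tuning: {err_tuned:.1f}")
```

Because the untuned weights are one feasible solution of the least-squares problem, the tuned error on the calibration inputs can never exceed the mask-only error. That is the intuition behind compensating for pruning error without gradient-based retraining; the paper's actual procedure is iterative and is described in the full text and the official repository.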

Code & Models

Repositories

snudm-starlab/k-prune
PyTorch · Official

Videos

Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Pruning