Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
Seungcheol Park, Hojun Choi, U Kang

TL;DR
This paper introduces K-prune, a novel retraining-free structured pruning method for pretrained language models that preserves knowledge to significantly improve accuracy at high compression rates.
Contribution
K-prune is a new retraining-free pruning algorithm that maintains model knowledge, reducing accuracy loss during compression of pretrained language models.
Findings
Achieves up to 58.02 percentage points higher F1 score than existing retraining-free pruning algorithms.
Effectively compresses models by 80% without retraining.
Significantly improves accuracy at high compression rates.
Abstract
Given a pretrained encoder-based language model, how can we accurately compress it without retraining? Retraining-free structured pruning algorithms are crucial in pretrained language model compression due to their significantly reduced pruning cost and capability to prune large language models. However, existing retraining-free algorithms encounter severe accuracy degradation, as they fail to handle pruning errors, especially at high compression rates. In this paper, we propose K-prune (Knowledge-preserving pruning), an accurate retraining-free structured pruning algorithm for pretrained encoder-based language models. K-prune focuses on preserving the useful knowledge of the pretrained model to minimize pruning errors through a carefully designed iterative pruning process composed of knowledge measurement, knowledge-preserving mask search, and knowledge-preserving weight-tuning. As a…
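The three stages named in the abstract (knowledge measurement, mask search, weight-tuning) can be illustrated on a toy feed-forward layer. The sketch below is not the authors' algorithm: the excerpt does not specify how K-prune measures knowledge or tunes weights, so the scoring rule (each neuron's contribution to output energy), the top-k mask selection, and the least-squares refit are all assumptions chosen for illustration, and `prune_ffn` is a hypothetical helper name.

```python
import numpy as np

def prune_ffn(X, W1, W2, keep_ratio):
    """Toy retraining-free pruning of a two-layer FFN: relu(X @ W1) @ W2.

    Illustrative only; the scoring and tuning rules are assumptions,
    not the procedure used by K-prune.
    """
    H = np.maximum(X @ W1, 0.0)   # intermediate activations, shape (n, d_ff)
    Y = H @ W2                    # outputs of the unpruned layer

    # 1) Knowledge measurement (assumed proxy): removing neuron j alone
    #    changes the output by outer(H[:, j], W2[j]), whose squared
    #    Frobenius norm is ||H[:, j]||^2 * ||W2[j]||^2.
    scores = (H ** 2).sum(axis=0) * (W2 ** 2).sum(axis=1)

    # 2) Mask search (assumed rule): keep the k highest-scoring neurons.
    k = max(1, int(round(keep_ratio * W1.shape[1])))
    mask = np.zeros(W1.shape[1], dtype=bool)
    mask[np.argsort(scores)[-k:]] = True

    # 3) Weight-tuning (assumed rule): refit the output weights of the
    #    kept neurons by least squares so the pruned layer reproduces
    #    the original outputs on X as closely as possible.
    W2_tuned, *_ = np.linalg.lstsq(H[:, mask], Y, rcond=None)

    return mask, W1[:, mask], W2[mask], W2_tuned


rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))    # small calibration batch, no labels needed
W1 = rng.normal(size=(16, 64))
W2 = rng.normal(size=(64, 16))

# keep_ratio=0.2 corresponds to removing 80% of the intermediate neurons
mask, W1_p, W2_p, W2_t = prune_ffn(X, W1, W2, keep_ratio=0.2)

Y = np.maximum(X @ W1, 0.0) @ W2
H_p = np.maximum(X @ W1_p, 0.0)
print("error without tuning:", np.linalg.norm(H_p @ W2_p - Y))
print("error with tuning:   ", np.linalg.norm(H_p @ W2_t - Y))
```

Because the least-squares refit minimizes the reconstruction error over all possible output weights for the kept neurons, the tuned error can never exceed the untuned one on the calibration batch. This is the general intuition behind compensating for pruning errors without retraining, shown here on a single layer only.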
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Pruning
