Reconstruct the Pruned Model without Any Retraining
Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu, Wang

TL;DR
This paper introduces LIAR, a retraining-free, linear interpolation-based method for reconstructing pruned large language models, maintaining high accuracy across various benchmarks without additional training.
Contribution
The paper presents LIAR, a novel, generalizable reconstruction framework that works with different pruning criteria and modules without retraining or back-propagation.
Findings
Maintains 98% accuracy after 50% pruning on BERT.
Achieves top LLaMA performance in minutes.
Compatible with various pruning criteria.
Abstract
Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost. This retraining-free paradigm involves (1) pruning criteria to define the architecture and (2) distortion reconstruction to restore performance. However, existing methods often emphasize pruning criteria while using reconstruction techniques that are specific to certain modules or criteria, resulting in limited generalizability. To address this, we introduce the Linear Interpolation-based Adaptive Reconstruction (LIAR) framework, which is both efficient and effective. LIAR does not require back-propagation or retraining and is compatible with various pruning criteria and modules. By applying linear interpolation to the preserved weights, LIAR minimizes reconstruction error and effectively reconstructs…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · Residual Connection · Layer Normalization · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · Adam · Dropout · LLaMA
