Reconstruct the Pruned Model without Any Retraining

Pingjie Wang; Ziqing Fan; Shengchao Hu; Zhe Chen; Yanfeng Wang; Yu Wang

arXiv:2407.13331 · cs.LG · July 19, 2024

TL;DR

This paper introduces LIAR, a retraining-free, linear interpolation-based method for reconstructing pruned large language models, maintaining high accuracy across various benchmarks without additional training.

Contribution

The paper presents LIAR, a novel, generalizable reconstruction framework that works with different pruning criteria and modules without retraining or back-propagation.

Findings

01 Maintains 98% accuracy on BERT after 50% pruning.

02 Achieves top performance on LLaMA within minutes.

03 Compatible with various pruning criteria.

Abstract

Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost. This retraining-free paradigm involves (1) pruning criteria to define the architecture and (2) distortion reconstruction to restore performance. However, existing methods often emphasize pruning criteria while using reconstruction techniques that are specific to certain modules or criteria, resulting in limited generalizability. To address this, we introduce the Linear Interpolation-based Adaptive Reconstruction (LIAR) framework, which is both efficient and effective. LIAR does not require back-propagation or retraining and is compatible with various pruning criteria and modules. By applying linear interpolation to the preserved weights, LIAR minimizes reconstruction error and effectively reconstructs…
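The abstract is truncated here and does not spell out how the interpolation is computed, so the following is only a minimal sketch of the general idea of reconstruction without back-propagation, not the paper's algorithm. It assumes a single linear layer, a small set of calibration activations, and a least-squares fit that approximates each pruned input channel as a linear combination of the preserved ones; the fitted coefficients are then folded into the preserved weights. The function name `reconstruct_pruned_linear` and all variable names are hypothetical.

```python
import numpy as np


def reconstruct_pruned_linear(W, X, keep):
    """Fold pruned input channels into the preserved weights (illustrative only).

    W    : (out_features, in_features) weight of a linear layer y = W x
    X    : (n_samples, in_features) calibration activations (an assumption)
    keep : indices of the input channels that survive pruning
    """
    keep = np.asarray(keep)
    pruned = np.setdiff1d(np.arange(W.shape[1]), keep)
    X_keep, X_pruned = X[:, keep], X[:, pruned]

    # Least-squares fit X_keep @ A ~= X_pruned; no gradients are involved.
    A, *_ = np.linalg.lstsq(X_keep, X_pruned, rcond=None)

    # y = W_keep x_keep + W_pruned x_pruned ~= (W_keep + W_pruned A^T) x_keep
    return W[:, keep] + W[:, pruned] @ A.T


# Toy check: activations with rank 4, so the pruned channels are exact
# linear combinations of the four preserved channels.
rng = np.random.default_rng(0)
n, d, out, r = 256, 8, 5, 4
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, d))
W = rng.normal(size=(out, d))
keep = [0, 1, 2, 3]

Y = X @ W.T  # output of the unpruned layer
W_rec = reconstruct_pruned_linear(W, X, keep)

err_naive = np.linalg.norm(Y - X[:, keep] @ W[:, keep].T)  # drop channels only
err_rec = np.linalg.norm(Y - X[:, keep] @ W_rec.T)         # drop, then reconstruct
print(err_naive, err_rec)
```

In this toy setting the reconstructed layer should track the unpruned output far more closely than simply dropping the pruned channels. Real activations are only approximately low-rank, so some residual error would remain.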

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Residual Connection · Layer Normalization · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · Adam · Dropout · LLaMA