Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization

Sungbin Shin; Wonpyo Park; Jaeho Lee; Namhoon Lee

arXiv:2406.15524·cs.CL·October 14, 2024

TL;DR

This paper re-examines the divide-and-conquer practice of LLM pruning: it introduces reconstruction techniques that sharply reduce reconstruction error, warns that minimizing this error can overfit the calibration data, and proposes self-generated calibration data as a mitigation strategy.

Contribution

It presents reconstruction techniques that lower reconstruction error by more than 90%, identifies overfitting to the calibration data as a pitfall of error minimization, and proposes self-generated calibration data to balance reconstruction and generalization.

Findings

01 Reconstruction error can be reduced by over 90%.

02 Minimizing reconstruction error may cause overfitting and degrade performance.

03 Self-generating calibration data helps mitigate overfitting issues.

Abstract

This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). The current practice is divide and conquer: split the model into submodels, prune them sequentially, and reconstruct the predictions of their dense counterparts on a small set of calibration data one at a time; the final model is obtained simply by putting the resulting sparse submodels together. While this approach enables pruning under memory constraints, it generates high reconstruction errors. In this work, we first present an array of reconstruction techniques that can reduce this error by more than 90%. Unwittingly, however, we discover that minimizing reconstruction error is not always ideal and can overfit the given calibration data, resulting in increased language perplexity and poor performance on downstream tasks. We find out that a strategy of self-generating…
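The divide-and-conquer procedure described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the "model" is a toy stack of linear layers with ReLU, pruning is per-row magnitude pruning, reconstruction is an ordinary least-squares refit of the kept weights, and the names `magnitude_mask`, `reconstruct`, and `prune_model` are hypothetical.

```python
import numpy as np

def magnitude_mask(W, sparsity):
    """Per-row magnitude pruning mask; True marks a weight that is kept."""
    k = int(W.shape[1] * sparsity)           # number of weights to drop per row
    order = np.argsort(np.abs(W), axis=1)    # smallest magnitudes first
    mask = np.ones_like(W, dtype=bool)
    np.put_along_axis(mask, order[:, :k], False, axis=1)
    return mask

def reconstruct(W, mask, X_in, Y_target):
    """Refit the kept weights of each output row by least squares
    so that the sparse layer matches the dense layer's outputs."""
    W_new = np.zeros_like(W)
    for i in range(W.shape[0]):
        keep = mask[i]
        sol, *_ = np.linalg.lstsq(X_in[:, keep], Y_target[:, i], rcond=None)
        W_new[i, keep] = sol
    return W_new

def prune_model(weights, X_calib, sparsity=0.5):
    """Prune layers one at a time, reconstructing each on calibration data."""
    relu = lambda z: np.maximum(z, 0.0)
    X_dense, X_sparse = X_calib, X_calib
    sparse_weights, errors = [], []
    for W in weights:
        Y_dense = X_dense @ W.T              # dense counterpart's prediction
        mask = magnitude_mask(W, sparsity)
        err_before = np.mean((X_sparse @ (W * mask).T - Y_dense) ** 2)
        W_rec = reconstruct(W, mask, X_sparse, Y_dense)
        err_after = np.mean((X_sparse @ W_rec.T - Y_dense) ** 2)
        sparse_weights.append(W_rec)
        errors.append((err_before, err_after))
        # Propagate activations separately through the dense and sparse models.
        X_dense = relu(Y_dense)
        X_sparse = relu(X_sparse @ W_rec.T)
    return sparse_weights, errors

rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 16)) for _ in range(2)]  # toy 2-layer model
X_calib = rng.standard_normal((64, 16))                      # toy calibration set
sparse_weights, errors = prune_model(weights, X_calib)
for layer, (before, after) in enumerate(errors):
    print(f"layer {layer}: reconstruction error {before:.4f} -> {after:.4f}")
```

In this toy setting the refit cannot increase the error on the calibration inputs, because the pruned weights are themselves a feasible solution of the least-squares problem. That guarantee holds only on the calibration data; checking the same layers on held-out inputs is one way to probe the overfitting risk the abstract describes.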

Code & Models

Repositories

log-postech/rethinking-llm-pruning
PyTorch · Official

Videos

Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization· underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Pruning