Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization
Sungbin Shin, Wonpyo Park, Jaeho Lee, Namhoon Lee

TL;DR
This paper reevaluates LLM pruning by introducing reconstruction techniques that substantially reduce reconstruction error, while showing that minimizing this error can overfit the calibration data and proposing self-generated calibration data as a mitigation strategy.
Contribution
It presents new reconstruction methods that significantly lower errors and uncovers the pitfalls of error minimization, proposing self-generated data to balance reconstruction and generalization.
Findings
Reconstruction error can be reduced by over 90%.
Minimizing reconstruction error may cause overfitting and degrade performance.
Self-generating calibration data helps mitigate overfitting issues.
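One way to picture self-generated calibration data is to sample token sequences from the model's own next-token distribution instead of drawing them from an external corpus. The sketch below is only an illustration under stated assumptions: the "model" is a hypothetical random bigram table (`probs`), and the helper `self_generate` and all sizes are invented here. It is not the authors' procedure, which this summary does not spell out.

```python
import numpy as np

def self_generate(next_token_probs, num_seqs, seq_len, rng):
    """Sample token sequences from a model's own next-token distribution.

    next_token_probs[i] is the (assumed) distribution over the next token
    given that the previous token is i, i.e. a toy bigram stand-in for an LLM.
    """
    vocab = next_token_probs.shape[0]
    seqs = np.empty((num_seqs, seq_len), dtype=np.int64)
    seqs[:, 0] = rng.integers(vocab, size=num_seqs)  # random start tokens
    for t in range(1, seq_len):
        for s in range(num_seqs):
            p = next_token_probs[seqs[s, t - 1]]
            seqs[s, t] = rng.choice(vocab, p=p)
    return seqs

rng = np.random.default_rng(0)

# Hypothetical stand-in model: row-wise softmax over random logits.
logits = rng.normal(size=(50, 50))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Calibration sequences drawn from the model itself.
calib_tokens = self_generate(probs, num_seqs=4, seq_len=12, rng=rng)
```

In a real setting the dense LLM would play the role of `probs`, and the sampled sequences would be fed back through it to produce the activations used during pruning.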
Abstract
This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). The way it is done is by divide and conquer: split the model into submodels, sequentially prune them, and reconstruct predictions of the dense counterparts on small calibration data one at a time; the final model is obtained simply by putting the resulting sparse submodels together. While this approach enables pruning under memory constraints, it generates high reconstruction errors. In this work, we first present an array of reconstruction techniques that can significantly reduce this error by more than 90%. Unwittingly, however, we discover that minimizing reconstruction error is not always ideal and can overfit the given calibration data, resulting in rather increased language perplexity and poor performance at downstream tasks. We find out that a strategy of self-generating…
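The prune-then-reconstruct step described in the abstract can be sketched for a single linear layer. This is a minimal illustration, not the paper's method: it assumes magnitude pruning and a per-row least-squares refit, and the function names, layer sizes, and random data are all hypothetical.

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero the smallest-magnitude entries in each row of W."""
    k = int(W.shape[1] * sparsity)  # weights removed per output row
    mask = np.ones_like(W, dtype=bool)
    if k > 0:
        drop = np.argsort(np.abs(W), axis=1)[:, :k]
        np.put_along_axis(mask, drop, False, axis=1)
    return W * mask, mask

def reconstruct(W_dense, mask, X):
    """Refit the surviving weights of each row by least squares so the
    sparse layer reproduces the dense outputs on calibration inputs X."""
    Y = X @ W_dense.T  # dense predictions to match, shape (n, out)
    W_new = np.zeros_like(W_dense)
    for i in range(W_dense.shape[0]):
        keep = mask[i]
        sol, *_ = np.linalg.lstsq(X[:, keep], Y[:, i], rcond=None)
        W_new[i, keep] = sol
    return W_new

def recon_error(W_dense, W_sparse, X):
    """Mean squared difference between dense and sparse layer outputs."""
    return float(np.mean((X @ W_dense.T - X @ W_sparse.T) ** 2))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 32))            # hypothetical dense layer
X_calib = rng.normal(size=(16, 32))     # small calibration set
X_heldout = rng.normal(size=(256, 32))  # inputs not seen during refit

W_pruned, mask = magnitude_prune(W, sparsity=0.5)
W_recon = reconstruct(W, mask, X_calib)

err_calib_pruned = recon_error(W, W_pruned, X_calib)
err_calib_recon = recon_error(W, W_recon, X_calib)
err_held_pruned = recon_error(W, W_pruned, X_heldout)
err_held_recon = recon_error(W, W_recon, X_heldout)

print(f"calibration: pruned={err_calib_pruned:.4f} refit={err_calib_recon:.4f}")
print(f"held-out:    pruned={err_held_pruned:.4f} refit={err_held_recon:.4f}")
```

The refit can never do worse than plain pruning on the calibration inputs, since the pruned weights are themselves a feasible solution of the least-squares problem. Whether that gain carries over to `X_heldout` is a separate question, and comparing the two held-out numbers is a small-scale analogue of the overfitting concern raised in the abstract.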
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Pruning
