You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty

TL;DR
PruneNet is a calibration-free model pruning method based on policy learning: it compresses large language models without any calibration dataset while limiting the loss in performance.
Contribution
The paper presents PruneNet, a novel approach that reformulates model pruning as a policy learning task, enabling calibration-free compression with minimal performance degradation.
Findings
Compresses LLaMA-2-7B in 15 minutes with over 80% performance retention.
Achieves 30% model compression while maintaining high zero-shot performance.
Demonstrates robustness on complex language understanding tasks, preserving up to 80% of original performance.
Abstract
The ever-increasing size of large language models (LLMs) presents significant challenges for deployment due to their heavy computational and memory requirements. Current model pruning techniques attempt to alleviate these issues by relying heavily on external calibration datasets to determine which parameters to prune or compress, thus limiting their flexibility and scalability across different compression ratios. Moreover, these methods often cause severe performance degradation, particularly in downstream tasks, when subjected to higher compression rates. In this paper, we propose PruneNet, a novel model compression method that addresses these limitations by reformulating model pruning as a policy learning process. PruneNet decouples the pruning process from the model architecture, eliminating the need for calibration datasets. It learns a stochastic pruning policy to assess parameter…
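The abstract frames pruning as learning a stochastic policy over which parameters to keep, using no calibration data. The toy sketch below illustrates only that general idea under heavy assumptions. It is not PruneNet's architecture or training objective. It assumes a hypothetical two-parameter Bernoulli keep/drop policy over the rows of a single weight matrix, whose logits depend only on the rows' own L2 norms (so no input data is needed). It also assumes a hypothetical reward: the share of squared-norm "energy" kept, penalised when the pruned fraction drifts from the target compression ratio. The policy is trained with plain REINFORCE. All function names (`row_norms`, `reward`, `train_policy`, `prune_once`) are illustrative.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def row_norms(weight):
    # Calibration-free signal: computed from the weight matrix alone, no input data.
    return [math.sqrt(sum(w * w for w in row)) for row in weight]


def reward(weight, mask, target_ratio):
    # Hypothetical reward (not the paper's): share of squared-norm energy kept,
    # minus a penalty when the pruned fraction drifts from the target ratio.
    norms = row_norms(weight)
    total = sum(n * n for n in norms)
    kept = sum(n * n for n, m in zip(norms, mask) if m)
    pruned_frac = 1.0 - sum(mask) / len(mask)
    return kept / total - 2.0 * abs(pruned_frac - target_ratio)


def train_policy(weight, target_ratio, steps=500, lr=0.5, seed=0):
    """REINFORCE on a Bernoulli keep/drop policy: logit_i = a * feat_i + b."""
    rng = random.Random(seed)
    norms = row_norms(weight)
    mean = sum(norms) / len(norms)
    feats = [n / mean for n in norms]  # row norms normalised by their mean
    a, b, baseline = 0.0, 0.0, 0.0
    for _ in range(steps):
        probs = [sigmoid(a * f + b) for f in feats]
        mask = [rng.random() < p for p in probs]  # True = keep this row
        r = reward(weight, mask, target_ratio)
        adv = r - baseline
        baseline = 0.9 * baseline + 0.1 * r  # moving-average baseline lowers variance
        # d/dlogit of log Bernoulli(mask | sigmoid(logit)) is (mask - p).
        ga = sum((m - p) * f for m, p, f in zip(mask, probs, feats))
        gb = sum(m - p for m, p in zip(mask, probs))
        a += lr * adv * ga / len(feats)
        b += lr * adv * gb / len(feats)
    return a, b, feats


def prune_once(weight, a, b, feats, target_ratio):
    # One-shot deployment: drop exactly round(ratio * rows) rows with the
    # lowest keep-probability under the learned policy.
    n_prune = int(round(target_ratio * len(weight)))
    order = sorted(range(len(weight)), key=lambda i: sigmoid(a * feats[i] + b))
    dropped = set(order[:n_prune])
    return [row for i, row in enumerate(weight) if i not in dropped]


rng = random.Random(42)
W = [[rng.gauss(0, 1 + i % 3) for _ in range(8)] for i in range(20)]
a, b, feats = train_policy(W, target_ratio=0.3)
W_small = prune_once(W, a, b, feats, target_ratio=0.3)
print(len(W), "->", len(W_small))  # 20 -> 14
```

Because the policy reads only the weights, the same trained parameters can be reapplied at a different `target_ratio` without collecting new data, which is the flexibility the abstract contrasts with calibration-based pruning.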
Taxonomy
Topics: Reservoir Engineering and Simulation Methods
Methods: Pruning
