You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning

Ayan Sengupta; Siddhant Chaudhary; Tanmoy Chakraborty

arXiv:2501.15296 · cs.CL · March 3, 2025

TL;DR

PruneNet is a calibration-free pruning method that treats model compression as a policy learning problem. It compresses large language models without a calibration dataset while limiting the resulting performance loss.

Contribution

The paper presents PruneNet, a novel approach that reformulates model pruning as a policy learning task, enabling calibration-free compression with minimal performance degradation.

Findings

01. Compresses LLaMA-2-7B in 15 minutes with over 80% performance retention.
02. Achieves 30% model compression while maintaining high zero-shot performance.
03. Demonstrates robustness on complex language understanding tasks, preserving up to 80% of original performance.
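
To make the percentages above concrete, the following is a minimal sketch of the arithmetic behind "compression ratio" and "performance retention". It assumes that 30% compression means 30% of the parameters are removed and that retention is the ratio of compressed to original task score; both definitions are assumptions here, not taken from the paper. The function names, the nominal 7 billion parameter count, and the scores are hypothetical illustrations, not reported results.

```python
def remaining_parameters(n_params, compression_ratio):
    """Parameters left after removing `compression_ratio` of them (assumed definition)."""
    return round(n_params * (1.0 - compression_ratio))


def performance_retention(compressed_score, original_score):
    """Compressed score as a percentage of the original score (assumed definition)."""
    return 100.0 * compressed_score / original_score


# Hypothetical inputs: a nominal 7B-parameter model and made-up accuracy values.
params_left = remaining_parameters(7_000_000_000, 0.30)
retention = performance_retention(compressed_score=55.2, original_score=69.0)
print(f"parameters left: {params_left:,}")
print(f"retention: {retention:.1f}%")
```

Under these assumed definitions, a model that scores 55.2 after compression against an original 69.0 would count as retaining about 80% of its performance.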

Abstract

The ever-increasing size of large language models (LLMs) presents significant challenges for deployment due to their heavy computational and memory requirements. Current model pruning techniques attempt to alleviate these issues by relying heavily on external calibration datasets to determine which parameters to prune or compress, thus limiting their flexibility and scalability across different compression ratios. Moreover, these methods often cause severe performance degradation, particularly in downstream tasks, when subjected to higher compression rates. In this paper, we propose PruneNet, a novel model compression method that addresses these limitations by reformulating model pruning as a policy learning process. PruneNet decouples the pruning process from the model architecture, eliminating the need for calibration datasets. It learns a stochastic pruning policy to assess parameter…
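
The abstract is truncated here, so the details of PruneNet's policy are not available in this text. As a rough illustration of what a stochastic, calibration-free pruning decision can look like, the sketch below samples which rows of a weight matrix to keep, using only the weights themselves and no input data. Everything in it is an assumption made for illustration: the row-norm importance signal, the sigmoid mapping, and the names `row_norms`, `keep_probabilities`, and `sample_pruned_rows` are hypothetical and are not the paper's method. The policy learning step, which would update how scores are produced, is omitted.

```python
import math
import random


def row_norms(weight):
    """L2 norm of each row: a stand-in importance signal computed from weights alone."""
    return [math.sqrt(sum(v * v for v in row)) for row in weight]


def keep_probabilities(scores, temperature=1.0):
    """Map scores to keep-probabilities with a sigmoid centred on the mean score."""
    mean = sum(scores) / len(scores)
    return [1.0 / (1.0 + math.exp(-(s - mean) / temperature)) for s in scores]


def sample_pruned_rows(weight, compression_ratio, rng):
    """Stochastically pick rows to keep and return (kept indices, compressed matrix).

    Rows are drawn without replacement, weighted by keep-probability, until
    round((1 - compression_ratio) * n_rows) rows have been selected.
    """
    n_rows = len(weight)
    n_keep = max(1, round((1.0 - compression_ratio) * n_rows))
    probs = keep_probabilities(row_norms(weight))
    candidates = list(range(n_rows))
    kept = []
    for _ in range(n_keep):
        weights = [probs[i] for i in candidates]
        choice = rng.choices(candidates, weights=weights, k=1)[0]
        candidates.remove(choice)
        kept.append(choice)
    kept.sort()  # preserve the original row order in the compressed matrix
    return kept, [weight[i] for i in kept]


# Toy 8x4 matrix; no calibration inputs are involved at any point.
toy_weight = [[float(i + j) for j in range(4)] for i in range(8)]
kept_rows, compressed = sample_pruned_rows(toy_weight, 0.25, random.Random(0))
print(kept_rows, len(compressed))
```

Because the selection is sampled rather than a fixed top-k cut, different seeds can keep different rows at the same compression ratio, which is the property a learned policy would exploit.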

Videos

You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning · SlidesLive

Taxonomy

Topics: Reservoir Engineering and Simulation Methods

Methods: Pruning