Prune&Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation
Xinrui Chen, Hongxing Zhang, Fanyi Zeng, Yongxian Wei, Yizhi Wang, Xitong Ling, Guanghao Li, Chun Yuan

TL;DR
This paper introduces Prune&Comp, a training-free, iterative layer pruning method for large language models. It rescales the remaining weights offline to compensate for the hidden-state magnitude gap left by removed layers, which preserves performance with zero runtime overhead.
Contribution
It proposes a novel plug-and-play magnitude compensation scheme for layer pruning that mitigates performance loss without additional training, enhancing existing pruning metrics.
Findings
Prune&Comp nearly halves perplexity after pruning 5 layers of LLaMA-3-8B.
In the same setting, it retains 93.19% of the original model's question-answering performance.
It outperforms the baseline by 4.01% in this setting.
Abstract
Layer pruning has emerged as a promising technique for compressing large language models (LLMs) while achieving acceleration proportional to the pruning ratio. In this work, we identify that removing any layer induces a significant magnitude gap in hidden states, resulting in substantial performance degradation. To address this issue, we propose Prune&Comp, a novel plug-and-play layer pruning scheme that leverages magnitude compensation to mitigate such gaps in a training-free manner. Specifically, we first estimate the magnitude gap caused by layer removal and then eliminate this gap by rescaling the remaining weights offline, with zero runtime overhead incurred. We further demonstrate the advantages of Prune&Comp through an iterative pruning strategy. When integrated with an iterative prune-and-compensate loop, Prune&Comp consistently enhances existing layer pruning metrics. For…
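The compensation step and the iterative prune-and-compensate loop described in the abstract can be sketched on a toy pre-norm residual model. This is a minimal illustration under our own assumptions, not the authors' implementation. The model (`ToyResidualModel`), all helper names, the gap estimator (ratio of mean hidden-state norms after and before the removed layer on calibration inputs), and folding that factor α into the embedding and every earlier block's output scale are hypothetical choices. Because RMSNorm is scale-invariant up to its epsilon, scaling everything written into the residual stream before layer l by α scales h_l by α without changing any earlier block's normalized input, so the rescaling can be done entirely offline. The layer score used in the loop, 1 - cos(h_l, h_{l+1}), is one common importance metric and is chosen here only for illustration.

```python
import math
import random


def rms_norm(v, eps=1e-12):
    # RMSNorm without a learned gain; scale-invariant up to eps, as in pre-norm LLMs.
    r = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / r for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


class ToyResidualModel:
    """Hypothetical stand-in for an LLM: scaled embedding plus pre-norm residual blocks."""

    def __init__(self, blocks, embed_scale=1.0):
        self.blocks = list(blocks)             # one weight matrix per "layer"
        self.embed_scale = embed_scale         # stand-in for the embedding weights
        self.out_scales = [1.0] * len(blocks)  # stand-in for each block's output projection

    def hidden_states(self, x):
        """Return [h_0, ..., h_L] with h_{i+1} = h_i + s_i * W_i @ rms_norm(h_i)."""
        h = [self.embed_scale * xi for xi in x]
        states = [h]
        for W, s in zip(self.blocks, self.out_scales):
            delta = matvec(W, rms_norm(h))
            h = [hi + s * di for hi, di in zip(h, delta)]
            states.append(h)
        return states


def magnitude_gap(model, calib, l):
    # Assumed estimator: mean ||h_{l+1}|| / mean ||h_l|| over the calibration inputs.
    before = after = 0.0
    for x in calib:
        hs = model.hidden_states(x)
        before += norm(hs[l])
        after += norm(hs[l + 1])
    return after / before


def prune_and_compensate(model, calib, l):
    """Remove block l and fold the gap alpha into the weights that write h_l (offline)."""
    alpha = magnitude_gap(model, calib, l)
    model.embed_scale *= alpha
    for j in range(l):
        model.out_scales[j] *= alpha  # scaling every earlier write scales h_l by alpha
    del model.blocks[l]
    del model.out_scales[l]
    return alpha


def layer_score(model, calib, l):
    # Illustrative importance metric: mean (1 - cosine similarity) of h_l and h_{l+1}.
    total = 0.0
    for x in calib:
        hs = model.hidden_states(x)
        a, b = hs[l], hs[l + 1]
        total += 1.0 - sum(p * q for p, q in zip(a, b)) / (norm(a) * norm(b))
    return total / len(calib)


def iterative_prune(model, calib, n_prune):
    """Prune one layer at a time, re-scoring the compensated model after each removal."""
    removed = []  # indices are relative to the model at the time of each removal
    for _ in range(n_prune):
        scores = [layer_score(model, calib, l) for l in range(len(model.blocks))]
        l = min(range(len(scores)), key=scores.__getitem__)
        prune_and_compensate(model, calib, l)
        removed.append(l)
    return removed


if __name__ == "__main__":
    random.seed(0)
    dim, n_layers = 8, 6
    blocks = [[[random.gauss(0, 0.3) for _ in range(dim)] for _ in range(dim)]
              for _ in range(n_layers)]
    calib = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
    model = ToyResidualModel(blocks)
    print("removed:", iterative_prune(model, calib, 2))
    print("remaining layers:", len(model.blocks))
```

Re-scoring after every removal is what makes the loop iterative in this sketch: each pruning decision is made on the already compensated, already pruned model rather than on scores computed once up front.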
