Prune&Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation

Xinrui Chen; Hongxing Zhang; Fanyi Zeng; Yongxian Wei; Yizhi Wang; Xitong Ling; Guanghao Li; Chun Yuan

arXiv:2507.18212 · cs.CL · July 25, 2025

TL;DR

This paper introduces Prune&Comp, a training-free, iterative layer pruning method for large language models that uses magnitude compensation to maintain performance and significantly improve pruning efficiency.

Contribution

It proposes a novel plug-and-play magnitude compensation scheme for layer pruning that mitigates performance loss without additional training, enhancing existing pruning metrics.
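One plausible way to see why such compensation can be training-free and cost nothing at inference time is sketched below. It assumes LLaMA-style pre-RMSNorm residual blocks with RMSNorm's epsilon ignored. The symbols (e for the embedded input, W_out^(i) for the output projection of block i, G_i for the rest of the block) and the choice to fold the scale into the embedding and the earlier blocks' output projections are illustrative assumptions, not the paper's notation or exact recipe. When block l is removed, the gap is taken as a ratio of mean hidden-state norms, and the primed stream h' is the one produced after setting e' = α_l e and scaling W_out^(i) by α_l for every i < l:

```latex
\begin{aligned}
h_{i+1} &= h_i + W_{\mathrm{out}}^{(i)}\, G_i\big(\mathrm{RMSNorm}(h_i)\big), \qquad h_0 = e, \\
\alpha_l &= \frac{\mathbb{E}_x \lVert h_{l+1}(x) \rVert_2}{\mathbb{E}_x \lVert h_l(x) \rVert_2}, \\
h'_{i+1} &= \alpha_l h_i + \alpha_l W_{\mathrm{out}}^{(i)}\, G_i\big(\mathrm{RMSNorm}(\alpha_l h_i)\big) = \alpha_l h_{i+1}, \qquad i < l.
\end{aligned}
```

The last line follows by induction from h'_0 = α_l h_0 and RMSNorm(α h) = RMSNorm(h) for α > 0. After block l is dropped, block l+1 therefore receives α_l h_l, whose mean norm matches that of the original h_{l+1}, using only rescaled weights and no extra runtime operation.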

Findings

1. Prune&Comp nearly halves perplexity after pruning 5 layers of LLaMA-3-8B.
2. It retains 93.19% of the original model's question-answering performance.
3. It outperforms baseline methods by 4.01% on key metrics.

Abstract

Layer pruning has emerged as a promising technique for compressing large language models (LLMs) while achieving acceleration proportional to the pruning ratio. In this work, we identify that removing any layer induces a significant magnitude gap in hidden states, resulting in substantial performance degradation. To address this issue, we propose Prune&Comp, a novel plug-and-play layer pruning scheme that leverages magnitude compensation to mitigate such gaps in a training-free manner. Specifically, we first estimate the magnitude gap caused by layer removal and then eliminate this gap by rescaling the remaining weights offline, with zero runtime overhead incurred. We further demonstrate the advantages of Prune&Comp through an iterative pruning strategy. When integrated with an iterative prune-and-compensate loop, Prune&Comp consistently enhances existing layer pruning metrics. For…
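The abstract describes estimating the magnitude gap on calibration data, removing it by rescaling the remaining weights offline, and repeating this inside an iterative prune-and-compensate loop. The following is a minimal NumPy sketch of that loop on a hypothetical toy model, not the authors' implementation. Several pieces are assumptions: pre-RMSNorm residual blocks without epsilon; the gap measured as the ratio of mean L2 norms after vs. before the removed block; the factor folded into the embedding table and the output projections of earlier blocks; and a ShortGPT-style Block Influence score (1 minus mean cosine similarity between a block's input and output) as a stand-in layer-selection metric. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D, L, N = 50, 16, 8, 256  # toy sizes, chosen arbitrarily

# Hypothetical toy stack of LLaMA-style pre-norm residual blocks.
E = rng.normal(size=(VOCAB, D))                                   # embedding table
W_in = [rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(L)]
W_out = [rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(L)]
calib = rng.integers(0, VOCAB, size=N)                            # calibration token ids


def rms_norm(h):
    # eps omitted so that rms_norm(a * h) == rms_norm(h) for any a > 0
    return h / np.sqrt(np.mean(h ** 2, axis=-1, keepdims=True))


def hidden_states(E, W_in, W_out, tokens):
    """Residual stream [h_0, ..., h_n] before and after each block."""
    hs = [E[tokens]]
    for wi, wo in zip(W_in, W_out):
        h = hs[-1]
        hs.append(h + np.tanh(rms_norm(h) @ wi) @ wo)
    return hs


def block_influence(hs, l):
    # Stand-in metric (ShortGPT-style Block Influence): 1 - mean cos(h_l, h_{l+1}).
    a, b = hs[l], hs[l + 1]
    cos = np.sum(a * b, axis=-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
    return 1.0 - cos.mean()


def magnitude_gap(hs, l):
    # Assumed gap estimate: mean L2 norm of the stream after block l over the norm before it.
    return np.linalg.norm(hs[l + 1], axis=-1).mean() / np.linalg.norm(hs[l], axis=-1).mean()


def remove_block(E, W_in, W_out, l, alpha):
    """Drop block l; fold alpha into the embedding and the output projections of blocks < l."""
    W_out = [alpha * w if i < l else w for i, w in enumerate(W_out)]
    return alpha * E, W_in[:l] + W_in[l + 1:], W_out[:l] + W_out[l + 1:]


def prune_and_compensate(E, W_in, W_out, tokens, n_prune):
    for _ in range(n_prune):
        hs = hidden_states(E, W_in, W_out, tokens)  # re-measure the current, already-pruned model
        l = min(range(len(W_in)), key=lambda i: block_influence(hs, i))  # least influential block
        E, W_in, W_out = remove_block(E, W_in, W_out, l, magnitude_gap(hs, l))
    return E, W_in, W_out


E_p, W_in_p, W_out_p = prune_and_compensate(E, W_in, W_out, calib, n_prune=3)
print(len(W_in_p))  # 5 blocks remain out of 8
```

Because `rms_norm` ignores a positive scale, the stream entering the block after the removed one is exactly `alpha` times the old hidden state, so no runtime multiply is needed. A final RMSNorm before an LM head (absent from this toy) would absorb the leftover global scale. Recomputing `hidden_states` every round is what makes the loop iterative: each removal changes the stream that the next metric evaluation sees.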
