A Simple Linear Patch Revives Layer-Pruned Large Language Models

Xinrui Chen; Haoli Bai; Tao Yuan; Ruikang Liu; Kang Zhao; Xianzhi Yu; Lu Hou; Tian Guan; Yonghong He; Chun Yuan

arXiv:2505.24680·cs.CL·October 28, 2025

TL;DR

This paper introduces LinearPatch, a simple method that improves layer pruning in large language models by correcting the activation-scale mismatch at the pruning interface, preserving most of the original model's performance.

Contribution

LinearPatch is a lightweight, plug-and-play technique that aligns activation statistics at the pruning interface, substantially reducing performance loss in layer-pruned LLMs.

Findings

01

LinearPatch retains up to 94.15% of original performance after pruning 5 layers.

02

LinearPatch outperforms the previous state of the art by 4% in performance retention.

03

Further refinement with unlabeled data boosts retention to 95.16%.

Abstract

Layer pruning has emerged as a widely used technique for compressing large language models (LLMs). However, existing layer pruning approaches often incur substantial performance degradation. We attribute the majority of this degradation to a single, previously overlooked issue: the mismatch of activation magnitudes at the pruning interface. The pre-interface activations exhibit significantly different scales from the post-interface ones, causing a distributional shift that propagates through the remaining layers. To address this issue, we introduce LinearPatch, a lightweight, plug-and-play technique that fuses two operations into one matrix multiply at the pruning interface: (i) a Hadamard transformation that suppresses massive outliers at particular tokens and (ii) a channel-wise scaling that aligns activation statistics. On LLaMA-3-8B, LinearPatch…
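The fusion described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `hadamard` and `build_patch`, the choice of mean absolute activation as the aligned statistic, and the toy data are all assumptions made for illustration. The sketch only shows why a rotation, a per-channel scale, and the inverse rotation collapse into a single matrix.

```python
import numpy as np


def hadamard(n):
    """Orthonormal n x n Hadamard matrix via the Sylvester construction (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.ones((1, 1))
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def build_patch(x_in, x_out, eps=1e-8):
    """Hypothetical helper: fuse rotation, channel-wise scaling, and inverse rotation into one matrix.

    x_in:  (tokens, channels) activations entering the removed layers
    x_out: (tokens, channels) activations leaving the removed layers
    """
    h = hadamard(x_in.shape[1])
    # Assumption: the scale is the ratio of mean absolute activations in the rotated basis.
    scale = np.abs(x_out @ h).mean(axis=0) / (np.abs(x_in @ h).mean(axis=0) + eps)
    # x @ patch equals ((x @ h) * scale) @ h.T, so applying the patch costs one matmul.
    return h @ np.diag(scale) @ h.T


rng = np.random.default_rng(0)
x_in = rng.normal(size=(64, 8))
x_out = 3.0 * x_in  # toy stand-in for the removed layers: a pure 3x rescale
patch = build_patch(x_in, x_out)
```

In this toy case the removed layers only rescale their input by a factor of 3, so the patch comes out as approximately 3 times the identity and `x_in @ patch` reproduces `x_out`. With real activations the scale would differ per channel and the patch would be a dense matrix.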

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques

Methods: Pruning · ALIGN