The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models
Yao Lu, Yuqi Li, Wenbin Xie, Shanqing Yu, Qi Xuan, Zhaowei Zhu, Shiping Wen

TL;DR
This paper introduces CLP, a differentiable framework for contiguous layer pruning in large language models. It uses gradient-based optimization to choose which contiguous block of layers to remove, then recovers performance with targeted fine-tuning, reducing model size while retaining most of the original accuracy.
Contribution
The paper proposes a contiguous layer pruning method that combines a differentiable concave gate with cutoff endpoint tuning, addressing the inter-layer dependencies that per-layer pruning metrics ignore.
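To make the idea of gating a contiguous segment concrete, here is a minimal sketch in plain Python. It is not the paper's concave gate, whose exact form is not given on this page. It swaps in a simple product of two sigmoids as a soft keep-mask over layer indices, which is one common way to make a segment's boundaries differentiable. The function names, the `temperature` parameter, the 80-layer depth, and the segment position are all illustrative assumptions, as is the reading of "cutoff endpoint tuning" as fine-tuning the layers bordering the removed block.

```python
import math


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def soft_segment_mask(num_layers: int, start: float, end: float,
                      temperature: float = 0.5) -> list[float]:
    """Soft keep-mask: close to 0 for layers inside [start, end), close to 1 outside.

    `start` and `end` are real-valued, so in an autodiff framework they
    could be learned by gradient descent (assumption: illustrative gate,
    not the gate defined in the paper).
    """
    mask = []
    for i in range(num_layers):
        inside = (_sigmoid((i - start + 0.5) / temperature)
                  * _sigmoid((end - 0.5 - i) / temperature))
        mask.append(1.0 - inside)
    return mask


def kept_layers(mask: list[float], threshold: float = 0.5) -> list[int]:
    """Harden the soft mask into the indices of layers that survive pruning."""
    return [i for i, m in enumerate(mask) if m >= threshold]


def endpoint_layers(num_layers: int, start: int, end: int) -> list[int]:
    """Layers directly bordering the removed segment [start, end).

    Assumption: these are the layers a cutoff-endpoint style
    fine-tuning step would update.
    """
    return [i for i in (start - 1, end) if 0 <= i < num_layers]


# Hypothetical setup: an 80-layer model pruned by 20%, i.e. 16 layers.
num_layers = 80
segment_len = round(0.2 * num_layers)
start, end = 50, 50 + segment_len  # segment position chosen arbitrarily

mask = soft_segment_mask(num_layers, start, end)
kept = kept_layers(mask)
print(len(kept), endpoint_layers(num_layers, start, end))
```

Because the removed layers always form one block, only two boundaries exist in the pruned model, which is what makes fine-tuning just the bordering layers a plausible recovery step.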
Findings
CLP outperforms existing methods at various pruning rates.
Achieves 95.34% performance retention at 20% pruning on LLaMA3-70B.
Can be combined with quantization for further compression.
Abstract
Although large language models (LLMs) have achieved revolutionary breakthroughs in many fields, their large model size and high computational cost pose significant challenges for practical deployment on resource-constrained edge devices. To this end, layer pruning has been proposed to reduce the computational overhead by directly removing redundant layers. However, existing layer pruning methods typically rely on hand-crafted metrics to evaluate and remove individual layers, while ignoring the dependencies between layers. This can disrupt the model's information flow and severely degrade performance. To address these issues, we propose CLP, a novel continuous layer pruning framework that introduces two key innovations: a differentiable concave gate algorithm that automatically identifies the best continuous layer segments for pruning via gradient-based optimization; and a cutoff…
