DLP: Dynamic Layerwise Pruning in Large Language Models

Yuli Chen; Bo Cheng; Jiale Han; Yingying Zhang; Yingting Li; Shuhao Zhang

arXiv:2505.23807·cs.CL·June 4, 2025

TL;DR

The paper introduces DLP, a dynamic layerwise pruning method for large language models that derives each layer's pruning rate from its relative importance, preserving model performance at high sparsity levels.

Contribution

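DLP assigns each layer a pruning rate based on its relative importance, which it computes from model weights and input activations rather than from pre-defined values. It outperforms uniform and pre-defined non-uniform layerwise pruning strategies and integrates with existing compression techniques.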

Findings

01. At 70% sparsity, DLP reduces the perplexity of LLaMA2-7B by 7.79 compared to state-of-the-art methods.

02. At the same sparsity, DLP improves average accuracy by 2.7% over state-of-the-art methods.

03. DLP is compatible with various LLM compression techniques.

Abstract

Pruning has recently been widely adopted to reduce the parameter scale and improve the inference efficiency of Large Language Models (LLMs). Mainstream pruning techniques often rely on uniform layerwise pruning strategies, which can lead to severe performance degradation at high sparsity levels. Recognizing the varying contributions of different layers in LLMs, recent studies have shifted their focus toward non-uniform layerwise pruning. However, these approaches often rely on pre-defined values, which can result in suboptimal performance. To overcome these limitations, we propose a novel method called Dynamic Layerwise Pruning (DLP). This approach adaptively determines the relative importance of each layer by integrating model weights with input activation information, assigning pruning rates accordingly. Experimental results show that DLP effectively preserves model performance at…
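The idea described in the abstract (score each layer from weights and input activations, then give more important layers a lower pruning rate) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-weight score `|W| * ||X||` is a Wanda-style stand-in, and the mean aggregation, the linear mapping, the `spread` parameter, and all function names are assumptions made for illustration. It also assumes all layers have the same number of weights, so that the average of the per-layer ratios equals the target sparsity.

```python
def weight_scores(weight, act_norms):
    """Per-weight score |W_ij| * act_norms[j] (a Wanda-style metric, assumed here)."""
    return [[abs(w) * n for w, n in zip(row, act_norms)] for row in weight]


def layer_importance(weight, act_norms):
    """Collapse a layer's scores to one number. The mean is an illustrative choice."""
    flat = [s for row in weight_scores(weight, act_norms) for s in row]
    return sum(flat) / len(flat)


def allocate_sparsity(importances, target, spread=0.1):
    """Map layer importances to sparsity ratios whose mean equals `target`.

    More important layers receive lower sparsity. Every ratio stays within
    target +/- spread. The linear mapping is a hypothetical choice.
    """
    assert 0.0 < target - spread and target + spread < 1.0
    lo, hi = min(importances), max(importances)
    if hi == lo:
        # All layers look equally important: fall back to uniform pruning.
        return [target] * len(importances)
    # Normalise to [0, 1] and invert, so the least important layer maps to 1.
    inverted = [1.0 - (v - lo) / (hi - lo) for v in importances]
    mean_inverted = sum(inverted) / len(inverted)
    # Centre on the mean so the ratios average to `target`.
    return [target + spread * (x - mean_inverted) for x in inverted]


def prune_layer(weight, act_norms, sparsity):
    """Zero the lowest-scoring fraction of weights in one layer.

    Scores are ranked over the whole layer, a simplification. Tied scores at
    the threshold are all pruned, so slightly more than `sparsity` may be removed.
    """
    scores = weight_scores(weight, act_norms)
    ranked = sorted(s for row in scores for s in row)
    k = int(len(ranked) * sparsity)
    if k == 0:
        return [row[:] for row in weight]
    threshold = ranked[k - 1]
    return [
        [0.0 if s <= threshold else w for w, s in zip(w_row, s_row)]
        for w_row, s_row in zip(weight, scores)
    ]
```

In this sketch, calling `allocate_sparsity` on the list of `layer_importance` values with `target=0.7` yields one ratio per layer, and each ratio is then passed to `prune_layer` for that layer. A uniform strategy corresponds to `spread` approaching zero.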

Code & Models

Repositories

ironartisan/dlp (PyTorch, official)

Taxonomy

Methods: Pruning · Focus