Prompt-based Depth Pruning of Large Language Models

Juyun Wee; Minjae Park; Jaeho Lee

arXiv:2502.04348·cs.CL·June 13, 2025

TL;DR

This paper introduces PuDDing, a prompt-based dynamic depth pruning method for large language models that selectively omits transformer blocks based on input prompts, improving inference efficiency and task performance.

Contribution

The paper proposes a novel dynamic depth pruning algorithm that adapts transformer block removal to specific inputs, outperforming static pruning methods.

Findings

01 PuDDing accelerates inference in large language models.

02 It achieves better task performance than static pruning methods.

03 It is effective on commonsense reasoning benchmarks.

Abstract

Depth pruning aims to reduce the inference cost of a large language model without any hardware-specific complications, by simply removing several less important transformer blocks. However, our empirical findings suggest that the importance of a transformer block may be highly task-dependent -- a block that is crucial for a task can be removed without degrading the accuracy on another task. Based on this observation, we develop a dynamic depth pruning algorithm, coined PuDDing (Prompt-routed Dynamic Depth Pruning), which determines which blocks to omit from the model based on the input prompt. PuDDing operates by training a lightweight router to predict the best omission set among a set of options, where this option set has also been constructed in a data-driven manner. Empirical results on commonsense reasoning benchmarks demonstrate that PuDDing effectively accelerates the inference…
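The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy blocks, the candidate omission sets, and the hand-written `toy_score` function are all hypothetical stand-ins. In the paper the router is trained and the candidate set is built from data. The sketch only shows the control flow: score each candidate omission set for the prompt, pick the best one, then run the model while skipping the chosen blocks.

```python
from typing import Callable, FrozenSet, List, Sequence

# A "block" here is any function mapping a hidden state to a hidden state.
# Real transformer blocks operate on tensors; floats keep the sketch tiny.
Block = Callable[[float], float]
OmissionSet = FrozenSet[int]


def forward(blocks: Sequence[Block], x: float, omit: OmissionSet) -> float:
    """Run the blocks in order, skipping every index listed in `omit`."""
    for i, block in enumerate(blocks):
        if i in omit:
            continue  # an omitted block is treated as the identity
        x = block(x)
    return x


def route(
    prompt: str,
    candidates: Sequence[OmissionSet],
    score: Callable[[str, OmissionSet], float],
) -> OmissionSet:
    """Return the candidate omission set with the highest score for `prompt`."""
    return max(candidates, key=lambda omit: score(prompt, omit))


def toy_score(prompt: str, omit: OmissionSet) -> float:
    """Hypothetical stand-in for a trained router (assumption, not the paper's)."""
    if "arithmetic" in prompt.lower():
        return 1.0 if omit == frozenset({2}) else 0.0
    return 1.0 if omit == frozenset({1}) else 0.0


# Toy four-block "model" and a fixed set of candidate omission sets.
blocks: List[Block] = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x + 10,
]
candidates: List[OmissionSet] = [
    frozenset({1}),
    frozenset({2}),
    frozenset({1, 3}),
]

chosen = route("An arithmetic question", candidates, toy_score)
output = forward(blocks, 1.0, chosen)
```

Because the router runs once per prompt and only selects among a fixed list of options, the extra cost is one scoring pass before the pruned forward pass begins.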

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Pruning · Sparse Evolutionary Training