Beyond Manually Designed Pruning Policies with Second-Level Performance Prediction: A Pruning Framework for LLMs

Zuxin Ma; Yunhe Cui; Yongbin Qin

arXiv:2508.02381·cs.LG·August 7, 2025

TL;DR

This paper introduces PPF (Predictive Pruning Framework), an LLM pruning framework that uses a performance predictor in place of manually designed pruning policies. It supports both dynamic and static pruning and cuts policy evaluation latency by over 64 times.

Contribution

PPF removes the need for manually designed pruning policies by using second-level (seconds-scale) performance prediction in place of time-consuming policy evaluation, which supports real-time pruning decisions under dynamic pruning ratios.

Findings

01. Reduces perplexity by up to 33.4% (dynamic) and 84.78% (static)
02. Achieves over 64 times speedup in evaluation latency
03. Outperforms existing manual pruning policies
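As a reading aid for the figures above, the short sketch below shows how a relative perplexity reduction and a latency speedup are commonly computed. It assumes the percentages are relative reductions against a baseline, which this page does not state explicitly. The function names and all input numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new latency is than the baseline latency."""
    return baseline_seconds / new_seconds


if __name__ == "__main__":
    # Hypothetical values for illustration only, not taken from the paper.
    print(round(relative_reduction(30.0, 20.0), 1))  # perplexity 30 -> 20
    print(speedup(96.0, 1.5))                        # 96 s -> 1.5 s per evaluation
```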

Abstract

Non-uniform structured network pruning methods can effectively reduce Large Language Model (LLM) size by eliminating redundant channels or layers, offering lower performance degradation than uniform strategies. However, existing non-uniform methods rely heavily on manually designed pruning policies (e.g., layer importance and scaling factors), and therefore cannot efficiently adapt to scenarios with dynamic pruning ratio requirements. Additionally, a critical bottleneck, the time-consuming evaluation of pruning policies, further limits the feasibility of iteratively and dynamically finding optimal pruning policies. To address these limitations, we propose PPF (Predictive Pruning Framework), a novel pruning framework for LLMs that eliminates manual design dependencies via second-level performance prediction. PPF not only supports real-time pruning decisions under dynamic pruning ratios…
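The general idea of replacing a slow evaluation with a fast predictor while searching for a non-uniform pruning policy can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract above is truncated and does not describe PPF's predictor or search procedure. Here `toy_predictor` is a hypothetical stand-in for a trained performance predictor, the `sensitivity` values are made up, and the search is a plain hill climb that moves pruning budget between layers while keeping the mean pruning ratio fixed.

```python
import random
from functools import partial


def toy_predictor(ratios, sensitivity):
    """Hypothetical stand-in for a trained performance predictor.

    Returns a fake perplexity that grows faster when sensitive layers are
    pruned harder. A real predictor would be a learned model.
    """
    return 10.0 + sum(s * r * r for s, r in zip(sensitivity, ratios))


def search_policy(num_layers, target_ratio, predictor, steps=2000,
                  delta=0.02, max_ratio=0.9, seed=0):
    """Hill-climb over per-layer pruning ratios with a fixed mean ratio."""
    rng = random.Random(seed)
    ratios = [target_ratio] * num_layers  # start from the uniform policy
    best = predictor(ratios)
    for _ in range(steps):
        i, j = rng.sample(range(num_layers), 2)
        # Skip moves that would push a layer outside [0, max_ratio].
        if ratios[i] - delta < 0.0 or ratios[j] + delta > max_ratio:
            continue
        # Shift pruning budget from layer i to layer j; the mean is unchanged.
        candidate = list(ratios)
        candidate[i] -= delta
        candidate[j] += delta
        score = predictor(candidate)  # cheap call instead of a full evaluation
        if score < best:
            ratios, best = candidate, score
    return ratios, best


if __name__ == "__main__":
    sensitivity = [8.0, 4.0, 2.0, 1.0, 1.0, 2.0, 4.0, 8.0]  # made-up values
    predictor = partial(toy_predictor, sensitivity=sensitivity)
    uniform_score = predictor([0.5] * 8)
    policy, score = search_policy(8, 0.5, predictor)
    print("uniform:", round(uniform_score, 3), "searched:", round(score, 3))
    print("policy:", [round(r, 2) for r in policy])
```

Because each candidate is scored by a fast function call, the loop can afford thousands of candidates; with a minutes-long evaluation per candidate the same search would be impractical, which is the bottleneck the abstract describes.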
