Beyond Manually Designed Pruning Policies with Second-Level Performance Prediction: A Pruning Framework for LLMs
Zuxin Ma, Yunhe Cui, Yongbin Qin

TL;DR
This paper introduces PPF, a novel framework for LLM pruning that predicts performance to enable dynamic and static pruning without manual policies, significantly speeding up evaluation and improving model efficiency.
Contribution
PPF eliminates manual pruning policy design by using second-level performance prediction, supporting real-time decisions and dynamic pruning ratios in LLMs.
Findings
Reduces perplexity by up to 33.4% under dynamic pruning and 84.78% under static pruning
Achieves a speedup of over 64× in evaluation latency
Outperforms existing manual pruning policies
Abstract
Non-uniform structured network pruning methods can effectively reduce Large Language Model (LLM) size by eliminating redundant channels or layers, offering lower performance degradation than uniform strategies. However, existing non-uniform methods rely heavily on manually designed pruning policies (e.g., layer importance and scaling factors), and therefore cannot efficiently adapt to scenarios with dynamic pruning ratio requirements. Additionally, a critical bottleneck -- the time-consuming evaluation of pruning policies -- further limits the feasibility of iteratively and dynamically finding optimal pruning policies. To address these limitations, we propose PPF (Predictive Pruning Framework), a novel pruning framework for LLMs that eliminates manual design dependencies via second-level performance prediction. PPF not only supports real-time pruning decisions under dynamic pruning ratios…
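To make the general idea concrete, the sketch below shows why a fast performance predictor makes iterative policy search feasible: many candidate non-uniform policies (one pruning ratio per layer) can be scored cheaply instead of evaluating each pruned model in full. This is an illustration only and not the PPF method. The surrogate `predicted_perplexity`, the per-layer `sensitivity` values, and the random-search loop are all hypothetical stand-ins chosen for simplicity.

```python
import random


def predicted_perplexity(ratios, sensitivity):
    """Hypothetical surrogate predictor (NOT the paper's model).

    Assumes perplexity grows quadratically with each layer's pruning
    ratio, weighted by a made-up per-layer sensitivity.
    """
    return 10.0 + sum(s * r * r for s, r in zip(sensitivity, ratios))


def sample_policy(num_layers, target_ratio, rng, max_ratio=0.9):
    """Sample per-layer ratios whose mean equals target_ratio."""
    if not 0.0 < target_ratio <= max_ratio:
        raise ValueError("target_ratio must be in (0, max_ratio]")
    while True:
        raw = [rng.random() for _ in range(num_layers)]
        scale = target_ratio * num_layers / sum(raw)
        ratios = [w * scale for w in raw]
        # Reject candidates that would prune any single layer too heavily.
        if max(ratios) <= max_ratio:
            return ratios


def search_policy(sensitivity, target_ratio, trials=2000, seed=0):
    """Random search over policies, scored only by the cheap predictor."""
    rng = random.Random(seed)
    num_layers = len(sensitivity)
    # Start from the uniform policy so the result is never worse than it.
    best = [target_ratio] * num_layers
    best_score = predicted_perplexity(best, sensitivity)
    for _ in range(trials):
        candidate = sample_policy(num_layers, target_ratio, rng)
        score = predicted_perplexity(candidate, sensitivity)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    # Made-up sensitivities: the first and last layers are assumed fragile.
    policy, score = search_policy([5.0, 1.0, 1.0, 5.0], target_ratio=0.5)
    print([round(r, 3) for r in policy], round(score, 3))
```

Because each candidate is scored by a function call rather than a full evaluation pass, changing `target_ratio` at run time only requires rerunning the search. A real system would replace the toy surrogate with a learned predictor and would likely use a stronger search strategy than random sampling.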
