CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
Yuxin Wang, Minghua Ma, Zekun Wang, Jingchang Chen, Huiming Fan, Liping Shan, Qing Yang, Dongliang Xu, Ming Liu, Bing Qin

TL;DR
CFSP is a novel structured pruning framework for Large Language Models that efficiently leverages activation information at multiple granularities to achieve high sparsity with minimal performance loss.
Contribution
The paper introduces CFSP, a structured pruning method that uses coarse-to-fine activation importance, enabling efficient pruning with a single forward pass and adaptive fine-tuning.
Findings
CFSP outperforms existing pruning methods across various models and sparsity levels.
The framework achieves high sparsity with minimal accuracy degradation.
Experimental results validate the efficiency and effectiveness of CFSP.
Abstract
The colossal parameters and computational overhead of Large Language Models (LLMs) challenge their real-world applications. Network pruning, which targets unstructured or structured sparsity by removing redundant parameters, has recently been explored for LLM acceleration. Existing LLM pruning works focus on unstructured pruning, which typically requires special hardware support for a practical speed-up. In contrast, structured pruning can reduce latency on general devices. However, it remains a challenge to perform structured pruning efficiently and maintain performance, especially at high sparsity ratios. To this end, we introduce an efficient structured pruning framework named CFSP, which leverages both Coarse (interblock) and Fine-grained (intrablock) activation information as an importance criterion to guide pruning. The pruning is highly efficient, as it only requires one forward…
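As a rough illustration of the coarse-to-fine idea in the abstract, the sketch below first scores whole blocks (coarse, interblock) to decide how much to prune from each, then scores individual channels inside a block (fine-grained, intrablock) to decide which ones to keep. The specific formulas are assumptions made for illustration and are not taken from the text above: block importance is taken as one minus the cosine similarity between a block's input and output hidden states, and channel importance as activation norm times weight norm. All function names and the `strength` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)


def block_importance(h_in, h_out):
    """Coarse score (assumed): 1 - mean per-token cosine similarity between
    a block's input and output hidden states. A block that barely changes
    its input scores near 0."""
    sims = [cosine(x, y) for x, y in zip(h_in, h_out)]
    return 1.0 - sum(sims) / len(sims)


def allocate_sparsity(importances, target, strength=0.5):
    """Assign each block a sparsity ratio around `target`: less important
    blocks get more sparsity. The mean stays at `target` unless the final
    clipping to [0, 1] kicks in."""
    mean = sum(importances) / len(importances)
    centered = [s - mean for s in importances]
    scale = max(abs(c) for c in centered) + 1e-8
    ratios = [target * (1.0 - strength * c / scale) for c in centered]
    return [min(1.0, max(0.0, r)) for r in ratios]


def channel_scores(acts, w_down):
    """Fine score (assumed): per-channel activation norm times weight norm.
    acts is tokens x d_ff, w_down is d_model x d_ff (lists of lists)."""
    d_ff = len(acts[0])
    scores = []
    for j in range(d_ff):
        act_norm = math.sqrt(sum(row[j] ** 2 for row in acts))
        w_norm = math.sqrt(sum(row[j] ** 2 for row in w_down))
        scores.append(act_norm * w_norm)
    return scores


def select_channels(scores, sparsity):
    """Return the sorted indices of the channels kept at the given sparsity."""
    n_keep = max(1, round(len(scores) * (1.0 - sparsity)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])
```

In this sketch, the hidden states and channel activations would all be recorded during one calibration forward pass, which is how a criterion of this kind can avoid gradients or repeated passes. Removing the unselected channels from the corresponding weight matrices is what makes the pruning structured rather than a sparse mask.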
Taxonomy
Topics: Digital Rights Management and Security
Methods: Pruning · Focus
