FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning
Xin Yuan, Siqi Li, Jiateng Wei, Chengrui Zhu, Yanming Wu, Qingpeng Li, Jiajun Lv, Xiaoke Lan, Jun Chen, Yong Liu

TL;DR
FastForward Pruning introduces a single-step reinforcement learning framework for efficient, high-quality pruning of large language models, significantly reducing computational costs while outperforming heuristic methods.
Contribution
The paper presents a decoupled, curriculum-based RL approach that efficiently searches for optimal sparsity policies in large language models, improving over existing methods in both performance and efficiency.
Findings
Achieves superior pruning performance on LLaMA, Mistral, and OPT models.
Reduces computational cost compared to other search-based algorithms.
Outperforms heuristic baselines in model compression tasks.
Abstract
Pruning is an effective method for compressing Large Language Models, but finding an optimal, non-uniform layer-wise sparsity allocation remains a key challenge. Heuristic methods are fast but yield suboptimal performance, while more powerful search-based approaches like Reinforcement Learning are often hindered by prohibitive computational costs on large-scale models. To overcome this efficiency barrier, we propose FastForward Pruning. Its core is a decoupled, single-step RL framework that separates policy optimization from the complex budget satisfaction problem. Such a decoupling is crucial for efficiently searching the vast policy space of LLMs. The search follows a curriculum-based strategy that begins with low-cost, simple tasks and gradually increases in complexity, significantly reducing the search's computational overhead. Evaluated on the LLaMA, Mistral, and OPT model families, our framework…
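One way to picture the decoupling of policy optimization from budget satisfaction is a minimal sketch like the one below. It is not the authors' implementation: the function name `project_to_budget`, the clipping bounds, and the bisection-based shift are all assumptions made for illustration. The idea shown is only that a policy can emit unconstrained per-layer scores, while a separate, deterministic step maps those scores to layer-wise sparsity ratios whose parameter-weighted mean meets the global target.

```python
def project_to_budget(scores, layer_sizes, target, lo=0.0, hi=0.95, iters=60):
    """Hypothetical helper: shift raw per-layer scores by one constant so the
    parameter-weighted mean sparsity equals `target`, clipping to [lo, hi]."""
    total = sum(layer_sizes)

    def weighted_sparsity(shift):
        # Clip each shifted score into the allowed per-layer sparsity range.
        ratios = [min(hi, max(lo, s + shift)) for s in scores]
        mean = sum(r * n for r, n in zip(ratios, layer_sizes)) / total
        return mean, ratios

    # Bracket the shift: at `a` every layer clips to lo, at `b` every layer
    # clips to hi, so any target in [lo, hi] lies between the two.
    a, b = lo - max(scores), hi - min(scores)

    # Bisection works because the weighted mean is non-decreasing in the shift.
    for _ in range(iters):
        mid = (a + b) / 2
        mean, _ = weighted_sparsity(mid)
        if mean < target:
            a = mid
        else:
            b = mid
    return weighted_sparsity((a + b) / 2)[1]
```

Under these assumptions, a call such as `project_to_budget([0.1, 0.5, 0.9], [100, 200, 100], 0.5)` returns one sparsity ratio per layer, preserving the ordering of the raw scores while meeting the overall budget. The policy never has to learn the constraint itself, which is the sense of "decoupled" this sketch is meant to convey.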
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms
