FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning

Xin Yuan; Siqi Li; Jiateng Wei; Chengrui Zhu; Yanming Wu; Qingpeng Li; Jiajun Lv; Xiaoke Lan; Jun Chen; Yong Liu

arXiv:2511.18977·cs.LG·November 25, 2025

Open Access

TL;DR

FastForward Pruning introduces a single-step reinforcement learning framework for efficient, high-quality pruning of large language models, significantly reducing computational costs while outperforming heuristic methods.

Contribution

The paper presents a decoupled, curriculum-based RL approach that efficiently searches for optimal sparsity policies in large language models, improving over existing methods in both performance and efficiency.

Findings

01. Achieves superior pruning performance on LLaMA, Mistral, and OPT models.

02. Reduces computational cost compared to other search-based algorithms.

03. Outperforms heuristic baselines in model compression tasks.

Abstract

Pruning is an effective method for compressing Large Language Models, but finding an optimal, non-uniform layer-wise sparsity allocation remains a key challenge. Heuristic methods are fast but yield suboptimal performance, while more powerful search-based approaches like Reinforcement Learning are often hindered by prohibitive computational costs on large-scale models. To overcome this efficiency barrier, we propose FastForward Pruning. Its core is a decoupled, single-step RL framework that separates policy optimization from the complex budget satisfaction problem. Such a decoupling is crucial for efficiently searching the vast policy space of LLMs. Its curriculum-based strategy begins with low-cost, simple tasks and gradually increases in complexity, significantly reducing the search's computational overhead. Evaluated on the LLaMA, Mistral, and OPT model families, our framework…
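To make the idea of separating policy optimization from budget satisfaction concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the budget is enforced. The function name `project_to_budget`, the clipping range, and the use of bisection on a shared offset are all assumptions chosen for illustration. The sketch takes unconstrained per-layer scores (as a policy might emit) and shifts them so that the parameter-weighted mean sparsity meets a global target, which lets the policy search ignore the budget constraint.

```python
def project_to_budget(scores, layer_sizes, target_sparsity,
                      lo=0.0, hi=0.95, iters=60):
    """Map raw per-layer scores to sparsity ratios that meet a global budget.

    Hypothetical helper (not from the paper): adds one shared offset to all
    scores, clips each layer to [lo, hi], and finds the offset by bisection so
    the parameter-weighted mean sparsity equals target_sparsity.
    Assumes lo <= target_sparsity <= hi.
    """
    total = sum(layer_sizes)

    def clip_all(offset):
        return [min(hi, max(lo, s + offset)) for s in scores]

    def mean_sparsity(offset):
        return sum(c * n for c, n in zip(clip_all(offset), layer_sizes)) / total

    # At `low` every layer is clipped to lo; at `high` every layer is clipped
    # to hi. mean_sparsity is non-decreasing in the offset, so bisection works.
    low, high = lo - max(scores), hi - min(scores)
    for _ in range(iters):
        mid = (low + high) / 2
        if mean_sparsity(mid) < target_sparsity:
            low = mid
        else:
            high = mid
    return clip_all((low + high) / 2)


if __name__ == "__main__":
    # Illustrative inputs only: three layers with made-up scores and sizes.
    ratios = project_to_budget([0.1, 0.5, 0.3], [100, 200, 100], 0.5)
    print([round(r, 3) for r in ratios])
```

Because the projection preserves the ordering of the scores, the policy still controls which layers are pruned more aggressively, while the budget itself is handled outside the search.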

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms