Think Before You Prune: Self-Reflective Structured Pruning for Reasoning Language Models
Ziyan Wang, Enmao Diao, Qi Le, Pu Wang, Guanchu Wang, Minwoo Lee, Shu-ping Yeh, Li Yang

TL;DR
This paper introduces RESP, a self-reflective structured pruning method that shrinks reasoning LLMs while preserving accuracy by aligning pruning with the model's reasoning process through self-generated calibration data.
Contribution
The paper proposes a novel self-reflective pruning framework that leverages the model's own reasoning traces for calibration, significantly improving pruning robustness for reasoning LLMs.
Findings
RESP preserves near-dense accuracy at 20-30% sparsity.
RESP outperforms existing methods at higher sparsity levels.
At 40% sparsity, RESP achieves 81.3% accuracy on GSM8K.
Abstract
Reasoning LLMs (RLMs) such as OpenAI o1, DeepSeek-R1, and Qwen3 deliver strong multi-step reasoning through chain-of-thought generation, but their large model sizes and lengthy decode-time outputs make them costly to deploy and unsuitable for resource-constrained settings. To reduce computing and memory cost, pruning offers a promising solution by removing unimportant parameters. However, despite their success on standard LLMs, existing pruning methods severely damage RLMs, as even moderate sparsity (e.g., 20%) can collapse accuracy and completely disrupt the model's reasoning coherence. We begin by analyzing why existing pruning pipelines fail on reasoning LLMs and find that their brittleness largely stems from a mismatch between the calibration data, the pruning objective, and the model's decode-time reasoning behavior. Our study further shows that the most reliable calibration signal…
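The abstract's point that pruning decisions depend on the calibration data can be illustrated with a toy example. The sketch below is not RESP and is not taken from the paper: it assumes a tiny two-layer ReLU network and a generic activation-based channel score (mean absolute activation times outgoing weight norm). All names (`hidden_activations`, `channel_importance`, `prune_channels`) and all weights and inputs are hypothetical.

```python
# Toy structured pruning: remove whole hidden channels of a 2-layer ReLU MLP.
# Assumption: importance = mean |activation| on calibration data * outgoing weight norm.
# This is a generic illustrative criterion, not the scoring rule used by RESP.

def hidden_activations(w_in, x):
    """ReLU activations of each hidden channel for one input vector x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]

def channel_importance(w_in, w_out, calibration_inputs):
    """Score every hidden channel from the calibration inputs."""
    n_hidden = len(w_in)
    mean_act = [0.0] * n_hidden
    for x in calibration_inputs:
        for j, a in enumerate(hidden_activations(w_in, x)):
            mean_act[j] += abs(a) / len(calibration_inputs)
    # L2 norm of the weights leaving each hidden channel (column j of w_out).
    out_norm = [sum(row[j] ** 2 for row in w_out) ** 0.5 for j in range(n_hidden)]
    return [m * n for m, n in zip(mean_act, out_norm)]

def prune_channels(w_in, w_out, scores, sparsity):
    """Drop the lowest-scoring channels; return smaller matrices and kept indices."""
    n_hidden = len(w_in)
    n_keep = n_hidden - int(round(sparsity * n_hidden))
    ranked = sorted(range(n_hidden), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:n_keep])
    new_w_in = [w_in[j] for j in keep]                    # remove rows
    new_w_out = [[row[j] for j in keep] for row in w_out]  # remove columns
    return new_w_in, new_w_out, keep

# Hypothetical weights: w_in is (hidden x input), w_out is (output x hidden).
w_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
w_out = [[1.0, 1.0, 1.0]]

# Two hypothetical calibration sets that exercise different channels.
calib_a = [[1.0, 2.0], [2.0, 1.0]]
calib_b = [[-1.0, -2.0], [-2.0, -1.0]]

scores_a = channel_importance(w_in, w_out, calib_a)
scores_b = channel_importance(w_in, w_out, calib_b)
pruned_in_a, pruned_out_a, keep_a = prune_channels(w_in, w_out, scores_a, 1 / 3)
pruned_in_b, pruned_out_b, keep_b = prune_channels(w_in, w_out, scores_b, 1 / 3)
print("kept with calibration set A:", keep_a)
print("kept with calibration set B:", keep_b)
```

In this toy setup the two calibration sets lead to different channels being removed, which is one way to picture the mismatch the abstract describes: a channel that looks unimportant on the calibration inputs may be exactly the one needed for the inputs the model sees at decode time.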
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
