Think Before You Prune: Self-Reflective Structured Pruning for Reasoning Language Models

Ziyan Wang; Enmao Diao; Qi Le; Pu Wang; Guanchu Wang; Minwoo Lee; Shu-ping Yeh; Li Yang

arXiv:2512.02185·cs.CL·December 3, 2025

Open Access

TL;DR

This paper introduces RESP, a self-reflective structured pruning method that shrinks reasoning LLMs while largely preserving accuracy. It does so by calibrating on the model's own self-generated reasoning traces, which aligns pruning with the model's reasoning process.

Contribution

The paper proposes a novel self-reflective pruning framework that leverages the model's own reasoning traces for calibration, significantly improving pruning robustness for reasoning LLMs.
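
As a rough illustration of what calibration-driven structured pruning means in general, the sketch below scores the hidden channels of a toy two-layer MLP on a calibration set and removes the lowest-scoring ones. It is not the RESP algorithm: the scoring rule (activation norm times outgoing-weight norm), the toy network, and every name in it (`channel_scores`, `prune_channels`, `calib`) are assumptions made for this example. In the paper's setting the calibration inputs would come from the model's own generated reasoning traces; here random vectors stand in for them.

```python
import math
import random


def channel_scores(W_in, W_out, calib):
    """Score each hidden channel of a toy MLP y = W_out @ relu(W_in @ x).

    Hypothetical importance rule (an assumption, not taken from the paper):
    L2 norm of the channel's activations over the calibration set,
    multiplied by the L2 norm of its outgoing weights.
    """
    hidden = len(W_in)
    sq_act = [0.0] * hidden
    for x in calib:
        for j in range(hidden):
            h = max(0.0, sum(w * xi for w, xi in zip(W_in[j], x)))
            sq_act[j] += h * h
    scores = []
    for j in range(hidden):
        out_norm = math.sqrt(sum(row[j] ** 2 for row in W_out))
        scores.append(math.sqrt(sq_act[j]) * out_norm)
    return scores


def prune_channels(W_in, W_out, scores, sparsity):
    """Structurally remove the lowest-scoring fraction of hidden channels."""
    hidden = len(W_in)
    n_drop = int(round(hidden * sparsity))
    order = sorted(range(hidden), key=lambda j: scores[j])
    dropped = set(order[:n_drop])
    keep = [j for j in range(hidden) if j not in dropped]
    W_in_p = [W_in[j] for j in keep]                   # drop rows of W_in
    W_out_p = [[row[j] for j in keep] for row in W_out]  # drop columns of W_out
    return W_in_p, W_out_p, keep


# Toy setup: 4 inputs, 10 hidden channels, 3 outputs.
random.seed(0)
d_in, hidden, d_out = 4, 10, 3
W_in = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(hidden)]
W_out = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(d_out)]
W_in[0] = [-1.0] * d_in  # channel 0 never fires on non-negative inputs

# Stand-in calibration set; a real pipeline would use model-generated traces.
calib = [[random.random() for _ in range(d_in)] for _ in range(32)]

scores = channel_scores(W_in, W_out, calib)
W_in_p, W_out_p, keep = prune_channels(W_in, W_out, scores, sparsity=0.3)
print("kept channels:", keep)
```

Because whole channels are removed, the pruned matrices are simply smaller dense matrices, which is what distinguishes structured pruning from zeroing individual weights. The example also shows why the choice of calibration data matters: the scores depend entirely on which inputs are used to measure activations.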

Findings

01. RESP preserves near-dense accuracy at 20-30% sparsity.

02. RESP outperforms existing methods at higher sparsity levels.

03. At 40% sparsity, RESP achieves 81.3% accuracy on GSM8K.

Abstract

Reasoning LLMs (RLMs) such as OpenAI o1, DeepSeek-R1, and Qwen3 deliver strong multi-step reasoning through chain-of-thought generation, but their large model sizes and lengthy decode-time outputs make them costly to deploy and unsuitable for resource-constrained settings. To reduce computing and memory cost, pruning offers a promising solution by removing unimportant parameters. However, despite their success on standard LLMs, existing pruning methods severely damage RLMs, as even moderate sparsity (e.g., 20%) can collapse accuracy and completely disrupt the model's reasoning coherence. We begin by analyzing why existing pruning pipelines fail on reasoning LLMs and find that their brittleness largely stems from a mismatch between the calibration data, the pruning objective, and the model's decode-time reasoning behavior. Our study further shows that the most reliable calibration signal…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks