From LLMs to LRMs: Rethinking Pruning for Reasoning-Centric Models

Longwei Ding; Anhao Zhao; Fanghua Ye; Ziyang Chen; Xiaoyu Shen

arXiv:2601.18091·cs.LG·January 27, 2026

TL;DR

This paper investigates how different pruning strategies affect reasoning-augmented large language models, revealing that pruning methods should be tailored to the model's specific reasoning and task characteristics for optimal performance.

Contribution

It provides a controlled, comprehensive study comparing pruning strategies on reasoning-augmented models, highlighting paradigm-dependent effects and guiding better pruning practices.

Findings

01

Depth pruning outperforms width pruning on classification tasks.

02

Width pruning is more robust for generation and reasoning tasks.

03

Static pruning better preserves reasoning performance.

Abstract

Large language models (LLMs) are increasingly costly to deploy, motivating extensive research on model pruning. However, most existing studies focus on instruction-following LLMs, leaving it unclear whether established pruning strategies transfer to reasoning-augmented models that explicitly generate long intermediate reasoning traces. In this work, we conduct a controlled study of pruning for both instruction-following (LLM-instruct) and reasoning-augmented (LLM-think) models. To isolate the effects of pruning, we align pruning calibration and post-pruning recovery data with each model's original training distribution, which we show yields more stable and reliable pruning behavior. We evaluate static depth pruning, static width pruning, and dynamic pruning across 17 tasks spanning classification, generation, and reasoning. Our results reveal clear…
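To make the distinction between the two static strategies concrete, here is a minimal sketch, not the paper's method. It assumes a toy transformer whose blocks hold only attention and MLP weight matrices, that depth pruning removes whole blocks, and that width pruning shrinks the MLP hidden dimension. `ToyConfig`, `depth_prune`, and `width_prune` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical toy transformer shape (not taken from the paper)."""
    n_layers: int
    d_model: int
    d_ff: int


def block_params(cfg: ToyConfig) -> int:
    # Attention: four d_model x d_model matrices (Q, K, V, O).
    # MLP: two d_model x d_ff matrices. Biases and norms are ignored.
    return 4 * cfg.d_model ** 2 + 2 * cfg.d_model * cfg.d_ff


def total_params(cfg: ToyConfig) -> int:
    return cfg.n_layers * block_params(cfg)


def depth_prune(cfg: ToyConfig, n_remove: int) -> ToyConfig:
    # Depth pruning (assumed form): drop entire blocks, keep each block's width.
    return ToyConfig(cfg.n_layers - n_remove, cfg.d_model, cfg.d_ff)


def width_prune(cfg: ToyConfig, keep_ratio: float) -> ToyConfig:
    # Width pruning (assumed form): keep every block, shrink the MLP hidden size.
    return ToyConfig(cfg.n_layers, cfg.d_model, int(cfg.d_ff * keep_ratio))


base = ToyConfig(n_layers=32, d_model=4096, d_ff=11008)
shallower = depth_prune(base, n_remove=8)
narrower = width_prune(base, keep_ratio=0.5)
print(total_params(shallower) / total_params(base))
print(total_params(narrower) / total_params(base))
```

Both routes reduce the parameter count, but they change the model differently: the depth-pruned variant runs fewer sequential blocks, while the width-pruned variant keeps all blocks and thins each one.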

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. The motivations are clearly presented.
2. The related works and preliminaries provide a fair and comprehensive coverage of the relevant literature.
3. The pruning strategies are well categorized into three main types (static depth pruning, static width pruning, and dynamic depth pruning), each encompassing recently published algorithms.

Weaknesses

1. Weak contribution. This paper applies existing LLM structured pruning methods to LLM-reasoning models. To strengthen the contribution, the authors should provide additional evidence as follows.
   - Do LLM-reasoning models exhibit different trends from LLM-instruct models when using pruning algorithms?
   - For static depth pruning (Shortened LLaMA or SLEB), does the cosine similarity (block importance) differ between the instruct and reasoning models?
   - For static width pruning (LLM-Pruner, SliceGPT),
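The cosine-similarity block importance that the reviewer asks about can be sketched as follows. This is an assumption-laden illustration: it takes a block's importance to be one minus the mean cosine similarity between the hidden states entering and leaving the block, which is one definition used in the depth-pruning literature and may differ from what the paper or the cited methods compute. The function names and the toy hidden states are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def block_importance(hidden_in, hidden_out):
    # Assumed definition: 1 - mean cosine similarity over calibration tokens.
    # A block that barely changes its input scores near 0.
    sims = [cosine(x, y) for x, y in zip(hidden_in, hidden_out)]
    return 1.0 - sum(sims) / len(sims)


def rank_blocks(states):
    # states[i] holds the per-token hidden states entering block i;
    # states[-1] holds the output of the last block.
    scores = [block_importance(states[i], states[i + 1])
              for i in range(len(states) - 1)]
    # Least important first: these would be the first removal candidates.
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Toy hidden states for two tokens passing through two blocks (made-up values).
toy_states = [
    [[1.0, 0.0], [0.0, 1.0]],   # input to block 0
    [[1.0, 0.1], [0.1, 1.0]],   # block 0 output: nearly unchanged
    [[0.0, 1.0], [1.0, 0.0]],   # block 1 output: strongly rotated
]
print(rank_blocks(toy_states))
```

Running the same scoring on an instruct model and its reasoning counterpart with matched calibration data would be one way to answer the reviewer's question about whether the importance profiles differ.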

Reviewer 02 · Rating 2 · Confidence 3

Strengths

- The paper presents a well-motivated study that clearly identifies a neglected gap in prior pruning research, the mismatch between instruction-tuned and reasoning-centric models, and frames this as a timely and practically relevant problem for the community.
- The paper provides a set of elementary yet informative experiments that reveal specific failure modes when structured pruning is applied to reasoning-centric models

Weaknesses

- Although the paper positions itself as a systematic study, both the breadth and depth of the experiments appear insufficient to fully support this claim.
- In terms of breadth, the study evaluates pruning behavior on only a single, relatively small-scale model, which limits the generalizability of its conclusions and weakens the argument that its findings will reliably transfer to subsequent work.
- In terms of depth, the paper offers little to no mechanistic analysis of why reasoning-

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- The paper is the first comprehensive study comparing pruning across instruction-following models (LLM-instruct) and reasoning-augmented models (LLM-think).
- The paper presents a comprehensive experimental design, covering 17 diverse datasets and three mainstream pruning strategies.

Weaknesses

- The pruning methods used in Table 1 are relatively limited, particularly for Static Depth Pruning and Static Width Pruning. The paper needs to include more pruning methods such as SLEB [1], PuDDing [2], Blockpruner [3], Olica [4], and LoRAP [5] in the experiments.
- The conclusion presented in Section 4.3 appears to have already been discussed in [6].
- The paper does not propose a new pruning method but rather conducts a systematic comparison of existing approaches. As a result, t

Reviewer 04 · Rating 2 · Confidence 5

Strengths

The paper explores pruning in thinking LLMs, a topic that has received little prior attention.

Weaknesses

- Severe Logical Flaw in Experimental Design. The paper’s core experiment misunderstands the fundamental goal of model pruning. The purpose of pruning is to obtain a smaller model that performs **better than or at least comparably to a model of equivalent size trained from scratch**. Even a small reasoning model (e.g., 1.5B parameters) [7] should retain non-zero performance instead of 0.0% across all benchmarks. If pruning leads to zero performance, **that strongly suggests implementation errors

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)