LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling

Yang Xiao; Jiashuo Wang; Ruifeng Yuan; Chunpu Xu; Kaishuai Xu; Wenjie Li; Pengfei Liu

arXiv:2505.19187 · cs.CL · October 22, 2025

TL;DR

LIMOPro introduces PIR, a framework that refines reasoning chains by pruning low-importance steps, resulting in more concise, accurate, and computationally efficient large language model inferences.

Contribution

The paper presents PIR, an importance-evaluation method that selectively prunes reasoning steps to make test-time scaling more efficient without sacrificing core reasoning quality.

Findings

01 Models fine-tuned on PIR-refined data show accuracy improvements of 0.9% to 6.6%.

02 Token usage of the reasoning chains drops by 3% to 41%.

03 Both efficiency and accuracy improve across multiple reasoning benchmarks.

Abstract

Large language models (LLMs) have demonstrated remarkable reasoning capabilities through test-time scaling approaches, particularly when fine-tuned with chain-of-thought (CoT) data distilled from more powerful large reasoning models (LRMs). However, these reasoning chains often contain verbose elements that mirror human problem-solving, categorized as progressive reasoning (the essential solution development path) and functional elements (verification processes, alternative solution approaches, and error corrections). While progressive reasoning is crucial, the functional elements significantly increase computational demands during test-time inference. We introduce PIR (Perplexity-based Importance Refinement), a principled framework that quantitatively evaluates the importance of each reasoning step based on its impact on answer prediction confidence. PIR systematically identifies and…
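The idea of scoring a step by its effect on answer confidence can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation. The following are assumptions made for illustration: the ratio form of the score (answer perplexity with the step removed divided by answer perplexity with the full chain), the `score` callable standing in for a real language model, the `is_functional` labels, the `ratio` parameter, and the choice to prune only functional steps.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer: given (context, answer), return the per-token
# log-probabilities of the answer. A real setup would query a language model.
LogProbFn = Callable[[str, str], Sequence[float]]


def perplexity(logprobs: Sequence[float]) -> float:
    """Perplexity is exp of the negative mean token log-probability."""
    return math.exp(-sum(logprobs) / len(logprobs))


def step_importance(question: str, steps: List[str], answer: str,
                    score: LogProbFn) -> List[float]:
    """Assumed score: PPL(answer | chain without step i) / PPL(answer | full chain).

    A value well above 1 means removing the step makes the answer less likely.
    """
    full_ppl = perplexity(score(question + "\n" + "\n".join(steps), answer))
    importances = []
    for i in range(len(steps)):
        ablated = steps[:i] + steps[i + 1:]
        ppl = perplexity(score(question + "\n" + "\n".join(ablated), answer))
        importances.append(ppl / full_ppl)
    return importances


def prune(steps: List[str], importances: List[float],
          is_functional: List[bool], ratio: float) -> List[str]:
    """Drop the lowest-importance fraction of functional steps (assumption:
    progressive steps are always kept)."""
    candidates = sorted((i for i, f in enumerate(is_functional) if f),
                        key=lambda i: importances[i])
    drop = set(candidates[:int(len(candidates) * ratio)])
    return [s for i, s in enumerate(steps) if i not in drop]


def toy_score(context: str, answer: str) -> List[float]:
    # Toy stand-in for a model: the answer is likely only when the context
    # still contains the key derivation "x = 4".
    lp = -0.1 if "x = 4" in context else -2.0
    return [lp] * len(answer.split())


question = "Solve 2x = 8."
steps = [
    "Divide both sides by 2, so x = 4.",       # progressive reasoning
    "Wait, let me double-check: 2 * 4 = 8.",   # functional: verification
    "Alternatively, halve 8 directly.",        # functional: alternative approach
]
is_functional = [False, True, True]
answer = "The answer is 4"

importances = step_importance(question, steps, answer, toy_score)
pruned = prune(steps, importances, is_functional, ratio=0.5)
```

With the toy scorer, removing the first step raises answer perplexity while removing either functional step leaves it unchanged, so half of the functional steps are pruned and the derivation is kept. In practice the scorer, the step segmentation, and the pruning threshold would all need to follow the paper and its official repository.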

Code & Models

Repositories

gair-nlp/limopro (Official)

Datasets

YangXiao-nlp/DualThinking
dataset · 68 downloads

Videos

LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling · SlidesLive

Taxonomy

Topics: Software Testing and Debugging Techniques · Model-Driven Software Engineering Techniques · Software Engineering Research