LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
Yang Xiao, Jiashuo Wang, Ruifeng Yuan, Chunpu Xu, Kaishuai Xu, Wenjie Li, Pengfei Liu

TL;DR
LIMOPro introduces PIR, a framework that refines reasoning chains by pruning low-importance steps, making large language model inference more concise, more accurate, and computationally cheaper.
Contribution
It presents PIR, a novel importance evaluation method that selectively prunes reasoning steps to optimize test-time scaling without sacrificing core reasoning quality.
Findings
Models fine-tuned on PIR data show 0.9% to 6.6% accuracy improvements.
Token usage of the reasoning chains drops by 3% to 41%.
Both efficiency and accuracy improve across multiple reasoning benchmarks.
Abstract
Large language models (LLMs) have demonstrated remarkable reasoning capabilities through test-time scaling approaches, particularly when fine-tuned with chain-of-thought (CoT) data distilled from more powerful large reasoning models (LRMs). However, these reasoning chains often contain verbose elements that mirror human problem-solving, categorized as progressive reasoning (the essential solution development path) and functional elements (verification processes, alternative solution approaches, and error corrections). While progressive reasoning is crucial, the functional elements significantly increase computational demands during test-time inference. We introduce PIR (Perplexity-based Importance Refinement), a principled framework that quantitatively evaluates the importance of each reasoning step based on its impact on answer prediction confidence. PIR systematically identifies and…
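The abstract describes PIR as scoring each reasoning step by its impact on answer prediction confidence. The sketch below is one possible reading of that idea, not the authors' implementation: a step's importance is taken to be the ratio of the answer's perplexity with the step removed to the perplexity with the full chain. The `Scorer` callable, the `functional` mask, the `ratio` parameter, and `toy_score` are all hypothetical names introduced for illustration. In practice the scorer would wrap a language model that returns per-token log-probabilities of the answer.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer: per-token log-probabilities of `answer` given `context`.
Scorer = Callable[[str, str], Sequence[float]]


def perplexity(log_probs: Sequence[float]) -> float:
    """Exponential of the negative mean token log-probability."""
    return math.exp(-sum(log_probs) / len(log_probs))


def step_importance(steps: List[str], answer: str, score: Scorer) -> List[float]:
    """Assumed importance: answer perplexity without step i, divided by
    answer perplexity with the full chain. Larger means more important."""
    base = perplexity(score(" ".join(steps), answer))
    scores = []
    for i in range(len(steps)):
        ablated = steps[:i] + steps[i + 1:]
        scores.append(perplexity(score(" ".join(ablated), answer)) / base)
    return scores


def prune(steps: List[str], functional: List[bool],
          importance: List[float], ratio: float) -> List[str]:
    """Drop the lowest-importance fraction `ratio` of the steps flagged as
    functional; steps not flagged are always kept (an assumption)."""
    candidates = sorted(
        (i for i, is_functional in enumerate(functional) if is_functional),
        key=lambda i: importance[i],
    )
    drop = set(candidates[: int(len(candidates) * ratio)])
    return [s for i, s in enumerate(steps) if i not in drop]


def toy_score(context: str, answer: str) -> List[float]:
    """Stand-in for a real model: an answer token is 'likely' only if it
    already appears in the context."""
    seen = set(context.split())
    return [-0.1 if tok in seen else -2.0 for tok in answer.split()]


steps = ["x = 2 + 3", "let me verify that again", "so x = 5"]
importance = step_importance(steps, "5", toy_score)
refined = prune(steps, [False, True, False], importance, ratio=1.0)
```

With the toy scorer, only the final step changes the answer's perplexity when removed, so the verification step scores lowest and is the one pruned. A real scorer would give graded values, and the pruning fraction would need tuning.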
Taxonomy
Topics: Software Testing and Debugging Techniques · Model-Driven Software Engineering Techniques · Software Engineering Research
