TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling

Weizhe Lin; Xing Li; Zhiyuan Yang; Xiaojin Fu; Hui-Ling Zhen; Yaoyuan Wang; Xianzhi Yu; Wulong Liu; Xiaosong Li; Mingxuan Yuan

arXiv:2505.17155·cs.LG·June 3, 2025

TL;DR

TrimR is a verifier-based, training-free framework that compresses the reasoning chains of large reasoning models, cutting reasoning runtime by up to 70% during test-time scaling with minimal accuracy loss.

Contribution

TrimR introduces a novel verifier-based, training-free method for dynamic reasoning-chain compression, tailored to production deployment.

Findings

01

Achieves up to 70% reduction in reasoning runtime.

02

Incurs negligible accuracy loss across multiple benchmarks.

03

Demonstrates efficiency gains on large-batch industrial workloads.

Abstract

Large Reasoning Models (LRMs) demonstrate exceptional capability in tackling complex mathematical, logical, and coding tasks by leveraging extended Chain-of-Thought (CoT) reasoning. Test-time scaling methods, such as prolonging CoT with explicit token-level exploration, can push LRMs' accuracy boundaries, but they incur significant decoding overhead. A key source of inefficiency is that LRMs often generate redundant thinking CoTs that exhibit clear, structured overthinking and underthinking patterns. Inspired by human cognitive reasoning processes and numerical optimization theories, we propose TrimR, a verifier-based, training-free, efficient framework for dynamic CoT compression to trim reasoning and enhance test-time scaling, explicitly tailored for production-level deployment. Our method employs a lightweight, pretrained, instruction-tuned verifier to detect and truncate redundant…
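
The abstract is cut off before the method details. The following is therefore only a minimal Python sketch of the general idea of verifier-gated truncation, not TrimR's actual algorithm. It rests on several assumptions not taken from the paper. The thinking trace is split into "thoughts" at hypothetical marker words (Wait, Alternatively). A regex stand-in (toy_verifier) replaces the instruction-tuned verifier. Thinking stops once `patience` consecutive answer-bearing thoughts agree, which treats repeated re-verification of the same answer as redundant overthinking. All function names and parameters here are hypothetical.

```python
import re
from typing import Callable, List, Optional

# Hypothetical thought-boundary markers (an assumption, not the paper's rule):
# split just before "Wait" or "Alternatively" using a zero-width lookahead.
THOUGHT_BOUNDARY = re.compile(r"(?=\b(?:Wait|Alternatively)\b)")


def split_thoughts(trace: str) -> List[str]:
    """Split a thinking trace into thought segments at the marker words."""
    return [part.strip() for part in THOUGHT_BOUNDARY.split(trace) if part.strip()]


def toy_verifier(thought: str) -> Optional[str]:
    """Stand-in for the lightweight verifier: return the answer a thought
    commits to, or None if it reaches no conclusion. This is only a regex;
    TrimR uses an instruction-tuned LLM as its verifier."""
    match = re.search(r"answer is\s+(-?\d+)", thought)
    return match.group(1) if match else None


def trim_trace(
    trace: str,
    verifier: Callable[[str], Optional[str]],
    patience: int = 2,
) -> str:
    """Keep thoughts until `patience` consecutive conclusions agree, then stop."""
    kept: List[str] = []
    streak_answer: Optional[str] = None
    streak = 0
    for thought in split_thoughts(trace):
        kept.append(thought)
        answer = verifier(thought)
        if answer is None:
            continue  # no conclusion in this thought, keep thinking
        if answer == streak_answer:
            streak += 1
        else:
            streak_answer, streak = answer, 1  # new answer resets the streak
        if streak >= patience:
            break  # same answer re-confirmed: treat the rest as redundant
    return " ".join(kept)


if __name__ == "__main__":
    trace = (
        "Compute 6*7. 6*7 = 42, so the answer is 42. "
        "Wait, let me double-check: 7*6 = 42, so the answer is 42. "
        "Alternatively, 6*7 = 6*5 + 6*2 = 30 + 12 = 42, so the answer is 42. "
        "Wait, maybe I should recheck once more."
    )
    print(trim_trace(trace, toy_verifier, patience=2))
```

With patience=2, the example keeps only the first two thoughts and drops the redundant "Alternatively" and final "Wait" segments. A larger patience trims less aggressively. That trade-off between runtime saved and the risk of cutting off a late correction is the kind of knob a production deployment would have to tune.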

Taxonomy

Methods: Mixture of Experts