TL;DR
This paper introduces Conditional Token Selection (CTS), a method that compresses a reasoning model's chain of thought by keeping only its essential tokens, improving inference efficiency while maintaining strong reasoning performance.
Contribution
The paper proposes CTS, a novel token-level compression framework that identifies and retains only crucial tokens in reasoning chains, reducing redundancy and computational costs.
Findings
CTS reduces reasoning tokens by up to 75.8%.
Models trained with CTS maintain high accuracy despite significant token reduction.
CTS improves inference efficiency while preserving reasoning performance.
Abstract
Modern reasoning models, such as OpenAI's o1 and DeepSeek-R1, exhibit impressive problem-solving capabilities but suffer from critical inefficiencies: high inference latency, excessive computational resource consumption, and a tendency toward overthinking -- generating verbose chains of thought (CoT) laden with redundant tokens that contribute minimally to the final answer. To address these issues, we propose Conditional Token Selection (CTS), a token-level compression framework with a flexible and variable compression ratio that identifies and preserves only the most essential tokens in CoT. CTS evaluates each token's contribution to deriving correct answers using conditional importance scoring, then trains models on compressed CoT. Extensive experiments demonstrate that CTS effectively compresses long CoT while maintaining strong reasoning performance. Notably, on the GPQA benchmark,…
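The selection step described in the abstract can be illustrated with a minimal sketch. It assumes per-token importance scores have already been computed; the excerpt does not specify how CTS's conditional importance scoring works, so the scores below are made-up placeholders. The function name `select_tokens` and the parameter `keep_ratio` are hypothetical, not names from the paper.

```python
import math
from typing import List, Sequence


def select_tokens(
    tokens: Sequence[str],
    scores: Sequence[float],
    keep_ratio: float,
) -> List[str]:
    """Keep the highest-scoring fraction of tokens, preserving original order.

    `scores` stands in for whatever importance measure is used; how CTS
    computes it is not covered here (assumption: higher means more important).
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Number of tokens to retain, rounded up, never fewer than one.
    n_keep = max(1, math.ceil(keep_ratio * len(tokens)))

    # Rank positions by descending score; ties go to the earlier token.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))

    # Restore the original left-to-right order of the retained positions.
    kept_positions = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept_positions]


# Toy chain of thought with placeholder scores (not real CTS output).
tokens = ["So", "we", "add", "2", "and", "3", "to", "get", "5"]
scores = [0.1, 0.1, 0.6, 0.9, 0.2, 0.9, 0.1, 0.4, 1.0]

compressed = select_tokens(tokens, scores, keep_ratio=0.5)
print(compressed)  # ['add', '2', '3', 'get', '5']
```

Varying `keep_ratio` corresponds to the flexible compression ratio the abstract mentions: a lower value keeps fewer tokens. In the framework as described, compressed chains like this one would then serve as training data; that training step is not shown here.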
