SyncThink: A Training-Free Strategy to Align Inference Termination with Reasoning Saturation

Gengyang Li; Wang Cai; Yifeng Gao; Yunfang Wu

arXiv:2601.03649·cs.CL·January 8, 2026

Open Access

TL;DR

SyncThink is a training-free decoding method that ends chain-of-thought reasoning early by monitoring the model's own reasoning-transition signal, cutting inference cost while keeping average accuracy comparable and improving it on long-horizon tasks.

Contribution

It introduces a novel, training-free approach to align inference termination with reasoning saturation using a model's own signals, without modifying model weights.

Findings

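01. Achieves comparable average Top-1 accuracy (62.00 percent vs. 61.22 percent for full CoT decoding) with fewer generated tokens (656 vs. 2141) and lower latency.

02. Reduces average inference latency from 92.01 s to 28.68 s.

03. Improves accuracy by up to +8.1 (absolute) on long-horizon tasks such as GPQA.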
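
The headline figures can be turned into relative savings with a few lines of arithmetic. The sketch below uses only the averages quoted in the abstract; the dictionary, function, and variable names are illustrative and do not come from the paper.

```python
# Averages quoted in the abstract (full CoT decoding vs. SyncThink).
FULL_COT = {"accuracy": 61.22, "tokens": 2141, "latency_s": 92.01}
SYNCTHINK = {"accuracy": 62.00, "tokens": 656, "latency_s": 28.68}


def relative_reduction(baseline: float, value: float) -> float:
    """Fraction of the baseline that is saved (0.25 means 25 percent less)."""
    return (baseline - value) / baseline


token_saving = relative_reduction(FULL_COT["tokens"], SYNCTHINK["tokens"])
latency_saving = relative_reduction(FULL_COT["latency_s"], SYNCTHINK["latency_s"])
speedup = FULL_COT["latency_s"] / SYNCTHINK["latency_s"]
accuracy_delta = SYNCTHINK["accuracy"] - FULL_COT["accuracy"]

print(f"token reduction:   {token_saving:.1%}")      # 69.4%
print(f"latency reduction: {latency_saving:.1%}")    # 68.8%
print(f"speedup:           {speedup:.2f}x")          # 3.21x
print(f"accuracy change:   {accuracy_delta:+.2f}")   # +0.78
```

In other words, the reported averages correspond to roughly a 3.2x latency speedup with about 69 percent fewer generated tokens, at a small average accuracy gain.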

Abstract

Chain-of-Thought (CoT) prompting improves reasoning but often produces long and redundant traces that substantially increase inference cost. We present SyncThink, a training-free and plug-and-play decoding method that reduces CoT overhead without modifying model weights. We find that answer tokens attend weakly to early reasoning and instead focus on the special token "</think>", indicating an information bottleneck. Building on this observation, SyncThink monitors the model's own reasoning-transition signal and terminates reasoning. Experiments on GSM8K, MMLU, GPQA, and BBH across three DeepSeek-R1 distilled models show that SyncThink achieves 62.00 percent average Top-1 accuracy using 656 generated tokens and 28.68 s latency, compared to 61.22 percent, 2141 tokens, and 92.01 s for full CoT decoding. On long-horizon tasks such as GPQA, SyncThink can further yield up to +8.1 absolute…
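
The abstract does not spell out the exact termination rule, so the following is only a minimal sketch of the general idea of stopping on a reasoning-transition signal. It assumes the signal is the next-token probability of the end-of-reasoning marker and that reasoning is cut once that probability crosses a fixed threshold. The marker spelling `</think>`, the threshold value, the helper names, and the toy model are all hypothetical stand-ins, not the authors' implementation.

```python
import math
from typing import Callable, Dict, List

END_THINK = "</think>"  # assumed spelling of the end-of-reasoning marker


def marker_probability(logits: Dict[str, float], token: str) -> float:
    """Softmax probability of `token` under a {token: logit} mapping."""
    peak = max(logits.values())
    exps = {t: math.exp(v - peak) for t, v in logits.items()}
    return exps.get(token, 0.0) / sum(exps.values())


def decode_with_early_stop(
    next_logits: Callable[[List[str]], Dict[str, float]],
    threshold: float = 0.3,   # assumed value, chosen for illustration only
    max_steps: int = 64,
) -> List[str]:
    """Greedy decoding that closes the reasoning segment once the marker
    probability reaches `threshold`, instead of waiting for it to be the
    most likely token."""
    trace: List[str] = []
    for _ in range(max_steps):
        logits = next_logits(trace)
        if marker_probability(logits, END_THINK) >= threshold:
            trace.append(END_THINK)  # force the transition to the answer
            break
        # Otherwise continue reasoning with the best non-marker token.
        candidates = {t: v for t, v in logits.items() if t != END_THINK}
        trace.append(max(candidates, key=candidates.get))
    return trace


def toy_model(trace: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a model: the marker logit rises by 1.0
    with every reasoning token already emitted."""
    return {"step": 1.0, "check": 0.5, END_THINK: -3.0 + 1.0 * len(trace)}


trace = decode_with_early_stop(toy_model)
print(trace)
```

With this toy model the marker probability first exceeds 0.3 after four reasoning tokens, so the trace ends with the marker at position five. A real integration would read the marker probability from the model's logits at each decoding step; how SyncThink actually defines and thresholds its signal is described in the paper itself, not here.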

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare