The Coupling Tax: How Shared Token Budgets Undermine Visible Chain-of-Thought Under Fixed Output Limits
Wenhua Nie, Junlin Liu, Jianan Wu, Zijie Meng, Yilong Fan, Zhang Zijian, Haoran Zheng, Jyh-Shing Roger Jang

TL;DR
This paper shows that when chain-of-thought traces and final answers share one token budget, accuracy can suffer. It proposes split-budget generation as an effective mitigation and emphasizes budget allocation over trace length.
Contribution
It introduces the concept of the coupling tax, derives a predictive decomposition, and demonstrates that split-budget generation improves reasoning accuracy across multiple tasks.
Findings
Non-thinking mode matches or outperforms thinking mode at small budgets.
Longer reasoning traces can crowd out answers due to shared token budgets.
Split-budget generation significantly improves accuracy on complex tasks.
Abstract
Chain-of-thought reasoning is often treated as a monotone way to improve language-model accuracy by letting a model think longer. We identify a countervailing effect, the coupling tax: when reasoning traces and final answers share one output-token budget, long traces can crowd out the answer they are meant to support. Across GSM8K, MATH-500, and five BIG-Bench Hard tasks with Qwen3 models at three scales, non-thinking mode matches or outperforms thinking mode on GSM8K and MATH-500 at every budget up to 2048 tokens, while harder tasks shift the crossover to larger budgets. We derive a truncation-waste decomposition that predicts this crossover from chain-length and accuracy statistics and explains inverse scaling within the Qwen family. A DeepSeek-R1-Distill-Llama-8B replication shows the same pattern under a…
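The crowd-out mechanism can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's truncation-waste decomposition or its implementation: the per-sample trace lengths and correctness flags below are invented, and we assume that "split-budget generation" means capping the trace at `budget - answer_reserve` and then forcing the answer out of a reserved pool of tokens. The names `Sample`, `shared_budget_accuracy`, `split_budget_accuracy`, and `answer_reserve` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical generation (toy data, not results from the paper)."""
    trace_tokens: int          # full reasoning-trace length (0 in non-thinking mode)
    answer_tokens: int         # final-answer length
    correct_if_complete: bool  # answer is correct when the trace finishes
    correct_if_forced: bool    # answer is correct when forced out after a truncated trace


def shared_budget_accuracy(samples: list[Sample], budget: int) -> float:
    """Trace and answer share one budget; if the trace uses it up, no answer is scored."""
    hits = sum(
        1 for s in samples
        if s.trace_tokens + s.answer_tokens <= budget and s.correct_if_complete
    )
    return hits / len(samples)


def split_budget_accuracy(samples: list[Sample], budget: int, answer_reserve: int) -> float:
    """Assumed split: cap the trace at budget - answer_reserve, then answer from the reserve."""
    trace_cap = budget - answer_reserve
    hits = 0
    for s in samples:
        if s.answer_tokens > answer_reserve:
            continue  # the answer itself does not fit in the reserved tokens
        finished = s.trace_tokens <= trace_cap
        if (finished and s.correct_if_complete) or (not finished and s.correct_if_forced):
            hits += 1
    return hits / len(samples)


# Toy "thinking mode": long traces, every answer correct if the trace completes.
thinking = [
    Sample(t, 20, True, forced)
    for t, forced in [(300, False), (600, True), (900, True), (1500, False), (2500, False)]
]
# Toy "non-thinking mode": no trace, lower accuracy, but the answer always fits.
non_thinking = [Sample(0, 150, c, c) for c in [True, True, True, False, False]]

for budget in (512, 2048):
    print(
        budget,
        shared_budget_accuracy(thinking, budget),
        shared_budget_accuracy(non_thinking, budget),
        split_budget_accuracy(thinking, budget, answer_reserve=64),
    )
# prints:
# 512 0.2 0.6 0.6
# 2048 0.8 0.6 0.8
```

In this toy setting, thinking mode under a shared budget loses to non-thinking mode at 512 tokens because most traces never reach an answer, and it overtakes non-thinking mode at 2048 tokens, which is a crossover of the kind the abstract describes. Reserving answer tokens recovers the samples whose partial reasoning was already sufficient. How much a split helps in practice depends on how often forced answers are correct, which this sketch simply assumes per sample.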
