The Coupling Tax: How Shared Token Budgets Undermine Visible Chain-of-Thought Under Fixed Output Limits

Wenhua Nie; Junlin Liu; Jianan Wu; Zijie Meng; Yilong Fan; Zhang Zijian; Haoran Zheng; Jyh-Shing Roger Jang

arXiv:2605.07686·cs.LG·May 11, 2026

TL;DR

This paper shows that when a chain-of-thought trace and the final answer share one output-token budget, long traces can crowd out the answer and lower accuracy. It proposes split-budget generation as a mitigation and argues that how the budget is allocated matters more than how long the trace is.

Contribution

It introduces the concept of the coupling tax, derives a truncation-waste decomposition that predicts where thinking mode overtakes non-thinking mode, and demonstrates that split-budget generation improves reasoning accuracy across multiple tasks.

Findings

01. Non-thinking mode matches or outperforms thinking mode on GSM8K and MATH-500 at every budget up to 2048 tokens; harder tasks shift the crossover to larger budgets.

02. Longer reasoning traces can crowd out answers when both share one output-token budget.

03. Split-budget generation significantly improves accuracy on complex tasks.
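The crowding-out effect in findings 02 and 03 can be illustrated with simple token accounting. The sketch below is a toy model, not the paper's procedure: the function names `shared_budget` and `split_budget`, the list-of-tokens representation, and the idea that the trace is always emitted before the answer are all assumptions made for illustration. This listing does not describe how the paper actually implements split-budget generation.

```python
def shared_budget(trace, answer, budget):
    """Toy model: trace and answer draw from one budget, trace first.

    Hypothetical helper; whatever the trace consumes is no longer
    available to the answer.
    """
    trace_out = trace[:budget]
    remaining = budget - len(trace_out)  # never negative
    answer_out = answer[:remaining]
    return trace_out, answer_out


def split_budget(trace, answer, think_budget, answer_budget):
    """Toy model: trace and answer each get a reserved budget.

    Hypothetical helper; the trace may still be truncated, but it can
    no longer consume the tokens reserved for the answer.
    """
    return trace[:think_budget], answer[:answer_budget]


# Made-up example: a 10-token trace, a 3-token answer, 8 tokens in total.
trace = ["t"] * 10
answer = ["a"] * 3

shared = shared_budget(trace, answer, budget=8)
split = split_budget(trace, answer, think_budget=5, answer_budget=3)

print(len(shared[0]), len(shared[1]))  # trace uses all 8 tokens, answer gets 0
print(len(split[0]), len(split[1]))    # trace cut to 5 tokens, answer gets all 3
```

Both calls spend the same 8 tokens in total; only the allocation differs. In a real model the answer would also be conditioned on the truncated trace, which this accounting ignores.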

Abstract

Chain-of-thought reasoning is often treated as a monotone way to improve language-model accuracy by letting a model think longer. We identify a countervailing effect, the coupling tax: when reasoning traces and final answers share one output-token budget, long traces can crowd out the answer they are meant to support. Across GSM8K, MATH-500, and five BIG-Bench Hard tasks with Qwen3 models at three scales, non-thinking mode matches or outperforms thinking mode on GSM8K and MATH-500 at every budget up to 2048 tokens, while harder tasks shift the crossover to larger budgets. We derive a truncation-waste decomposition, $\mathrm{Acc}_{\mathrm{think}}(b) = \alpha_{c} F_{L}(b) + \alpha_{t} \left(1 - F_{L}(b)\right)$, that predicts this crossover from chain-length and accuracy statistics and explains inverse scaling within the Qwen family. A DeepSeek-R1-Distill-Llama-8B replication shows the same pattern under a…
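The decomposition in the abstract can be evaluated directly once its inputs are estimated. The sketch below assumes a reading that the abstract implies but does not spell out: $F_{L}(b)$ is the empirical fraction of reasoning chains that fit within budget $b$, $\alpha_{c}$ is accuracy when the chain completes, and $\alpha_{t}$ is accuracy when it is truncated. The function names, the constant non-thinking accuracy, and all numbers are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def empirical_cdf(lengths, b):
    """Fraction of observed chain lengths that are <= b tokens (assumed F_L(b))."""
    ordered = sorted(lengths)
    return bisect_right(ordered, b) / len(ordered)


def acc_think(b, lengths, alpha_c, alpha_t):
    """Predicted thinking-mode accuracy: alpha_c * F_L(b) + alpha_t * (1 - F_L(b))."""
    f = empirical_cdf(lengths, b)
    return alpha_c * f + alpha_t * (1.0 - f)


def crossover_budget(budgets, lengths, alpha_c, alpha_t, acc_nothink):
    """Smallest budget where predicted thinking accuracy exceeds acc_nothink.

    Simplification: acc_nothink is treated as constant in b, which is an
    assumption of this sketch. Returns None if no budget qualifies.
    """
    for b in sorted(budgets):
        if acc_think(b, lengths, alpha_c, alpha_t) > acc_nothink:
            return b
    return None


# Made-up chain lengths and accuracies, for illustration only.
lengths = [300, 600, 900, 1500, 2500, 4000]
budgets = [256, 512, 1024, 2048, 4096]

for b in budgets:
    print(b, round(acc_think(b, lengths, alpha_c=0.9, alpha_t=0.1), 4))

print(crossover_budget(budgets, lengths, 0.9, 0.1, acc_nothink=0.6))
```

Under this reading, thinking mode only pays off once enough chains finish within the budget, so longer chains push the crossover to larger $b$.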
