Broken Chains: The Cost of Incomplete Reasoning in LLMs

Ian Su; Gaurav Purushothaman; Jey Narayan; Ruhika Goel; Kevin Zhu; Sunishchal Dev; Yash More; Maheep Chaudhary

arXiv:2602.14444 · cs.LG · February 17, 2026

TL;DR

This paper investigates how different reasoning modalities and token budget constraints affect the performance of large language models on mathematical benchmarks, revealing that incomplete reasoning can actively mislead models and that robustness varies across models.

Contribution

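The study introduces a framework that constrains models to reason exclusively through code, natural-language comments, both, or neither, and systematically ablates token budgets to measure how each reasoning modality holds up. The results highlight the importance of complete reasoning chains.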
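The experimental design can be pictured as a grid of conditions. This is a hypothetical enumeration, not the authors' code. It crosses the four models and three benchmarks named in the abstract with the three reasoning modalities (code, comments, hybrid) at each of the four budget levels. It also assumes the no-reasoning condition runs once per model and benchmark as an unconstrained baseline, because the abstract does not say whether that condition is budget-ablated. The names `build_conditions`, `REASONING_MODALITIES`, and `BUDGET_PERCENTS` are illustrative.

```python
from itertools import product

# Models and benchmarks as listed in the paper's abstract.
MODELS = ("GPT-5.1", "Gemini 3 Flash", "DeepSeek-V3.2", "Grok 4.1")
BENCHMARKS = ("AIME", "GSM8K", "HMMT")

# Modalities that produce reasoning tokens and can therefore be budget-limited.
REASONING_MODALITIES = ("code", "comments", "hybrid")

# Budget levels as integer percentages of the optimal budget.
BUDGET_PERCENTS = (10, 30, 50, 70)


def build_conditions():
    """Return (model, benchmark, modality, budget_percent) tuples.

    Assumption (hypothetical): the "none" modality is a single baseline per
    model and benchmark, so its budget entry is None.
    """
    conditions = []
    for model, bench in product(MODELS, BENCHMARKS):
        # No-reasoning baseline: nothing to truncate, so no budget level.
        conditions.append((model, bench, "none", None))
        # Every reasoning modality is run at every budget level.
        for modality, pct in product(REASONING_MODALITIES, BUDGET_PERCENTS):
            conditions.append((model, bench, modality, pct))
    return conditions
```

Under these assumptions the grid contains 4 × 3 × (3 × 4 + 1) = 156 conditions, so `len(build_conditions())` returns 156.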

Findings

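1. Truncated reasoning can significantly reduce accuracy, in some cases below the no-reasoning baseline (DeepSeek-V3.2: 53% with no reasoning vs. 17% with CoT truncated to a 50% budget).
2. Code reasoning degrades gracefully under token constraints.
3. Hybrid reasoning (code plus comments) underperforms single-modality reasoning.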

Abstract

Reasoning-specialized models like OpenAI's GPT-5.1 and DeepSeek-V3.2 allocate substantial inference compute to extended chain-of-thought (CoT) traces, yet reasoning tokens incur significant costs. How do different reasoning modalities (code, natural language, hybrid, or none) perform under token constraints? We introduce a framework that constrains models to reason exclusively through code, comments, both, or neither, then systematically ablates token budgets to 10%, 30%, 50%, and 70% of optimal. We evaluate four frontier models (GPT-5.1, Gemini 3 Flash, DeepSeek-V3.2, Grok 4.1) across mathematical benchmarks (AIME, GSM8K, HMMT). Our findings reveal: (1) truncated reasoning can hurt, as DeepSeek-V3.2 achieves 53% with no reasoning but only 17% with truncated CoT at 50% budget; (2) code degrades gracefully, as Gemini's comments collapse to 0% while code maintains…
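The budget ablation can be sketched as follows. The sketch assumes that "optimal" means a per-problem reference token count, such as the length of an unconstrained reasoning trace, and that a budget acts as a hard cutoff on reasoning tokens. Both assumptions, and the helper names `budget_caps` and `truncate_trace`, are hypothetical, because the abstract does not say how the optimal budget is measured or how truncation is enforced. Integer percentages keep the caps exact and avoid floating-point rounding.

```python
BUDGET_PERCENTS = (10, 30, 50, 70)


def budget_caps(optimal_tokens: int, percents=BUDGET_PERCENTS) -> dict[int, int]:
    """Map each budget percentage to an integer token cap.

    Assumption (hypothetical): `optimal_tokens` is a reference token count
    for the problem. Caps are floored and never drop below one token.
    """
    if optimal_tokens <= 0:
        raise ValueError("optimal_tokens must be positive")
    # Integer arithmetic gives exact results, e.g. 30% of 2000 is exactly 600.
    return {p: max(1, optimal_tokens * p // 100) for p in percents}


def truncate_trace(trace_tokens: list[str], cap: int) -> list[str]:
    """Keep only the first `cap` reasoning tokens, mimicking a hard cutoff."""
    return trace_tokens[:cap]
```

For a reference trace of 2,000 tokens, `budget_caps(2000)` returns caps of 200, 600, 1,000, and 1,400 tokens. A trace cut this way can stop mid-derivation, which is the kind of incomplete chain the paper reports can do worse than no reasoning at all.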

Taxonomy

Topics: Scientific Computing and Data Management · Software Engineering Research · Topic Modeling