Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens

Karthik Valmeekam; Kaya Stechly; Vardhan Palod; Atharva Gundawar; Subbarao Kambhampati

arXiv:2505.13775 · cs.LG · November 25, 2025

TL;DR

This study investigates the role of reasoning traces in large reasoning models. It finds that intermediate tokens may not reflect the model's actual reasoning and that models trained on corrupted traces perform similarly, which challenges the assumption that such traces are interpretable.

Contribution

The paper provides a systematic analysis showing that reasoning traces are not essential for correct solutions and that their semantics do not reliably indicate the model's reasoning process.

Findings

01 Models trained on correct traces can produce invalid reasoning traces.

02 Models trained on corrupted traces achieve similar or better performance and generalization than those trained on correct traces.

03 Trace length does not correlate with problem complexity.

Abstract

Recent impressive results from large reasoning models have been interpreted as a triumph of Chain of Thought (CoT), especially of training on CoTs sampled from base LLMs to help find new reasoning patterns. While these traces certainly seem to help model performance, it is not clear how they actually influence it, with some works ascribing semantics to the traces and others cautioning against relying on them as transparent and faithful proxies of the model's internal computational process. To systematically investigate the role of end-user semantics of derivational traces, we set up a controlled study where we train transformer models from scratch on formally verifiable reasoning traces and the solutions they lead to. We notice that, despite significant gains over the solution-only baseline, models trained on entirely correct traces can still produce invalid reasoning traces even when…

Code & Models

Repositories

zhaoolee/garss
pytorch

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Balanced Selection · ALIGN