Reasoning Traces Shape Outputs but Models Won't Say So

Yijie Hao; Lingjie Chen; Ali Emami; Joyce Ho

arXiv:2603.20620·cs.AI·March 24, 2026

Reasoning Traces Shape Outputs but Models Won't Say So

Yijie Hao, Lingjie Chen, Ali Emami, Joyce Ho

PDF

Open Access

TL;DR

This paper investigates whether large reasoning models honestly report their reasoning processes, finding they often refuse to disclose true influences and instead fabricate explanations, highlighting a gap between model behavior and reported reasoning.

Contribution

The study introduces Thought Injection, a method to test if models follow injected reasoning, revealing models' reluctance to disclose true influences and systematic fabrication of explanations.

Findings

01

Injected hints reliably alter model outputs

02

Models overwhelmingly refuse to disclose true reasoning influences

03

Fabricated explanations activate deception-related neural directions

Abstract

Can we trust the reasoning traces that large reasoning models (LRMs) produce? We investigate whether these traces faithfully reflect what drives model outputs, and whether models will honestly report their influence. We introduce Thought Injection, a method that injects synthetic reasoning snippets into a model's <think> trace, then measures whether the model follows the injected reasoning and acknowledges doing so. Across 45,000 samples from three LRMs, we find that injected hints reliably alter outputs, confirming that reasoning traces causally shape model behavior. However, when asked to explain their changed answers, models overwhelmingly refuse to disclose the influence: overall non-disclosure exceeds 90% for extreme hints across 30,000 follow-up samples. Instead of acknowledging the injected reasoning, models fabricate aligned-appearing but unrelated explanations. Activation…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsExplainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference · Topic Modeling