Therefore I am. I Think
Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani

TL;DR
This paper provides evidence that large language reasoning models encode decisions early in their activations, before any reasoning tokens are generated, and that these early-encoded decisions shape the subsequent chain-of-thought and final output.
Contribution
The study shows that decisions encoded early in a reasoning model's activations can be decoded with a simple linear probe and causally manipulated through activation steering.
Findings
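A simple linear probe decodes tool-calling decisions from pre-generation activations with very high confidence, in some cases before a single reasoning token is produced.
Perturbing the decision direction inflates deliberation and flips behavior in 7% to 79% of examples, depending on model and benchmark.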
Models often rationalize decision flips rather than resist them.
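To make the probing finding concrete, here is a minimal sketch of a linear probe trained on synthetic vectors. It is not the authors' code: the "activations" are random Gaussian vectors standing in for pre-generation hidden states, and every name (make_synthetic_activations, train_linear_probe, probe_accuracy, run_probe_demo) and every setting (dimension, class shift, learning rate) is a hypothetical chosen for illustration.

```python
import math
import random


def make_synthetic_activations(n, dim=16, shift=2.0, seed=0):
    """Hypothetical stand-in for pre-generation hidden states.

    Label 1 means "call the tool", label 0 means "do not call".
    The two classes differ only along one hidden unit direction.
    """
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    direction = [d / norm for d in direction]
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        sign = shift if y == 1 else -shift
        xs.append([rng.gauss(0.0, 1.0) + sign * d for d in direction])
        ys.append(y)
    return xs, ys


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(xs, ys, lr=0.1, epochs=200):
    """Fit logistic regression (a linear probe) by batch gradient descent."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, xs, ys):
    """Fraction of examples where the sign of w.x + b matches the label."""
    correct = 0
    for x, y in zip(xs, ys):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(xs)


def run_probe_demo():
    # Train on the first 400 synthetic examples, evaluate on the last 200.
    xs, ys = make_synthetic_activations(600)
    w, b = train_linear_probe(xs[:400], ys[:400])
    return probe_accuracy(w, b, xs[400:], ys[400:])
```

Calling run_probe_demo() returns held-out accuracy on the toy data. With a real model, the inputs would instead be hidden states captured at a chosen layer and token position before generation begins, and the labels would be the observed tool-calling decisions.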
Abstract
We consider the question: when a large language reasoning model makes a choice, did it think first and then decide, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe successfully decodes tool-calling decisions from pre-generation activations with very high confidence, and in some cases, even before a single reasoning token is produced. Activation steering supports this causally: perturbing the decision direction leads to inflated deliberation and flips behavior in many examples (between 7% and 79%, depending on model and benchmark). We also show through behavioral analysis that, when steering changes the decision, the chain-of-thought process often rationalizes the flip rather than resisting it. Together, these results suggest that…
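The steering idea in the abstract can be illustrated with a toy sketch. This is an assumption-laden simplification, not the paper's procedure: the decision direction is taken as the normalized difference of class means on synthetic vectors, and a "flip" here means the linear readout changes sign, whereas in a real model the vector would be added to hidden states during generation and the flip would be measured on the model's actual tool-calling behavior. All function names and parameters below are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def decision_direction(pos, neg):
    """Unit vector from the "no tool" mean to the "tool" mean, plus the midpoint."""
    mu_pos, mu_neg = mean_vector(pos), mean_vector(neg)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(dot(diff, diff))
    direction = [d / norm for d in diff]
    midpoint = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    return direction, midpoint


def decide(x, direction, midpoint):
    """Toy readout: True means "call the tool"."""
    return dot([xi - mi for xi, mi in zip(x, midpoint)], direction) > 0


def steer(x, direction, alpha):
    """Add alpha times the decision direction to one activation vector."""
    return [xi + alpha * di for xi, di in zip(x, direction)]


def flip_rate(examples, direction, midpoint, alpha):
    """Fraction of examples whose readout changes after steering by alpha."""
    flips = 0
    for x in examples:
        before = decide(x, direction, midpoint)
        after = decide(steer(x, direction, alpha), direction, midpoint)
        flips += int(before != after)
    return flips / len(examples)


def make_classes(n, dim=8, shift=2.0, seed=0):
    """Synthetic "tool" and "no tool" activations separated along axis 0."""
    rng = random.Random(seed)
    pos = [[rng.gauss(0.0, 1.0) + (shift if j == 0 else 0.0) for j in range(dim)]
           for _ in range(n)]
    neg = [[rng.gauss(0.0, 1.0) - (shift if j == 0 else 0.0) for j in range(dim)]
           for _ in range(n)]
    return pos, neg
```

A typical use is to build the direction from the two classes and sweep alpha over the "no tool" examples, for instance flip_rate(neg, direction, midpoint, alpha) for several values of alpha. In this toy setup the flip rate can only grow as alpha grows, which mirrors the qualitative claim that stronger perturbation along the decision direction changes more decisions.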
