Thought Branches: Interpreting LLM Reasoning Requires Resampling

Uzay Macar; Paul C. Bogdan; Senthooran Rajamanoharan; Neel Nanda

arXiv:2510.27484 · cs.LG · April 14, 2026

TL;DR

This paper advocates for resampling-based methods to analyze the reasoning processes of large language models, enabling more reliable causal insights and interventions beyond single chain-of-thought samples.

Contribution

It introduces resampling techniques for interpreting LLM reasoning, providing new tools for causal analysis, intervention assessment, and understanding unfaithful reasoning.

Findings

01. Resampling reveals that self-preservation reasons have minimal causal impact on blackmail decisions.

02. Off-policy interventions have limited and unstable effects compared to resampling.

03. Resilience metrics show critical reasoning steps resist removal but significantly influence outcomes when eliminated.

Abstract

Most work interpreting reasoning models studies only a single chain-of-thought (CoT), yet these models define distributions over many possible CoTs. We argue that studying a single sample is inadequate for understanding causal influence and the underlying computation. Though fully specifying this distribution is intractable, we can measure a partial CoT's impact by resampling only the subsequent text. We present case studies using resampling to investigate model decisions. First, when a model states a reason for its action, does that reason actually cause the action? In "agentic misalignment" scenarios, we find that self-preservation sentences have small causal impact, suggesting they do not meaningfully drive blackmail. Second, are artificial edits to CoT sufficient for steering reasoning? Resampling and selecting a completion with the desired property is a principled on-policy…
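The resampling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `toy_sample_outcome` is a hypothetical stand-in for "continue the CoT from this prefix with a real model and check whether the final answer has some property", and the function names, prefixes, and probabilities are assumptions made for illustration. The sketch estimates how much one sentence shifts the outcome rate by resampling many continuations from the prefix with and without that sentence.

```python
import random
from typing import Callable

# Hypothetical interface: given a partial CoT and a random source, sample one
# continuation and report whether the final outcome has the property of interest.
OutcomeSampler = Callable[[str, random.Random], bool]


def outcome_rate(sample_outcome: OutcomeSampler, prefix: str,
                 n: int, rng: random.Random) -> float:
    """Fraction of n resampled continuations from `prefix` showing the outcome."""
    hits = sum(sample_outcome(prefix, rng) for _ in range(n))
    return hits / n


def resampling_effect(sample_outcome: OutcomeSampler, prefix_with: str,
                      prefix_without: str, n: int = 1000, seed: int = 0) -> float:
    """Difference in outcome rate between two prefixes, estimated by resampling.

    A value near 0 suggests the differing sentence has little causal impact
    on the outcome; a large magnitude suggests it matters.
    """
    rng = random.Random(seed)
    rate_with = outcome_rate(sample_outcome, prefix_with, n, rng)
    rate_without = outcome_rate(sample_outcome, prefix_without, n, rng)
    return rate_with - rate_without


def toy_sample_outcome(prefix: str, rng: random.Random) -> bool:
    """Toy stand-in for a model (assumption): the outcome is more likely
    when the prefix mentions a deadline."""
    p = 0.8 if "deadline" in prefix else 0.3
    return rng.random() < p


if __name__ == "__main__":
    effect = resampling_effect(
        toy_sample_outcome,
        prefix_with="Step 1. The deadline is near.",
        prefix_without="Step 1.",
        n=2000,
    )
    print(f"estimated effect of the sentence: {effect:+.3f}")
```

With a real model, `sample_outcome` would generate text from the prefix and classify the result; the estimate is only as good as the number of samples and the outcome classifier.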

Code & Models

Datasets

japhba/cot-oracle-eval-thought-branches (dataset, 39 downloads)

Videos

Thought Branches: Interpreting LLM Reasoning Requires Resampling (SlidesLive)