Thought Anchors: Which LLM Reasoning Steps Matter?

Paul C. Bogdan; Uzay Macar; Neel Nanda; Arthur Conmy

arXiv:2506.19143·cs.LG·October 28, 2025

TL;DR

This paper introduces a sentence-level interpretability method for large language models that identifies key reasoning steps, called thought anchors, which significantly influence the final answer and can improve understanding of model reasoning processes.

Contribution

The paper presents a novel black-box approach to identify thought anchors at the sentence level, revealing their impact on reasoning and providing tools for analyzing model behavior.

Findings

1. Certain sentences have outsized influence on reasoning trajectories.
2. Thought anchors are often planning or uncertainty sentences.
3. Attention heads focus on thought anchors during reasoning.

Abstract

Current frontier large language models rely on reasoning to achieve state-of-the-art performance. Many existing interpretability methods are limited in this area, as standard methods have been designed to study single forward passes of a model rather than the multi-token computational steps that unfold during reasoning. We argue that analyzing reasoning traces at the sentence level is a promising approach to understanding reasoning processes. We introduce a black-box method that measures each sentence's counterfactual importance by repeatedly sampling replacement sentences from the model, filtering for semantically different ones, and continuing the chain of thought from that point onwards to quantify the sentence's impact on the distribution of final answers. We discover that certain sentences can have an outsized impact on the trajectory of the reasoning trace and final answer. We term these…
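The resampling procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the callables `resample`, `continue_from`, and `is_different` are hypothetical stand-ins for model sampling and a semantic-difference filter, and the use of total variation distance to compare answer distributions is an assumption made here for simplicity (the abstract does not specify the distance).

```python
from collections import Counter
from typing import Callable, Dict, List


def answer_distribution(answers: List[str]) -> Dict[str, float]:
    """Turn a list of sampled final answers into an empirical distribution."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: count / total for answer, count in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two discrete distributions.

    Assumption: this metric is chosen for illustration only.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def counterfactual_importance(
    prefix: List[str],
    sentence: str,
    resample: Callable[[List[str]], str],       # hypothetical: sample a replacement sentence
    continue_from: Callable[[List[str]], str],  # hypothetical: finish the trace, return final answer
    is_different: Callable[[str, str], bool],   # hypothetical: semantic-difference filter
    n_samples: int = 100,
) -> float:
    """Estimate how much `sentence` shifts the distribution of final answers."""
    # Baseline: keep the original sentence and sample continuations.
    base_answers = [continue_from(prefix + [sentence]) for _ in range(n_samples)]

    # Counterfactual: replace the sentence, keep only semantically
    # different replacements, and continue from that point onwards.
    cf_answers = []
    for _ in range(n_samples):
        replacement = resample(prefix)
        if is_different(sentence, replacement):
            cf_answers.append(continue_from(prefix + [replacement]))

    # Design choice in this sketch: with no valid replacement, report zero.
    if not cf_answers:
        return 0.0
    return total_variation(
        answer_distribution(base_answers), answer_distribution(cf_answers)
    )
```

Under these assumptions, a sentence whose replacement leaves the answer distribution unchanged scores 0, and one whose replacement always flips the answer scores 1. Each sentence costs up to `2 * n_samples` continuations, so the procedure is expensive on long traces.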

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. The paper proposes a sentence-level toolkit for analyzing CoT traces using both black-box and white-box methods.
2. Step-by-step case studies accompany each stage of the pipeline, improving clarity and reproducibility.
3. It introduces thought anchors (sentences identified by estimating per-sentence importance via resampling scoring) and further examines relationships among anchors through attention analysis.

Weaknesses

1. Motivation clarity: The paper argues for interpreting CoT, but does not clearly explain why CoT text alone is insufficient, nor how the method advances beyond prior interpretability work.
2. Computational cost: Analyzing one CoT trace is expensive (e.g., ~100 resamples per sentence plus an auxiliary labeling model). Practicality at scale is unclear.
3. Ablations are limited: No systematic study of design choices (e.g., sentence attention: mean/last-token/concat; Counterfactual importance: s

Reviewer 02 · Rating 6 · Confidence 2

Strengths

1. The paper offers a fine-grained interpretability analysis by examining reasoning at the sentence level rather than the token level, allowing a clearer view of how individual reasoning steps contribute to the overall thought process.
2. It introduces a sentence-masking method to study causal dependencies between reasoning steps, providing a thorough and systematic analysis of how earlier sentences influence later ones in the reasoning process.
3. This work releases an open-source interacti

Weaknesses

1. The analysis assumes clean sentence segmentation and treats each sentence as an independent reasoning unit, which may not hold in more complicated reasoning contexts where boundaries are fuzzy.
2. The "thought anchor" concept is derived from a very limited dataset: just 20 reasoning traces from 10 math problems. Furthermore, the study only selected problems the model can solve 25-75% of the time, so the findings may not generalize to problems the model consistently fails or solves easily.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

The paper conducts a systematic analysis of LLM reasoning traces by resampling and by masking attention weights at the sentence-to-sentence level. It makes several interesting observations, e.g., a correlation between problem difficulty and the range of sentence-level links in the reasoning traces.

Weaknesses

See questions.

Code & Models

Repositories

codelion/pts
pytorch
