Directional Reasoning Trajectory Change (DRTC): Identifying Critical Trace Segments in Reasoning Models
Waldemar Chang

TL;DR
DRTC is a causal method that locates critical decision points in a language model's reasoning by measuring how interventions on earlier context chunks redirect the reasoning trajectory.
Contribution
Introduces DRTC, a process-causal approach that detects pivot points in reasoning models and measures their influence through trajectory redirection and geometric diagnostics.
Findings
Influence is concentrated in a few context segments.
Learned pivots have stronger effects than random spans.
DRTC outperforms gradient- and perturbation-based attribution methods.
Abstract
Understanding how language models carry out long-horizon reasoning remains an open challenge. Existing interpretability methods often highlight tokens correlated with an answer, but rarely reveal where consequential reasoning turns occur, which earlier context triggers them under causal intervention, or whether highlighted text actually steers the rollout. We introduce Directional Reasoning Trajectory Change (DRTC), a process-causal method that (i) detects pivot decision points via uncertainty and distribution-shift signals and (ii) applies receiver-side interventions that preserve the realized continuation without resampling while blocking information flow from selected earlier chunks only at a pivot. DRTC measures how each intervention redirects the log-probability trajectory relative to the realized rollout direction, yielding signed per-chunk attributions; we also compute…
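The two steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and the abstract does not give the exact formulas, so everything here is an assumption made for illustration: the function names, the choice of entropy for the uncertainty signal and Jensen-Shannon divergence for the distribution-shift signal, the weight `alpha`, and the sign convention, under which a positive attribution means that blocking a chunk moved the log-probability trajectory against the realized rollout direction.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two probability vectors."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing; v_i > 0 whenever u_i > 0.
        return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pivot_scores(dists, alpha=1.0):
    """Hypothetical pivot score for each step t >= 1.

    Combines uncertainty (entropy at step t) with distribution shift
    (JS divergence between steps t-1 and t). `alpha` is an assumed weight.
    """
    return [
        entropy(dists[t]) + alpha * js_divergence(dists[t - 1], dists[t])
        for t in range(1, len(dists))
    ]


def directional_attribution(baseline, intervened, direction):
    """Signed projection of the trajectory change onto the rollout direction.

    baseline, intervened: log-probability vectors with and without the chunk
    visible at the pivot. direction: assumed vector for the realized rollout
    direction. Assumed convention: positive means blocking the chunk pushed
    the trajectory against that direction.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    delta = [b - i for b, i in zip(baseline, intervened)]
    return sum(x * d for x, d in zip(delta, direction)) / norm


# Toy usage with made-up numbers: three next-token distributions, two chunks.
dists = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
scores = pivot_scores(dists)
direction = [1.0, 0.0]
baseline = [2.0, 1.0]
per_chunk = {
    "chunk_a": directional_attribution(baseline, [0.5, 1.0], direction),
    "chunk_b": directional_attribution(baseline, [2.0, 0.2], direction),
}
```

In this toy setup, blocking `chunk_a` changes the trajectory along the rollout direction and gets a non-zero attribution, while blocking `chunk_b` only changes an orthogonal component and gets zero. That is the sense in which a directional score differs from an unsigned perturbation magnitude.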
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling
