Step-resolved data attribution for looped transformers
Georgios Kaissis, David Mildenberger, Juan Felipe Gomez, Martin J. Menten, Eleni Triantafillou

TL;DR
This paper introduces Step-Decomposed Influence (SDI), a method to analyze how individual training examples influence specific iterations in looped transformers, providing detailed interpretability of latent reasoning processes.
Contribution
The paper proposes SDI, a scalable influence estimation method that decomposes influence over recurrent steps in looped transformers, enabling detailed attribution at each iteration.
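To illustrate what decomposing influence over recurrent steps can look like, here is a minimal NumPy sketch on a toy looped model. Everything in it is an assumption made for illustration and not taken from the paper: the model (h_t = tanh(W h_{t-1}) with a squared-error loss), the single-checkpoint TracIn score (learning rate times the dot product of training and test gradients), and the choice to split the training-side gradient by iteration. The paper's exact definition may differ. The point is only that the gradient of a shared weight is a sum of per-iteration terms, so the scalar score splits into a trajectory that sums back to it.

```python
import numpy as np

def per_step_grads(W, x, y, n_steps):
    """Per-iteration contributions to dLoss/dW for a toy looped model.

    Toy model (an assumption): h_0 = x, h_t = tanh(W @ h_{t-1}),
    loss = 0.5 * ||h_T - y||^2. The full gradient is the sum of the
    returned list, one term per use of the shared weight W.
    """
    hs = [x]
    for _ in range(n_steps):
        hs.append(np.tanh(W @ hs[-1]))
    grads = [None] * n_steps
    dh = hs[-1] - y                              # dLoss/dh_T
    for t in range(n_steps, 0, -1):
        da = dh * (1.0 - hs[t] ** 2)             # back through tanh at step t
        grads[t - 1] = np.outer(da, hs[t - 1])   # W's contribution at step t
        dh = W.T @ da                            # dLoss/dh_{t-1}
    return grads

def tracin_score(W, train, test, n_steps, lr):
    """Single-checkpoint TracIn-style score: lr * <grad_train, grad_test>."""
    g_train = sum(per_step_grads(W, *train, n_steps))
    g_test = sum(per_step_grads(W, *test, n_steps))
    return lr * np.sum(g_train * g_test)

def step_decomposed(W, train, test, n_steps, lr):
    """Hypothetical per-iteration split: one score per loop iteration."""
    g_test = sum(per_step_grads(W, *test, n_steps))
    return np.array([lr * np.sum(g * g_test)
                     for g in per_step_grads(W, *train, n_steps)])

rng = np.random.default_rng(0)
d, n_steps, lr = 4, 3, 0.1
W = 0.5 * rng.standard_normal((d, d))
train = (rng.standard_normal(d), rng.standard_normal(d))
test = (rng.standard_normal(d), rng.standard_normal(d))

trajectory = step_decomposed(W, train, test, n_steps, lr)
total = tracin_score(W, train, test, n_steps, lr)
print(trajectory, trajectory.sum(), total)
```

The trajectory has one entry per loop iteration, and its entries add up to the aggregate score, which is the sense in which the scalar hides when an example matters.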
Findings
SDI scales to looped GPT-style models and algorithmic reasoning tasks.
SDI matches full-gradient influence baselines with low error.
SDI provides per-step insights into the model's latent reasoning.
Abstract
We study how individual training examples shape the internal computation of looped transformers, where a shared block is applied repeatedly over recurrent iterations to enable latent reasoning. Existing training-data influence estimators such as TracIn yield a single scalar score that aggregates over all loop iterations, obscuring when during the recurrent computation a training example matters. We introduce Step-Decomposed Influence (SDI), which decomposes TracIn into an influence trajectory with one entry per loop iteration by unrolling the recurrent computation graph and attributing influence to specific loop iterations. To make SDI practical at transformer scale, we propose a TensorSketch implementation that never materialises per-example gradients. Experiments on looped GPT-style models and algorithmic reasoning tasks show that SDI scales excellently, matches full-gradient baselines with low error…
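As a rough illustration of how a sketch can avoid materialising per-example gradients, the following NumPy example applies the generic TensorSketch construction to a single outer product. It assumes the gradient of a linear layer for one token is the outer product of a backpropagated error vector and an input activation vector. The helper names (make_hashes, count_sketch, tensor_sketch, direct_sketch) and the sizes are hypothetical, and the paper's actual implementation may differ.

```python
import numpy as np

def make_hashes(dim, m, rng):
    """Random bucket index in [0, m) and random sign for each coordinate."""
    return rng.integers(0, m, size=dim), rng.choice([-1.0, 1.0], size=dim)

def count_sketch(v, buckets, signs, m):
    """CountSketch of v: add signs[i] * v[i] into bucket buckets[i]."""
    out = np.zeros(m)
    np.add.at(out, buckets, signs * v)
    return out

def tensor_sketch(u, v, hu, hv, m):
    """Sketch of outer(u, v) without forming it.

    Circular convolution of the two CountSketches, computed with FFTs.
    """
    cu = count_sketch(u, *hu, m)
    cv = count_sketch(v, *hv, m)
    return np.fft.irfft(np.fft.rfft(cu) * np.fft.rfft(cv), n=m)

def direct_sketch(u, v, hu, hv, m):
    """Reference only: materialise outer(u, v) and CountSketch it with the
    combined hash (h_u(i) + h_v(j)) mod m and sign s_u(i) * s_v(j)."""
    out = np.zeros(m)
    buckets = (hu[0][:, None] + hv[0][None, :]) % m
    signs = hu[1][:, None] * hv[1][None, :]
    np.add.at(out, buckets, signs * np.outer(u, v))
    return out

rng = np.random.default_rng(0)
d_out, d_in, m = 6, 5, 64
hu = make_hashes(d_out, m, rng)
hv = make_hashes(d_in, m, rng)
delta = rng.standard_normal(d_out)   # stand-in for a backpropagated error
act = rng.standard_normal(d_in)      # stand-in for an input activation

fast = tensor_sketch(delta, act, hu, hv, m)
slow = direct_sketch(delta, act, hu, hv, m)
print(np.max(np.abs(fast - slow)))
```

Because the sketch is linear, sketches of per-token outer products can be summed to sketch a full per-example gradient, and the dot product of two such sketches gives a randomised estimate of the gradient inner product that a TracIn-style score needs.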
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Explainable Artificial Intelligence (XAI)
