Attention Gathers, MLPs Compose: A Causal Analysis of an Action-Outcome Circuit in VideoViT
Sai V R Chereddy

TL;DR
This paper reverse-engineers an internal circuit in a video vision transformer (VideoViT) in which attention heads gather evidence and MLPs compose concepts to represent whether an action succeeded or failed. The circuit shows that the model encodes hidden semantic information relevant to trustworthy AI.
Contribution
The paper provides a mechanistic interpretability analysis of a video model, identifying how attention and MLP components collaborate to encode outcome signals.
Findings
Attention heads gather evidence for outcome signals.
MLPs act as concept composers for success signals.
The circuit is resilient to ablations, indicating distributed processing.
Abstract
The paper explores how video models trained for classification tasks represent nuanced, hidden semantic information that may not affect the final outcome, a key challenge for Trustworthy AI models. Using Explainable and Interpretable AI methods, specifically mechanistic interpretability techniques, the internal circuit responsible for representing the action's outcome is reverse-engineered in a pre-trained video vision transformer, revealing that the "Success vs Failure" signal is computed through a distinct amplification cascade. Although low-level differences are observable from layer 0, the abstract, semantic representation of the outcome is progressively amplified from layers 5 through 11. Causal analysis, primarily using activation patching supported by ablation results, reveals a clear division of labor: Attention Heads act as "evidence gatherers", providing necessary…
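The abstract names activation patching and ablation as its causal tools. The sketch below shows the generic form of both techniques on a hypothetical toy circuit. It is not the author's code and not VideoViT. Everything in it is an illustrative assumption: the component names `head_a`, `head_b`, `mlp`, the 1.0 threshold, and the "success"/"failure" inputs. Activation patching copies one component's activation from a clean run into a corrupted run and measures how much of the output gap it restores. Zero ablation replaces a component's output with 0 and measures the drop.

```python
from typing import Callable, Dict, List, Optional, Tuple

Acts = Dict[str, float]
# Each component reads the raw input and the activations computed so far.
Component = Tuple[str, Callable[[List[float], Acts], float]]

# Hypothetical toy circuit (not VideoViT): two "heads" each read one evidence
# feature, and an "MLP" fires only when their summed evidence exceeds 1.0.
# The MLP output stands in for the success-vs-failure logit difference.
COMPONENTS: List[Component] = [
    ("head_a", lambda x, acts: x[0]),
    ("head_b", lambda x, acts: x[1]),
    ("mlp", lambda x, acts: max(0.0, acts["head_a"] + acts["head_b"] - 1.0)),
]


def run(x: List[float], overrides: Optional[Acts] = None) -> Tuple[float, Acts]:
    """Forward pass; any component named in `overrides` has its output replaced
    before downstream components read it."""
    overrides = overrides or {}
    acts: Acts = {}
    for name, fn in COMPONENTS:
        acts[name] = fn(x, acts)
        if name in overrides:
            acts[name] = overrides[name]
    return acts["mlp"], acts


def patching_effect(clean_x: List[float], corrupt_x: List[float], name: str) -> float:
    """Fraction of the clean-vs-corrupt output gap restored by copying one
    component's clean activation into the corrupted run."""
    clean_out, clean_acts = run(clean_x)
    corrupt_out, _ = run(corrupt_x)
    if clean_out == corrupt_out:
        raise ValueError("clean and corrupted runs must produce different outputs")
    patched_out, _ = run(corrupt_x, {name: clean_acts[name]})
    return (patched_out - corrupt_out) / (clean_out - corrupt_out)


def ablation_drop(x: List[float], name: str, value: float = 0.0) -> float:
    """Output drop when one component is replaced by a fixed value (zero ablation)."""
    base, _ = run(x)
    ablated, _ = run(x, {name: value})
    return base - ablated


if __name__ == "__main__":
    success, failure = [1.0, 1.0], [0.0, 0.0]  # hypothetical "clean" and "corrupted" inputs
    for component in ("head_a", "head_b", "mlp"):
        # Prints: name, patching effect, ablation drop
        print(component,
              patching_effect(success, failure, component),
              ablation_drop(success, component))
```

Running it prints `head_a 0.0 1.0`, `head_b 0.0 1.0`, and `mlp 1.0 1.0`. In this toy, patching either head alone restores none of the gap, because the thresholded MLP needs both pieces of evidence at once. Patching the MLP restores all of it. This contrast is a property of the toy, chosen to show how patching and ablation can tell different stories; it does not reproduce the paper's results. In a real transformer the same overrides are typically applied with framework hooks, such as PyTorch forward hooks, on individual heads or MLP blocks.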
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
