Causal Attribution via Activation Patching

Amirmohammad Izadi; Mohammadali Banayeeanzade; Alireza Mirrokni; Hosein Hasani; Mobin Bagherian; Faridoun Mehri; Mahdieh Soleymani Baghshah

arXiv:2603.13652 · cs.CV · May 19, 2026

TL;DR

This paper introduces CAAP, a new method for explaining Vision Transformers by directly intervening on internal activations to produce more faithful and localized attributions of image regions.

Contribution

CAAP is a novel causal attribution method that intervenes on internal activations, improving the faithfulness and localization of attribution maps for ViTs.

Findings

01. CAAP outperforms existing attribution methods across multiple ViT models.

02. CAAP produces more faithful and well-localized attribution maps.

03. CAAP effectively captures the causal contribution of image patches to predictions.

Abstract

Attribution methods for Vision Transformers (ViTs) aim to identify image regions that influence model predictions, but producing faithful and well-localized attributions remains challenging. Existing attribution methods face several limitations: gradient-based, relevance-propagation, and attention-based methods rely on local approximations, while perturbation- or optimization-based methods intervene on inputs, tokens, or surrogates rather than on internal patch representations. The key challenge is that class-relevant evidence is formed through interactions between patch tokens across layers; methods that operate only on input changes, attention weights, or backward relevance signals may therefore provide indirect proxies for patch importance rather than directly testing the predictive effect of contextualized patch representations. We propose Causal Attribution via Activation…
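To make the idea of intervening on internal patch representations concrete, here is a minimal sketch of generic activation patching on a toy token model. It is not the CAAP algorithm, whose details are not given in the truncated abstract above. The toy model, the all-zero baseline, and the names `mix_layer`, `readout`, and `patch_attribution` are all assumptions made for illustration.

```python
# Toy illustration of generic activation patching; NOT the paper's CAAP method.
# All function names and the toy model below are hypothetical.
import math


def mix_layer(tokens):
    """Contextualize tokens: add the mean of all tokens to each token."""
    dim = len(tokens[0])
    mean = [sum(t[k] for t in tokens) / len(tokens) for k in range(dim)]
    return [[t[k] + mean[k] for k in range(dim)] for t in tokens]


def readout(hidden, weights):
    """Class logit: average over tokens of weights . tanh(hidden token)."""
    scores = [
        sum(w * math.tanh(h) for w, h in zip(weights, tok)) for tok in hidden
    ]
    return sum(scores) / len(scores)


def patch_attribution(clean_tokens, baseline_tokens, weights):
    """Score each patch by the logit drop observed when its hidden
    activation is replaced by the activation from a baseline run."""
    clean_hidden = mix_layer(clean_tokens)
    baseline_hidden = mix_layer(baseline_tokens)
    clean_logit = readout(clean_hidden, weights)
    scores = []
    for i in range(len(clean_hidden)):
        patched = list(clean_hidden)      # shallow copy of the token list
        patched[i] = baseline_hidden[i]   # intervene on one internal activation
        scores.append(clean_logit - readout(patched, weights))
    return scores


# Three "patches" with two features each; only patch 0 carries input evidence.
clean = [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
baseline = [[0.0, 0.0] for _ in clean]   # assumed all-zero reference input
weights = [1.0, 0.0]

scores = patch_attribution(clean, baseline, weights)
print(scores)
```

In this toy setup, patches 1 and 2 receive a nonzero score even though their inputs are zero, because their hidden activations already contain information mixed in from patch 0. This is the kind of contextualized effect that an input-only perturbation would not test directly.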

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face Recognition and Perception · Explainable Artificial Intelligence (XAI)