Metonymy in vision models undermines attention-based interpretability
Ananthu Aniraj, Cassio F. Dantas, Dino Ienco, Massimiliano Mancini, Diego Marcos

TL;DR
This paper shows that modern pretrained vision transformers violate the locality assumption that a part's latent representation primarily encodes its own image region. The resulting intra-object leakage undermines the interpretability of part-based reasoning methods.
Contribution
The paper demonstrates intra-object leakage in vision models, shows how it affects interpretability, and proposes a two-stage approach to mitigate it.
Findings
Modern pretrained vision transformers exhibit strong intra-object leakage.
Intra-object leakage compromises the faithfulness of attention-based interpretability methods.
A two-stage approach can prevent leakage and improve attribute-driven part discovery.
Abstract
Part-based reasoning is a classical strategy for making a computer vision model focus directly on the object parts that are relevant to the downstream task. In the context of deep learning, it also serves to improve by-design interpretability, often by using part-centric attention mechanisms on top of a latent image representation provided by a standard, black-box model. This approach rests on a locality assumption: that the latent representation of an object part primarily encodes information about the corresponding image region. In this work, we test this basic assumption, measuring intra-object leakage in vision models using part-based attribute annotations. Through a comprehensive experimental evaluation, we show that modern pretrained vision transformers violate the locality assumption and exhibit strong intra-object leakage, in which each part encodes information from the…
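To make the idea of measuring leakage with part-level attribute labels concrete, here is a minimal sketch on synthetic data. It is not the paper's protocol: it assumes a simple linear probe, and every name in it (`make_part_embeddings`, `probe_accuracy`, `leakage_score`, the `leak` parameter) is hypothetical. The embedding of part A is built from its own attribute plus a controllable amount of signal from part B's attribute; a probe trained on part A's embedding then tries to predict part B's attribute. Accuracy near chance is consistent with the locality assumption, while accuracy well above chance indicates leakage.

```python
import numpy as np


def make_part_embeddings(n, dim, leak, rng):
    """Synthetic stand-in for a backbone: returns part A's embeddings
    together with the binary attributes of parts A and B.
    `leak` controls how strongly part B's attribute bleeds into part A."""
    attr_a = rng.integers(0, 2, size=n)
    attr_b = rng.integers(0, 2, size=n)
    dir_a = rng.normal(size=dim)  # direction encoding part A's attribute
    dir_b = rng.normal(size=dim)  # direction encoding part B's attribute
    sign_a = 2.0 * attr_a - 1.0
    sign_b = 2.0 * attr_b - 1.0
    noise = rng.normal(size=(n, dim))
    emb_a = sign_a[:, None] * dir_a + leak * sign_b[:, None] * dir_b + noise
    return emb_a, attr_a, attr_b


def probe_accuracy(x_train, y_train, x_test, y_test):
    """Least-squares linear probe with a bias term (an assumed probe type)."""
    xb = np.hstack([x_train, np.ones((len(x_train), 1))])
    w, *_ = np.linalg.lstsq(xb, 2.0 * y_train - 1.0, rcond=None)
    xt = np.hstack([x_test, np.ones((len(x_test), 1))])
    pred = (xt @ w > 0).astype(int)
    return float((pred == y_test).mean())


def leakage_score(leak, seed=0, n=4000, dim=16):
    """Held-out accuracy of predicting part B's attribute from part A's embedding."""
    rng = np.random.default_rng(seed)
    emb_a, _, attr_b = make_part_embeddings(n, dim, leak, rng)
    half = n // 2
    return probe_accuracy(emb_a[:half], attr_b[:half], emb_a[half:], attr_b[half:])


if __name__ == "__main__":
    for leak in (0.0, 0.5, 1.0):
        print(f"leak={leak:.1f}  cross-part probe accuracy={leakage_score(leak):.3f}")
```

With real data, the synthetic generator would be replaced by part-pooled features from a pretrained backbone and the attributes by part-level annotations; the choice of probe and pooling here is an illustrative assumption.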
