What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky, William Rudman, Vedant Palit, Ritambhara Singh, and Carsten Eickhoff

TL;DR
This paper introduces NOTICE, a novel interpretability pipeline for Vision-Language Models that uses noise-free corruption techniques to analyze internal decision processes and identify key attention mechanisms across tasks.
Contribution
It presents the first noise-free corruption and evaluation pipeline for mechanistic interpretability in VLMs, enabling causal analysis of multimodal integration.
Findings
Identification of crucial middle-layer cross-attention heads.
Discovery of universal cross-attention heads with distinct functions.
Insights into VLM decision-making across multiple datasets.
Abstract
Vision-Language Models (VLMs) have gained community-spanning prominence due to their ability to integrate visual and textual inputs to perform complex tasks. Despite their success, the internal decision-making processes of these models remain opaque, posing challenges in high-stakes applications. To address this, we introduce NOTICE, the first Noise-free Text-Image Corruption and Evaluation pipeline for mechanistic interpretability in VLMs. NOTICE incorporates a Semantic Minimal Pairs (SMP) framework for image corruption and Symmetric Token Replacement (STR) for text. This approach enables semantically meaningful causal mediation analysis for both modalities, providing a robust method for analyzing multimodal integration within models like BLIP. Our experiments on the SVO-Probes, MIT-States, and Facial Expression Recognition datasets reveal crucial insights into VLM decision-making,…
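The causal mediation analysis mentioned in the abstract can be illustrated with a minimal activation-patching sketch. This is a toy under stated assumptions, not the NOTICE implementation: the two-"head" linear model, its weights, and the `run_model` and `patching_effects` helpers are all hypothetical, and the clean/corrupt inputs are plain number lists standing in for a clean input and its SMP- or STR-style corrupted counterpart. The idea shown is only the general one: run the model on both inputs, copy one component's clean activation into the corrupt run, and measure how much of the output gap is restored.

```python
# Toy sketch of activation patching for causal mediation analysis.
# The model, weights, and inputs are hypothetical illustrations;
# this is NOT the NOTICE code or the BLIP architecture.

def run_model(x, patch=None):
    """Run a toy two-head linear model, optionally overwriting head outputs.

    x: list of floats (stand-in for an embedded input).
    patch: optional dict mapping head index -> activation to substitute.
    Returns (output, list of head activations actually used).
    """
    head_weights = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]  # what each head reads
    out_weights = [2.0, 1.0]                            # head -> output
    acts = []
    for h, w in enumerate(head_weights):
        a = sum(wi * xi for wi, xi in zip(w, x))
        if patch is not None and h in patch:
            a = patch[h]  # substitute the cached clean activation
        acts.append(a)
    output = sum(ow * a for ow, a in zip(out_weights, acts))
    return output, acts


def patching_effects(clean_x, corrupt_x):
    """Fraction of the clean-vs-corrupt output gap restored by each head."""
    clean_out, clean_acts = run_model(clean_x)
    corrupt_out, _ = run_model(corrupt_x)
    gap = clean_out - corrupt_out
    if gap == 0:
        raise ValueError("corruption did not change the output")
    effects = []
    for h, a in enumerate(clean_acts):
        patched_out, _ = run_model(corrupt_x, patch={h: a})
        effects.append((patched_out - corrupt_out) / gap)
    return effects


clean = [1.0, 2.0, 2.0]    # stand-in for the clean input
corrupt = [0.0, 2.0, 1.0]  # stand-in for its corrupted counterpart
effects = patching_effects(clean, corrupt)
print(effects)
```

In this toy, the head with the larger restored fraction is the one that mediates more of the difference between the clean and corrupt runs. In a real VLM the same loop would run over attention heads or layers, typically via framework hooks, and the output would be a task-specific metric rather than a single scalar.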
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Sparse Evolutionary Training · BLIP: Bootstrapping Language-Image Pre-training
