What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation

Michal Golovanevsky; William Rudman; Vedant Palit; Ritambhara Singh; and Carsten Eickhoff

arXiv:2406.16320 · cs.CL · May 16, 2025 · 2 cites

TL;DR

This paper introduces NOTICE, a novel interpretability pipeline for Vision-Language Models that uses noise-free corruption techniques to analyze internal decision processes and identify key attention mechanisms across tasks.

Contribution

It presents the first noise-free corruption and evaluation pipeline for mechanistic interpretability in VLMs, enabling causal analysis of multimodal integration.

Findings

1. Identification of crucial middle-layer cross-attention heads.
2. Discovery of universal cross-attention heads with distinct functions.
3. Insights into VLM decision-making across multiple datasets.

Abstract

Vision-Language Models (VLMs) have gained community-spanning prominence due to their ability to integrate visual and textual inputs to perform complex tasks. Despite their success, the internal decision-making processes of these models remain opaque, posing challenges in high-stakes applications. To address this, we introduce NOTICE, the first Noise-free Text-Image Corruption and Evaluation pipeline for mechanistic interpretability in VLMs. NOTICE incorporates a Semantic Minimal Pairs (SMP) framework for image corruption and Symmetric Token Replacement (STR) for text. This approach enables semantically meaningful causal mediation analysis for both modalities, providing a robust method for analyzing multimodal integration within models like BLIP. Our experiments on the SVO-Probes, MIT-States, and Facial Expression Recognition datasets reveal crucial insights into VLM decision-making,…
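The clean-versus-corrupted comparison that underlies causal mediation analysis can be illustrated with a toy sketch. This is not the NOTICE implementation and does not involve BLIP: the two-unit network, its weights, and the names `run_toy_model` and `patching_effects` are all hypothetical, chosen only for illustration. The corrupted input differs from the clean one in a single feature, loosely mirroring the idea of a minimal pair rather than Gaussian noise. Each hidden unit's clean activation is patched into the corrupted run, and the fraction of the output gap it restores is recorded.

```python
import math


def run_toy_model(inputs, weights, patch=None):
    """Run a tiny two-layer network (hypothetical stand-in for a real model).

    `patch` optionally maps a hidden-unit index to an activation that
    overrides whatever that unit would have computed.
    """
    hidden = []
    for i, row in enumerate(weights["hidden"]):
        act = math.tanh(sum(w * x for w, x in zip(row, inputs)))
        if patch is not None and i in patch:
            act = patch[i]  # overwrite with the stored activation
        hidden.append(act)
    logit = sum(w * h for w, h in zip(weights["out"], hidden))
    return logit, hidden


def patching_effects(clean, corrupt, weights):
    """Fraction of the clean/corrupt output gap restored by each hidden unit."""
    clean_logit, clean_hidden = run_toy_model(clean, weights)
    corrupt_logit, _ = run_toy_model(corrupt, weights)
    gap = clean_logit - corrupt_logit
    if gap == 0:
        raise ValueError("corruption did not change the output")
    effects = []
    for i, act in enumerate(clean_hidden):
        # Re-run on the corrupted input with one clean activation restored.
        patched_logit, _ = run_toy_model(corrupt, weights, patch={i: act})
        effects.append((patched_logit - corrupt_logit) / gap)
    return effects


# Assumed toy setup: unit 0 reads feature 0, unit 1 reads feature 1.
TOY_WEIGHTS = {"hidden": [[1.0, 0.0], [0.0, 1.0]], "out": [2.0, 1.0]}
CLEAN = [1.0, 0.5]
CORRUPT = [-1.0, 0.5]  # differs from CLEAN only in feature 0
EFFECTS = patching_effects(CLEAN, CORRUPT, TOY_WEIGHTS)
```

In this setup only the unit that reads the altered feature restores the output, so its effect is close to 1 while the other unit's is close to 0. In a real VLM the patched components would be attention heads or layer outputs rather than scalar units.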

Code & Models

Repositories

wrudman/NOTICE (PyTorch, official)

Videos

What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation · Underline

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Sparse Evolutionary Training · BLIP: Bootstrapping Language-Image Pre-training