Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

Yaniv Nikankin; Dana Arad; Yossi Gandelsman; Yonatan Belinkov

arXiv:2506.09047 · cs.CL · October 6, 2025

TL;DR

This paper identifies and compares the circuits that vision-language models (VLMs) use for analogous visual and textual tasks, and proposes a simple training-free intervention, patching visual token representations from later layers into earlier ones, that closes about a third of the performance gap between the two modalities.

Contribution

The paper identifies modality-specific circuits in VLMs, analyzes their functions, and introduces a training-free method that improves visual data representations, closing part of the performance gap between modalities.

Findings

01. Circuits are largely disjoint between modalities but perform similar functions.

02. Visual representations align with textual ones only in later layers.

03. Patching visual tokens from later to earlier layers reduces the modality gap by a third.
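
The patching idea in the third finding can be illustrated on a toy model. The sketch below is an assumption-laden illustration, not the authors' implementation: a small residual stack with causal averaging stands in for a real VLM, and `run_layers`, `back_patch`, `src_layer`, and `dst_layer` are hypothetical names. It caches the hidden states at the image positions after a later layer, then re-runs the model with those states written into an earlier layer, so that subsequent positions can read them.

```python
import numpy as np


def run_layers(x, weights, patch=None):
    """Run a toy residual stack and return the hidden states after each layer.

    x: (seq_len, d) input embeddings.
    weights: list of (d, d) matrices, one per layer (toy stand-in for a VLM).
    patch: optional (layer_idx, positions, values); overwrites the hidden
        states at `positions` in the output of layer `layer_idx`.
    """
    hidden = []
    h = x
    counts = np.arange(1, len(x) + 1)[:, None]
    for i, w in enumerate(weights):
        # Causal mixing: each position sees the mean of itself and earlier ones.
        ctx = np.cumsum(h, axis=0) / counts
        h = h + np.tanh(ctx @ w)
        if patch is not None and patch[0] == i:
            _, positions, values = patch
            h[positions] = values
        hidden.append(h)
    return hidden


def back_patch(x, weights, image_positions, src_layer, dst_layer):
    """Copy image-position states from a later layer into an earlier one."""
    # First pass: cache the later-layer states at the image positions.
    clean = run_layers(x, weights)
    values = clean[src_layer][image_positions]
    # Second pass: inject the cached states at the earlier layer.
    return run_layers(x, weights, patch=(dst_layer, image_positions, values))
```

In this toy setup, patching leaves the text positions untouched at the destination layer itself, but changes them in every layer after it, because they attend (here: average) over the patched image positions.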

Abstract

Vision-language models (VLMs) show impressive abilities to answer questions on visual inputs (e.g., counting objects in an image), yet demonstrate higher accuracies when performing an analogous task on text (e.g., counting words in a text). We investigate this accuracy gap by identifying and comparing the circuits (the task-specific computational sub-graphs) in different modalities. We show that while circuits are largely disjoint between modalities, they implement relatively similar functionalities: the differences lie primarily in processing modality-specific data positions (an image or a text sequence). Zooming in on the image data representations, we observe they become aligned with the higher-performing analogous textual representations only towards later layers, too late in processing to effectively influence subsequent positions. To overcome this, we patch the…
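
The abstract's statement that circuits are largely disjoint between modalities can be made concrete with a set-overlap score. The sketch below uses intersection-over-union over component identifiers. The choice of metric, the function name `circuit_iou`, and the way components are named are all assumptions for illustration, and the paper's actual measure may differ.

```python
def circuit_iou(circuit_a, circuit_b):
    """Intersection-over-union of two circuits given as sets of components.

    Components can be any hashable identifiers, e.g. ("attn", layer, head)
    or ("mlp", layer). This naming scheme is hypothetical.
    """
    a, b = set(circuit_a), set(circuit_b)
    union = a | b
    # Two empty circuits share nothing, so report zero overlap.
    return len(a & b) / len(union) if union else 0.0
```

A score near 0 indicates that the two circuits use mostly different components, while a score of 1 means they are identical.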

Code & Models

Repositories

technion-cs-nlp/vlm-circuits-analysis (jax · Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling