Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking

Jingcheng Yang; Tianhu Xiong; Shengyi Qian; Klara Nahrstedt; Mingyuan Wu

arXiv:2602.20330 · cs.CV · February 25, 2026

TL;DR

This paper introduces a novel framework for transparent circuit tracing in vision-language models, revealing how they hierarchically integrate visual and semantic information, and demonstrating the causal and controllable nature of specific circuits.

Contribution

It presents the first systematic approach for analyzing internal circuits in VLMs, enabling understanding and control of multimodal reasoning processes.

Findings

01 Distinct visual feature circuits can handle mathematical reasoning

02 Distinct visual feature circuits support cross-modal associations

03 Feature steering and circuit patching show that these circuits are causal and controllable

Abstract

Vision-language models (VLMs) are powerful but remain opaque black boxes. We introduce the first framework for transparent circuit tracing in VLMs to systematically analyze multimodal reasoning. By utilizing transcoders, attribution graphs, and attention-based methods, we uncover how VLMs hierarchically integrate visual and semantic concepts. We reveal that distinct visual feature circuits can handle mathematical reasoning and support cross-modal associations. Validated through feature steering and circuit patching, our framework proves these circuits are causal and controllable, laying the groundwork for more explainable and reliable VLMs.
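
The abstract names feature steering and circuit patching as its validation tools without describing them. The toy sketch below illustrates the general idea of both interventions on a tiny transcoder-like layer (sparse features computed by an encoder, then decoded back). Everything in it is an assumption made for illustration: the weights, the dimensions, and the function names `encode`, `decode`, `steer`, and `patch` are hypothetical and are not taken from the paper or its code.

```python
# Toy illustration only: weights, sizes, and function names are hypothetical
# and are not the paper's implementation.

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# A transcoder-like layer: features = relu(W_ENC @ x), output = W_DEC @ features
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # 3 features from a 2-d input
W_DEC = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]    # back to a 2-d output

def encode(x):
    return relu(matvec(W_ENC, x))

def decode(features):
    return matvec(W_DEC, features)

def steer(x, feature_idx, scale):
    """Feature steering: rescale one feature's activation before decoding."""
    f = encode(x)
    f[feature_idx] *= scale
    return decode(f)

def patch(x_target, x_source, feature_idx):
    """Patching: overwrite one feature's activation with its value from a source run."""
    f = encode(x_target)
    f[feature_idx] = encode(x_source)[feature_idx]
    return decode(f)

x_a = [2.0, 1.0]
x_b = [0.0, 3.0]

baseline = decode(encode(x_a))  # unmodified forward pass
steered = steer(x_a, 2, 0.0)    # ablate feature 2
patched = patch(x_a, x_b, 1)    # take feature 1 from the run on x_b
```

If the output changes when a single feature is ablated or swapped, that feature has a causal effect on the output in this toy setting. Comparing `steered` and `patched` against `baseline` is the kind of check such interventions rely on.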

Taxonomy

Topics: Multimodal Machine Learning Applications · Language, Metaphor, and Cognition · Explainable Artificial Intelligence (XAI)