Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking
Jingcheng Yang, Tianhu Xiong, Shengyi Qian, Klara Nahrstedt, Mingyuan Wu

TL;DR
This paper introduces a novel framework for transparent circuit tracing in vision-language models, revealing how they hierarchically integrate visual and semantic information, and demonstrating the causal and controllable nature of specific circuits.
Contribution
It presents the first systematic approach for analyzing internal circuits in VLMs, enabling understanding and control of multimodal reasoning processes.
Findings
Distinct visual feature circuits can handle mathematical reasoning
Circuits support cross-modal associations
Circuits are shown to be causal and controllable
Abstract
Vision-language models (VLMs) are powerful but remain opaque black boxes. We introduce the first framework for transparent circuit tracing in VLMs to systematically analyze multimodal reasoning. By utilizing transcoders, attribution graphs, and attention-based methods, we uncover how VLMs hierarchically integrate visual and semantic concepts. We reveal that distinct visual feature circuits can handle mathematical reasoning and support cross-modal associations. Validated through feature steering and circuit patching, our framework proves these circuits are causal and controllable, laying the groundwork for more explainable and reliable VLMs.
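To make the transcoder and feature-steering ideas named in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes a common transcoder form (a ReLU encoder into a wide feature space followed by a linear decoder), uses random weights in place of trained ones, and the names `ToyTranscoder`, `steer`, and the chosen sizes are hypothetical.

```python
import numpy as np


class ToyTranscoder:
    """Toy transcoder: maps an MLP input vector to an approximate MLP output
    through a wide, non-negative feature layer. Weights are random stand-ins
    for a trained transcoder (assumption for illustration only)."""

    def __init__(self, d_model, n_features, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
        self.b_enc = np.zeros(n_features)
        self.W_dec = rng.normal(size=(n_features, d_model)) / np.sqrt(n_features)
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # ReLU keeps feature activations non-negative.
        return np.maximum(x @ self.W_enc + self.b_enc, 0.0)

    def decode(self, acts):
        # Linear readout: each feature writes its decoder row into the output.
        return acts @ self.W_dec + self.b_dec

    def forward(self, x, steer=None):
        """Run the transcoder; `steer` optionally maps feature index -> forced value."""
        acts = self.encode(x)
        if steer is not None:
            acts = acts.copy()
            for idx, value in steer.items():
                acts[..., idx] = value
        return self.decode(acts), acts


tc = ToyTranscoder(d_model=8, n_features=32)
x = np.random.default_rng(1).normal(size=(8,))

base_out, base_acts = tc.forward(x)
steered_out, steered_acts = tc.forward(x, steer={3: 5.0})

# Because the decoder is linear, steering one feature shifts the output
# along that feature's decoder direction, scaled by the activation change.
shift = steered_out - base_out
expected_shift = (5.0 - base_acts[3]) * tc.W_dec[3]
```

In this toy setting the effect of steering is fully predictable from the decoder row. In a real VLM the steered output would feed later layers, so the downstream effect would have to be measured rather than computed in closed form.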
Taxonomy
Topics: Multimodal Machine Learning Applications · Language, Metaphor, and Cognition · Explainable Artificial Intelligence (XAI)
