Causal Probing for Internal Visual Representations in Multimodal Large Language Models
Zehao Deng, Tianjie Ju, Zheng Wu, Liangbo He, Jun Lan, Huijia Zhu, Weiqiang Wang, Zhuosheng Zhang

TL;DR
This paper introduces a causal probing framework to analyze how multimodal large language models encode visual concepts, revealing different encoding strategies for entities and abstract concepts and their scaling behaviors.
Contribution
It presents a novel activation steering method to systematically investigate internal visual representations and their relation to model scaling and reasoning capabilities.
Findings
Entities are encoded with localized memorization.
Abstract concepts are distributed globally across the network and require deeper models to encode.
Recognition of geometric relations does not equate to procedural reasoning.
Abstract
Despite the remarkable success of Multimodal Large Language Models (MLLMs) across diverse tasks, the internal mechanisms governing how they encode and ground distinct visual concepts remain poorly understood. To bridge this gap, we propose a causal framework based on activation steering to actively probe and manipulate internal visual representations. Through systematic intervention across four visual concept categories, our results reveal a divergence in concept encoding: entities exhibit distinct localized memorization, whereas abstract concepts are globally distributed across the network. Critically, this divergence uncovers a mechanistic driver of scaling laws: increasing model depth is indispensable for encoding distributed and complex abstract concepts, whereas entity localization remains remarkably invariant to scale. Furthermore, reverse steering uncovers that blocking explicit…
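To make the idea of activation steering concrete, here is a minimal sketch of one common variant, difference-of-means steering, in plain Python. It is not taken from the paper: the function names (`steering_vector`, `steer`), the toy activations, and the scaling factor `alpha` are all assumptions made for illustration, and the authors' actual procedure may differ. A positive `alpha` pushes a hidden state toward a concept, while a negative `alpha` is used here as a simple stand-in for reverse steering.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(with_concept: List[Vector], without_concept: List[Vector]) -> Vector:
    """Hypothetical helper: concept direction as the difference of mean activations."""
    mu_pos = mean_vector(with_concept)
    mu_neg = mean_vector(without_concept)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the concept direction by a factor alpha.

    alpha > 0 amplifies the concept; alpha < 0 suppresses it
    (an assumed, simplified form of reverse steering).
    """
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations at one layer (made-up numbers, not real model data).
with_concept = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
without_concept = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

v = steering_vector(with_concept, without_concept)
hidden = [1.0, 1.0, 1.0]

forward = steer(hidden, v, alpha=0.5)   # push toward the concept
reverse = steer(hidden, v, alpha=-0.5)  # push away from the concept
```

In a real model the same shift would be applied to the hidden state at a chosen layer during the forward pass, and repeating the intervention layer by layer is one way to ask whether a concept is encoded locally or spread across the network.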
