From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models
Xinyang Li, Siqi Liu, Bochao Zou, Jiansheng Chen, Huimin Ma

TL;DR
This paper introduces an interpretability-driven method to evaluate and improve Theory of Mind (ToM) in multimodal large language models (MLLMs), using a new dataset, GridToM, and an analysis of attention heads.
Contribution
The paper builds a multimodal ToM test dataset, GridToM, shows that attention heads can reveal ToM capabilities, and proposes a training-free enhancement technique.
Findings
Attention heads distinguish cognitive information across perspectives.
Attention mechanisms can be used to assess ToM in multimodal models.
A lightweight method improves the models' ToM abilities without additional training.
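One common way to test a claim like "attention heads distinguish cognitive information across perspectives" is to fit a simple linear probe on each head's activations and compare held-out accuracy across heads. The sketch below illustrates that idea on synthetic data only; it is not the paper's procedure, and the head names, the two perspective labels, the dimensions, and the difference-of-means probe are all assumptions made for illustration.

```python
import random

def make_head_activations(n, dim, separation, rng):
    """Synthetic stand-in for one head's activations (hypothetical data).
    Label 0 / 1 marks two perspectives; `separation` controls how strongly
    the label is encoded along a random unit direction."""
    signal = [rng.gauss(0, 1) for _ in range(dim)]
    norm = sum(s * s for s in signal) ** 0.5
    signal = [s / norm for s in signal]
    data = []
    for i in range(n):
        label = i % 2
        shift = separation if label else -separation
        x = [rng.gauss(0, 1) + shift * s for s in signal]
        data.append((x, label))
    return data

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

def probe_accuracy(train, test):
    """Difference-of-means linear probe: project onto (mu1 - mu0) and
    threshold at the midpoint between the two class means."""
    mu0 = mean_vector([x for x, y in train if y == 0])
    mu1 = mean_vector([x for x, y in train if y == 1])
    direction = [b - a for a, b in zip(mu0, mu1)]
    midpoint = [(a + b) / 2 for a, b in zip(mu0, mu1)]
    correct = 0
    for x, y in test:
        score = sum(d * (xi - m) for d, xi, m in zip(direction, x, midpoint))
        correct += int((score > 0) == (y == 1))
    return correct / len(test)

rng = random.Random(0)
# Hypothetical heads: one encodes the perspective label, one does not.
heads = {"informative_head": 2.0, "uninformative_head": 0.0}
results = {}
for name, separation in heads.items():
    data = make_head_activations(400, 16, separation, rng)
    rng.shuffle(data)
    results[name] = probe_accuracy(data[:300], data[300:])

for name, acc in results.items():
    print(f"{name}: held-out probe accuracy = {acc:.2f}")
```

In a real analysis the synthetic vectors would be replaced by activations extracted from the model for each head, and a head whose probe accuracy sits well above chance would be a candidate for encoding perspective-specific information.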
Abstract
As large language models evolve, there is growing anticipation that they will emulate human-like Theory of Mind (ToM) to assist with routine tasks. However, existing methods for evaluating machine ToM focus primarily on unimodal models and largely treat these models as black boxes, lacking an interpretative exploration of their internal mechanisms. In response, this study adopts an approach based on internal mechanisms to provide an interpretability-driven assessment of ToM in multimodal large language models (MLLMs). Specifically, we first construct a multimodal ToM test dataset, GridToM, which incorporates diverse belief testing tasks and perceptual information from multiple perspectives. Next, our analysis shows that attention heads in multimodal large models can distinguish cognitive information across perspectives, providing evidence of ToM capabilities. Furthermore, we present a…
Taxonomy
Topics: Topic Modeling
Methods: Focus
