Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning
Meng Luo, Bobo Li, Shanqing Xu, Shize Zhang, Qiuchan Chen, Menglu Han, Wenhao Chen, Yanxiang Huang, Hao Fei, Mong-Li Lee, Wynne Hsu

TL;DR
This paper introduces HitEmotion, a hierarchical benchmark, together with a ToM-guided reasoning method, to evaluate and improve how deeply multimodal large language models (MLLMs) understand emotion through explicit Theory of Mind (ToM) modeling.
Contribution
It presents HitEmotion, a new benchmark for diagnosing emotional reasoning, and TMPO, a ToM-guided reinforcement learning approach for enhancing emotional understanding in MLLMs.
Findings
HitEmotion reveals deep emotional reasoning deficits in current models.
ToM-guided reasoning improves task accuracy and rationale coherence.
The approach offers a practical toolkit for advancing emotional cognition in MLLMs.
Abstract
Despite rapid progress in multimodal large language models (MLLMs), their capability for deep emotional understanding remains limited. We argue that genuine affective intelligence requires explicit modeling of Theory of Mind (ToM), the cognitive substrate from which emotions arise. To this end, we first introduce HitEmotion, a ToM-grounded hierarchical benchmark that diagnoses capability breakpoints across increasing levels of cognitive depth. Second, we propose a ToM-guided reasoning chain that tracks mental states and calibrates cross-modal evidence to achieve faithful emotional reasoning. Third, we introduce TMPO, a reinforcement learning method that uses intermediate mental states as process-level supervision to guide and strengthen model reasoning. Extensive experiments show that HitEmotion exposes deep emotional reasoning deficits in state-of-the-art models, especially on cognitively…
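To make "process-level supervision" concrete, here is a minimal sketch of a reward that scores intermediate mental-state steps as well as the final emotion label. It is an illustration only and is not TMPO: the abstract does not specify the reward, so the `Rollout` type, the `process_reward` function, the position-by-position matching, and the `step_weight` blend are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled response: intermediate mental-state labels plus a final emotion.

    Hypothetical structure for illustration; not taken from the paper.
    """
    mental_states: list[str]  # e.g. inferred beliefs or intentions, one per step
    emotion: str              # final predicted emotion label


def process_reward(pred: Rollout, gold: Rollout, step_weight: float = 0.5) -> float:
    """Blend an outcome reward with a process-level reward (assumed weighting)."""
    # Outcome term: 1.0 if the final emotion label matches the reference.
    outcome = 1.0 if pred.emotion == gold.emotion else 0.0

    # Process term: fraction of reference mental-state steps that the
    # prediction matches, compared position by position.
    if gold.mental_states:
        matched = sum(p == g for p, g in zip(pred.mental_states, gold.mental_states))
        process = matched / len(gold.mental_states)
    else:
        process = 0.0

    return (1.0 - step_weight) * outcome + step_weight * process
```

Under this sketch, a response with the right final label but faulty intermediate reasoning earns less than one whose mental-state steps also match, which is the general idea behind rewarding the reasoning process as well as the answer.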
Taxonomy
Topics: Multimodal Machine Learning Applications · Action Observation and Synchronization · Topic Modeling
