CoMMET: To What Extent Can LLMs Perform Theory of Mind Tasks?

Ruirui Chen; Weifeng Jiang; Chengwei Qin; Cheston Tan

arXiv:2603.11915 · cs.CL · March 13, 2026

TL;DR

This paper introduces CoMMET, a novel multimodal benchmark dataset designed to evaluate Large Language Models' Theory of Mind abilities across diverse mental states and multi-turn conversations, revealing current strengths and limitations.

Contribution

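The paper presents CoMMET, which the authors describe as, to the best of their knowledge, the first multimodal dataset for evaluating ToM in a multi-turn conversational setting. It extends evaluation beyond belief-related tasks to a broader range of mental states and offers insight into models' social reasoning capabilities.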

Findings

1. LLMs show varied performance across mental states.

2. Multi-turn evaluation reveals limitations in current models.

3. CoMMET enables comprehensive assessment of social cognition in LLMs.

Abstract

Theory of Mind (ToM), the ability to reason about the mental states of oneself and others, is a cornerstone of human social intelligence. As Large Language Models (LLMs) become ubiquitous in real-world applications, validating their capacity for this level of social reasoning is essential for effective and natural interactions. However, existing benchmarks for assessing ToM in LLMs are limited; most rely solely on text inputs and focus narrowly on belief-related tasks. In this paper, we propose a new multimodal benchmark dataset, CoMMET, a Comprehensive Mental states and Moral Evaluation Task inspired by the Theory of Mind Booklet Task. CoMMET expands the scope of evaluation by covering a broader range of mental states and introducing multi-turn testing. To the best of our knowledge, this is the first multimodal dataset to evaluate ToM in a multi-turn conversational setting. Through a…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Sentiment Analysis and Opinion Mining