Re-evaluating Theory of Mind evaluation in large language models

Jennifer Hu; Felix Sosa; Tomer Ullman

arXiv:2502.21098 · cs.AI · March 3, 2025

TL;DR

This paper critically re-evaluates how Theory of Mind in large language models is assessed, highlighting conceptual issues and proposing clearer evaluation directions inspired by cognitive science.

Contribution

It identifies conceptual ambiguities in how ToM is evaluated in LLMs, discusses how current methods may deviate from "pure" measurements of ToM abilities, and proposes directions for future research.

Findings

01

Evidence from current ToM evaluations is mixed, and the growing number of evaluations has not produced a consensus.

02

Much of the disagreement stems from ambiguity over whether models should be expected to match human behavior or the computations underlying that behavior.

03

Future research should explore ToM's relation to pragmatic communication.

Abstract

The question of whether large language models (LLMs) possess Theory of Mind (ToM), often defined as the ability to reason about others' mental states, has sparked significant scientific and public interest. However, the evidence as to whether LLMs possess ToM is mixed, and the recent growth in evaluations has not resulted in a convergence. Here, we take inspiration from cognitive science to re-evaluate the state of ToM evaluation in LLMs. We argue that a major reason for the disagreement on whether LLMs have ToM is a lack of clarity on whether models should be expected to match human behaviors, or the computations underlying those behaviors. We also highlight ways in which current evaluations may be deviating from "pure" measurements of ToM abilities, which also contributes to the confusion. We conclude by discussing several directions for future research, including the relationship…

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Language and cultural evolution · Embodied and Extended Cognition