GPT-4o Lacks Core Features of Theory of Mind
John Muchovej, Amanda Royka, Shane Lee, and Julian Jara-Ettinger

TL;DR
This paper examines whether GPT-4o and similar LLMs possess a Theory of Mind (ToM) and finds that they lack a coherent, consistent causal model of how mental states drive behavior, despite performing well on some social tasks.
Contribution
It introduces an evaluation framework, grounded in a cognitive definition of ToM, that tests whether LLMs hold a coherent, domain-general, and consistent model of how mental states cause behavior.
Findings
LLMs approximate human judgments in a simple ToM paradigm but fail at a logically equivalent task.
LLMs show low consistency between predicted actions and mental state inferences.
Social proficiency in LLMs does not stem from a genuine, domain-general ToM.
Abstract
Do Large Language Models (LLMs) possess a Theory of Mind (ToM)? Research into this question has focused on evaluating LLMs against benchmarks and found success across a range of social tasks. However, these evaluations do not test for the actual representations posited by ToM: namely, a causal model of mental states and behavior. Here, we use a cognitively-grounded definition of ToM to develop and test a new evaluation framework. Specifically, our approach probes whether LLMs have a coherent, domain-general, and consistent model of how mental states cause behavior -- regardless of whether that model matches a human-like ToM. We find that even though LLMs succeed in approximating human judgments in a simple ToM paradigm, they fail at a logically equivalent task and exhibit low consistency between their action predictions and corresponding mental state inferences. As such, these findings…
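The consistency idea in the abstract can be illustrated with a small sketch. This is not the authors' metric, which this excerpt does not specify; it assumes one possible reading, in which a coherent causal model means that a model's forward judgments (action given mental state) should agree, via Bayes' rule, with its inverse judgments (mental state given action). All names and numbers below are hypothetical.

```python
from typing import Dict

Dist = Dict[str, float]


def implied_posterior(prior: Dist, forward: Dict[str, Dist], action: str) -> Dist:
    """Bayes' rule: P(state | action) is proportional to P(action | state) * P(state)."""
    unnormalized = {s: prior[s] * forward[s][action] for s in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("action has zero probability under every mental state")
    return {s: v / total for s, v in unnormalized.items()}


def total_variation(p: Dist, q: Dist) -> float:
    """Total variation distance: 0 means identical, 1 means disjoint."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


# Hypothetical elicited judgments (illustrative values, not data from the paper).
prior = {"wants_apple": 0.5, "wants_pear": 0.5}
forward = {  # action predictions: P(action | mental state)
    "wants_apple": {"go_left": 0.8, "go_right": 0.2},
    "wants_pear": {"go_left": 0.2, "go_right": 0.8},
}
stated_inverse = {"wants_apple": 0.5, "wants_pear": 0.5}  # P(state | go_left) as reported

implied = implied_posterior(prior, forward, "go_left")
gap = total_variation(implied, stated_inverse)
print(f"implied: {implied}, inconsistency gap: {gap:.2f}")
```

Under these assumed numbers, the forward judgments imply that the agent most likely wants the apple after going left, while the stated inverse judgment is indifferent, so the gap is nonzero. A gap near zero across many scenarios would be one sign of a consistent underlying model.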
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Embodied and Extended Cognition · Explainable Artificial Intelligence (XAI)
