A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks

Karahan Sarıtaş, Kıvanç Tezören, Yavuz Durmazkeser

arXiv:2502.08796 · cs.CL · February 14, 2025

TL;DR

This systematic review analyzes how large language models are evaluated on Theory of Mind tasks, highlighting current methodologies, challenges, and the emerging but incomplete capabilities of LLMs in mental state reasoning.

Contribution

It provides a comprehensive taxonomy of ToM evaluation benchmarks and critically assesses the limitations and progress of LLMs in replicating human-like mental state understanding.

Findings

1. LLMs show emerging competence in ToM tasks.
2. Significant gaps remain in LLMs' ability to emulate human mental reasoning.
3. Evaluation techniques vary widely and have inherent limitations.

Abstract

In recent years, evaluating the Theory of Mind (ToM) capabilities of large language models (LLMs) has received significant attention within the research community. As the field rapidly evolves, navigating the diverse approaches and methodologies has become increasingly complex. This systematic review synthesizes current efforts to assess LLMs' ability to perform ToM tasks, an essential aspect of human cognition involving the attribution of mental states to oneself and others. Despite notable advancements, the proficiency of LLMs in ToM remains a contentious issue. By categorizing benchmarks and tasks through a taxonomy rooted in cognitive science, this review critically examines evaluation techniques, prompting strategies, and the inherent limitations of LLMs in replicating human-like mental state reasoning. A recurring theme in the literature reveals that while LLMs demonstrate…

Code & Models

Repositories

mars-tin/awesome-theory-of-mind

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling

Methods: Softmax · Attention Is All You Need