A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks
Karahan Sarıtaş, Kıvanç Tezören, Yavuz Durmazkeser

TL;DR
This systematic review analyzes how large language models are evaluated on Theory of Mind tasks, highlighting current methodologies, challenges, and the emerging but incomplete capabilities of LLMs in mental state reasoning.
Contribution
It provides a comprehensive taxonomy of ToM evaluation benchmarks and critically assesses the limitations and progress of LLMs in replicating human-like mental state understanding.
Findings
LLMs show emerging competence in ToM tasks
Significant gaps remain in LLMs' ability to emulate human mental reasoning
Evaluation techniques vary widely and have inherent limitations
Abstract
In recent years, evaluating the Theory of Mind (ToM) capabilities of large language models (LLMs) has received significant attention within the research community. As the field rapidly evolves, navigating the diverse approaches and methodologies has become increasingly complex. This systematic review synthesizes current efforts to assess LLMs' ability to perform ToM tasks, an essential aspect of human cognition involving the attribution of mental states to oneself and others. Despite notable advancements, the proficiency of LLMs in ToM remains a contentious issue. By categorizing benchmarks and tasks through a taxonomy rooted in cognitive science, this review critically examines evaluation techniques, prompting strategies, and the inherent limitations of LLMs in replicating human-like mental state reasoning. A recurring theme in the literature reveals that while LLMs demonstrate…
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling
Methods: Softmax · Attention Is All You Need
