ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind

Kazutoshi Shinoda; Nobukatsu Hojo; Kyosuke Nishida; Saki Mizuno; Keita Suzuki; Ryo Masumura; Hiroaki Sugiyama; Kuniko Saito

arXiv:2501.08838 · cs.CL · January 16, 2025

TL;DR

ToMATO is a benchmark for evaluating Theory of Mind (ToM) in language models. It covers diverse mental states, false beliefs, and personality traits, and is built from role-playing conversations in which the LLMs verbalize their thoughts.

Contribution

It introduces a new dataset and evaluation method that cover a wide range of mental states and the diverse personality traits of characters, addressing limitations of existing ToM benchmarks.

Findings

01 LLMs struggle with false beliefs and personality diversity.

02 GPT-4o mini performs below human levels in ToM tasks.

03 The dataset reveals frequent false beliefs due to information asymmetry.

Abstract

Existing Theory of Mind (ToM) benchmarks diverge from real-world scenarios in three aspects: 1) they assess a limited range of mental states such as beliefs, 2) false beliefs are not comprehensively explored, and 3) the diverse personality traits of characters are overlooked. To address these challenges, we introduce ToMATO, a new ToM benchmark formulated as multiple-choice QA over conversations. ToMATO is generated via LLM-LLM conversations featuring information asymmetry. By employing a prompting method that requires role-playing LLMs to verbalize their thoughts before each utterance, we capture both first- and second-order mental states across five categories: belief, intention, desire, emotion, and knowledge. These verbalized thoughts serve as answers to questions designed to assess the mental states of characters within conversations. Furthermore, the information asymmetry…
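The pipeline described in the abstract, where each role-playing LLM verbalizes a thought before its utterance and that thought later serves as the answer to a multiple-choice question, can be illustrated with a small sketch. This is not the authors' data format or code: the `Turn` class, the `make_item` helper, the question wording, and the example dialogue are all hypothetical and only show the general idea of hiding thoughts from the reader while using them as gold answers.

```python
import random
from dataclasses import dataclass


@dataclass
class Turn:
    """One conversational turn (hypothetical structure, not the paper's schema)."""
    speaker: str
    thought: str    # verbalized mental state, not shown to the other speaker
    utterance: str  # what is actually said aloud


def visible_conversation(turns):
    """Return only the spoken lines; thoughts are withheld from the reader."""
    return [f"{t.speaker}: {t.utterance}" for t in turns]


def make_item(turns, target_index, distractors, seed=0):
    """Build one multiple-choice item whose gold answer is a verbalized thought."""
    target = turns[target_index]
    options = [target.thought] + list(distractors)
    random.Random(seed).shuffle(options)  # fixed seed keeps the item reproducible
    return {
        "conversation": visible_conversation(turns),
        "question": f'What does {target.speaker} think when saying "{target.utterance}"?',
        "options": options,
        "answer": options.index(target.thought),
    }


# Illustrative dialogue (invented for this sketch).
turns = [
    Turn("A", "I want B to join the trip.", "Are you free this weekend?"),
    Turn("B", "I think A needs help moving.", "Sure, what do you need carried?"),
]
item = make_item(
    turns,
    target_index=1,
    distractors=["I think A wants to go on a trip.", "I feel annoyed at A."],
)
```

In this toy dialogue, B's thought is a false belief that arises because B cannot see A's thought, which mirrors the role the abstract assigns to information asymmetry.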

Code & Models

Repositories

nttmdlab-nlp/ToMATO
PyTorch · Official

Videos

ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind · Underline

Taxonomy

Topics: Complex Systems and Decision Making · Cognitive Science and Mapping · Artificial Intelligence in Law