UniToMBench: Integrating Perspective-Taking to Improve Theory of Mind in LLMs
Prameshwar Thiyagarajan, Vaishnavi Parimi, Shamant Sai, Soumil Garg, Zhangir Meirbek, Nitin Yarlagadda, Kevin Zhu, Chris Kim

TL;DR
UniToMBench is a new benchmark integrating multi-interaction scenarios and perspective-taking techniques to systematically evaluate and improve Theory of Mind capabilities in large language models, revealing current strengths and limitations.
Contribution
The paper introduces UniToMBench, a unified benchmark that combines the strengths of SimToM and TOMBENCH with a custom dataset of over 1,000 hand-written scenarios to improve and assess ToM in LLMs.
Findings
GPT-4o and GPT-4o Mini usually achieve over 80% accuracy on tasks involving emotional and belief-related scenarios.
Performance varies significantly across knowledge-based ToM tasks.
UniToMBench provides a versatile platform for future LLM ToM development.
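The per-category accuracy figures above can be illustrated with a minimal sketch. It assumes each benchmark item is scored as a single predicted choice against a gold choice and tagged with a task category; the function name, the record layout, and the toy records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: accuracy} for (category, predicted, gold) records.

    The record layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, predicted, gold in records:
        total[category] += 1
        if predicted == gold:
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


# Hypothetical toy records, NOT results from the paper.
toy_records = [
    ("emotion", "A", "A"),
    ("emotion", "B", "B"),
    ("belief", "C", "C"),
    ("belief", "D", "A"),
    ("knowledge", "B", "C"),
]
toy_scores = accuracy_by_category(toy_records)
```

Reporting accuracy per category rather than as one pooled number is what makes a gap between, for example, belief tasks and knowledge-based tasks visible.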
Abstract
Theory of Mind (ToM), the ability to understand the mental states of oneself and others, remains a challenging area for large language models (LLMs), which often fail to predict human mental states accurately. In this paper, we introduce UniToMBench, a unified benchmark that integrates the strengths of SimToM and TOMBENCH to systematically improve and assess ToM capabilities in LLMs by integrating multi-interaction task designs and evolving story scenarios. Supported by a custom dataset of over 1,000 hand-written scenarios, UniToMBench combines perspective-taking techniques with diverse evaluation metrics to better stimulate social cognition in LLMs. Through evaluation, we observe that while models like GPT-4o and GPT-4o Mini show consistently high accuracy in tasks involving emotional and belief-related scenarios, with results usually above 80%, there is significant variability in…
Taxonomy
Topics: Child and Animal Learning Development · Face Recognition and Perception · Embodied and Extended Cognition
