Explore Theory of Mind: Program-guided adversarial data generation for theory of mind reasoning
Melanie Sclar, Jane Yu, Maryam Fazel-Zarandi, Yulia Tsvetkov, Yonatan Bisk, Yejin Choi, Asli Celikyilmaz

TL;DR
ExploreToM is a novel framework that generates complex, diverse theory of mind data to evaluate and improve large language models' social reasoning abilities, revealing significant gaps in current models.
Contribution
The paper introduces ExploreToM, the first large-scale data generation framework for robust theory of mind evaluation and training of LLMs, using A* search over a domain-specific language.
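The summary describes the generator only at a high level (A* search over a domain-specific language). As a rough illustration, not the authors' implementation, the sketch below runs textbook A* over a hypothetical three-action story DSL (`enter`, `leave`, `move`) to find the shortest story in which at least `target` characters hold a false belief about where an object is. The action names, the two containers, the goal condition, the unit action costs, and the heuristic are all assumptions made for this example.

```python
import heapq
import itertools
from typing import NamedTuple, Optional

CONTAINERS = ("basket", "box")  # hypothetical object locations


class State(NamedTuple):
    present: frozenset  # characters currently in the room
    location: str       # true location of the object
    beliefs: tuple      # (character, believed location) pairs, sorted by name


def apply(state: State, action: tuple) -> Optional[State]:
    """Apply one toy-DSL action; return None if it is not legal in `state`."""
    kind, who, *rest = action
    if kind == "leave" and who in state.present:
        return state._replace(present=state.present - {who})
    if kind == "enter" and who not in state.present:
        return state._replace(present=state.present | {who})
    if kind == "move" and who in state.present and rest[0] != state.location:
        dest = rest[0]
        # Only characters in the room witness the move and update their belief.
        beliefs = tuple((c, dest if c in state.present else b) for c, b in state.beliefs)
        return State(state.present, dest, beliefs)
    return None


def false_beliefs(state: State) -> int:
    return sum(b != state.location for _, b in state.beliefs)


def heuristic(state: State, target: int) -> int:
    if false_beliefs(state) >= target:
        return 0
    absent = len(state.beliefs) - len(state.present)
    # Admissible and consistent: enter/leave never change beliefs, so the goal is
    # reached by a move, and only characters absent during that move end up wrong.
    return 1 + max(0, target - absent)


def a_star_story(characters, target):
    """Shortest action sequence leaving >= `target` characters with a false belief."""
    names = sorted(characters)
    start = State(frozenset(names), CONTAINERS[0],
                  tuple((c, CONTAINERS[0]) for c in names))
    tie = itertools.count()  # tie-breaker so heapq never compares States
    frontier = [(heuristic(start, target), next(tie), 0, start, [])]
    best_g = {start: 0}
    while frontier:
        _, _, g, state, plan = heapq.heappop(frontier)
        if g > best_g[state]:
            continue  # stale queue entry; a cheaper path was found later
        if false_beliefs(state) >= target:
            return plan
        for c in names:
            for action in [("leave", c), ("enter", c)] + [("move", c, d) for d in CONTAINERS]:
                nxt = apply(state, action)
                if nxt is None:
                    continue
                ng = g + 1  # unit cost per action: A* returns a shortest story
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    f = ng + heuristic(nxt, target)
                    heapq.heappush(frontier, (f, next(tie), ng, nxt, plan + [action]))
    return None  # no story in this toy DSL satisfies the goal


print(a_star_story(["Anne", "Bob", "Cara"], target=2))
```

For three characters and `target=2`, the search needs two departures followed by a witnessed move, so it returns a three-action story. A real generator would presumably score candidate stories on richer criteria; unit costs and a single false-belief goal are simplifications chosen here to keep the search easy to verify.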
Findings
State-of-the-art LLMs such as Llama-3.1-70B and GPT-4o perform poorly on ExploreToM data, with accuracies as low as 0% and 9%, respectively.
Fine-tuning on ExploreToM data improves ToMi benchmark accuracy by 27 points.
ExploreToM also helps uncover skills that models are missing, such as reliable state tracking, as well as data imbalances that may contribute to their poor theory of mind performance.
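To make "state tracking" concrete, the sketch below tracks a one-room, one-object story in the style of the classic Sally-Anne test. It is an illustration only, not code from the paper: the class and method names are hypothetical, and it assumes that characters who witness an event together know that each other saw it. It separates the three quantities a theory of mind question can ask about: where the object really is, where a character thinks it is (first order), and where one character thinks another thinks it is (second order).

```python
class StoryTracker:
    """Hypothetical tracker for a toy story with one room and one object."""

    def __init__(self, characters, location):
        self.location = location
        self.present = set(characters)
        # Each entry: (object location after the event, characters who witnessed it).
        self.history = [(location, frozenset(characters))]

    def enter(self, who):
        self.present.add(who)  # entering does not reveal where the object is

    def leave(self, who):
        self.present.discard(who)

    def move(self, who, dest):
        if who not in self.present:
            raise ValueError(f"{who} must be in the room to move the object")
        self.location = dest
        self.history.append((dest, frozenset(self.present)))

    def _last_seen_by(self, *observers):
        # Location set by the most recent event that every listed observer witnessed.
        for loc, witnesses in reversed(self.history):
            if all(o in witnesses for o in observers):
                return loc
        return None

    def first_order(self, a):
        """Where does `a` think the object is?"""
        return self._last_seen_by(a)

    def second_order(self, a, b):
        """Where does `a` think `b` thinks the object is? (co-presence assumption)"""
        return self._last_seen_by(a, b)


story = StoryTracker(["Sally", "Anne"], "basket")
story.leave("Sally")
story.move("Anne", "box")
print(story.location)                       # box
print(story.first_order("Sally"))           # basket
print(story.second_order("Anne", "Sally"))  # basket
```

A model that answers only from the latest world state would say "box" for every question; getting the belief questions right requires remembering who was present for each event.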
Abstract
Do large language models (LLMs) have theory of mind? A plethora of papers and benchmarks have been introduced to evaluate if current models have been able to develop this key ability of social intelligence. However, all rely on limited datasets with simple patterns that can potentially lead to problematic blind spots in evaluation and an overestimation of model capabilities. We introduce ExploreToM, the first framework to allow large-scale generation of diverse and challenging theory of mind data for robust training and evaluation. Our approach leverages an A* search over a custom domain-specific language to produce complex story structures and novel, diverse, yet plausible scenarios to stress test the limits of LLMs. Our evaluation reveals that state-of-the-art LLMs, such as Llama-3.1-70B and GPT-4o, show accuracies as low as 0% and 9% on ExploreToM-generated data, highlighting the…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
