TL;DR
OSCToM is a reinforcement-learning-based method for modeling nested belief conflicts in language models. It reports 76% accuracy on FANToM and roughly 6x more efficient data synthesis for complex Theory of Mind tasks.
Contribution
The paper presents OSCToM, an RL-guided approach that combines an extended domain-specific language with compositional surrogate models to improve recursive ToM reasoning in LLMs.
Findings
OSCToM-8B outperforms existing systems on ToM benchmarks.
It achieves 76% accuracy on FANToM, surpassing prior methods.
Data synthesis is 6x more efficient, which benefits smaller models.
Abstract
Large Language Models (LLMs) perform well on many language tasks, but their Theory of Mind (ToM) reasoning is still uneven in complex social settings. Existing benchmarks, including ExploreToM, do not always test the recursive beliefs and information asymmetries that make these settings difficult. This paper presents OSCToM (Observer-Self Conflict Theory of Mind), an approach for modeling nested belief conflicts in LLM-based ToM tasks. The key case is one in which an observer's view of another agent conflicts with the observer's own belief state. Such cases go beyond simple perspective-taking and require recursive, multi-layered reasoning. OSCToM combines reinforcement learning (RL), an extended domain-specific language, and compositional surrogate models to generate observer-self conflicts. In our experiments, OSCToM-8B gives the best overall result among the systems tested. It…
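To make the observer-self conflict concrete, here is a minimal Python sketch of nested belief tracking in a classic object-relocation story. It is an illustration only, not the paper's domain-specific language or training pipeline: the `BeliefTracker` class, its method names, and the rule that a nested belief updates only when every agent in the chain witnesses the event are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefTracker:
    """Hypothetical tracker for first- and second-order beliefs about one object.

    beliefs[("bob",)] is where Bob thinks the object is.
    beliefs[("bob", "anne")] is where Bob thinks Anne thinks it is.
    """
    agents: tuple
    beliefs: dict = field(default_factory=dict)

    def _chains(self):
        # Enumerate first-order chains (a,) and second-order chains (a, b).
        for a in self.agents:
            yield (a,)
            for b in self.agents:
                if a != b:
                    yield (a, b)

    def move(self, location, witnesses):
        """Record that the object moved to `location`, seen only by `witnesses`."""
        for chain in self._chains():
            # Assumed rule: a nested belief updates only if every agent
            # in the chain saw the move.
            if all(agent in witnesses for agent in chain):
                self.beliefs[chain] = location

    def observer_self_conflict(self, observer, other):
        """True when the observer's own belief differs from the belief
        the observer attributes to `other`."""
        own = self.beliefs.get((observer,))
        attributed = self.beliefs.get((observer, other))
        return own is not None and attributed is not None and own != attributed


tracker = BeliefTracker(agents=("anne", "bob"))
tracker.move("basket", witnesses={"anne", "bob"})  # both see the first move
tracker.move("box", witnesses={"bob"})             # Anne has left the room

bob_conflict = tracker.observer_self_conflict("bob", "anne")
anne_conflict = tracker.observer_self_conflict("anne", "bob")
print(bob_conflict, anne_conflict)
```

In this toy story Bob believes the object is in the box while attributing the belief "basket" to Anne, so the check flags a conflict for Bob but not for Anne. Deeper recursion (beliefs about beliefs about beliefs) would need longer chains than this sketch enumerates.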
