Silico-centric Theory of Mind
Anirban Mukherjee, Hannah Hanwen Chang

TL;DR
This paper explores the limitations of current AI in understanding and reasoning about other agents' mental states, revealing a discrepancy between AI's performance on human-centric ToM tasks and its ability to model other AI agents.
Contribution
The study introduces a novel 'silico-centric' ToM test for AI, highlighting the gap between AI's human-centric ToM performance and its capacity for higher-order reasoning about other AI agents.
Findings
AI performs well on human-centric ToM assessments.
The focal AI erroneously creates additional instructions for its clones, indicating a lack of true ToM.
Neither AI nor referee demonstrates genuine ToM in the proposed test.
Abstract
Theory of Mind (ToM) refers to the ability to attribute mental states, such as beliefs, desires, intentions, and knowledge, to oneself and others, and to understand that these mental states can differ from one's own and from reality. We investigate ToM in environments with multiple, distinct, independent AI agents, each possessing unique internal states, information, and objectives. Inspired by human false-belief experiments, we present an AI ('focal AI') with a scenario where its clone undergoes a human-centric ToM assessment. We prompt the focal AI to assess whether its clone would benefit from additional instructions. Concurrently, we give its clones the ToM assessment, both with and without the instructions, thereby engaging the focal AI in higher-order counterfactual reasoning akin to human mentalizing--with respect to humans in one test and to other AI in another. We uncover a…
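The protocol described in the abstract can be sketched as a single trial in Python. This is an illustrative outline only, not the authors' code: the `Model` type, the function `run_silico_centric_trial`, the prompt wording, and the `NONE` convention are all hypothetical, and a real study would replace the callable with calls to an actual language model and score the answers.

```python
from typing import Callable, Dict

# Hypothetical type: any function that maps a prompt string to a reply string.
Model = Callable[[str], str]


def run_silico_centric_trial(model: Model, tom_task: str) -> Dict[str, str]:
    """Run one trial: query the focal AI, then test clones with and without its instructions."""
    # Step 1: the focal AI judges whether its clone needs extra instructions.
    # The prompt wording here is an assumption, not taken from the paper.
    focal_prompt = (
        "A clone of you will be given the following task:\n"
        f"{tom_task}\n"
        "Would your clone benefit from additional instructions? "
        "If so, write them; otherwise reply NONE."
    )
    instructions = model(focal_prompt).strip()

    # Step 2: a clone takes the assessment without any added instructions.
    baseline_answer = model(tom_task)

    # Step 3: a clone takes the assessment with the instructions prepended,
    # or the unchanged task if the focal AI declined to write any.
    if instructions.upper() == "NONE":
        assisted_prompt = tom_task
    else:
        assisted_prompt = f"{instructions}\n\n{tom_task}"
    assisted_answer = model(assisted_prompt)

    return {
        "instructions": instructions,
        "baseline": baseline_answer,
        "assisted": assisted_answer,
    }
```

Comparing `baseline` with `assisted` over many tasks would show whether the instructions the focal AI chose to write actually changed its clones' performance, which is the counterfactual the focal AI is asked to reason about.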
Taxonomy
Topics: Scientific Research and Philosophical Inquiry
