Deep Interpretable Models of Theory of Mind
Ini Oguntola, Dana Hughes, Katia Sycara

TL;DR
This paper introduces an interpretable neural framework for modeling the mental states and intentions of others, improving prediction accuracy in human-AI interaction scenarios like search and rescue tasks in Minecraft.
Contribution
The paper presents a modular neural framework for modeling the intentions of observed agents, designed to be interpretable while also improving predictive performance.
Findings
Incorporating interpretability can significantly increase accuracy in predicting human intentions, under the right conditions.
The framework effectively models internal mental states, not just external behavior.
Experiments on data from human participants in a Minecraft search and rescue task demonstrate the efficacy of the approach.
Abstract
When developing AI systems that interact with humans, it is essential to design both a system that can understand humans, and a system that humans can understand. Most deep network based agent-modeling approaches are 1) not interpretable and 2) only model external behavior, ignoring internal mental states, which potentially limits their capability for assistance, interventions, discovering false beliefs, etc. To this end, we develop an interpretable modular neural framework for modeling the intentions of other observed entities. We demonstrate the efficacy of our approach with experiments on data from human participants on a search and rescue task in Minecraft, and show that incorporating interpretability can significantly increase predictive performance under the right conditions.
