Evaluating Theory of Mind in Question Answering
Aida Nematzadeh, Kaylee Burns, Erin Grant, Alison Gopnik, and Thomas L. Griffiths

TL;DR
This paper introduces a dataset for evaluating whether question answering models can reason about beliefs. The tasks are inspired by theory-of-mind experiments and show that current memory-augmented neural models cannot keep track of inconsistent states of the world.
Contribution
The paper presents a dataset for belief reasoning in question answering and evaluates several recent memory-augmented neural models on it, all of which fail on tasks where beliefs differ from reality.
Findings
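All evaluated memory-augmented models fail on the belief reasoning tasks
Model accuracy drops notably when random sentences are added to the tasks at test time
Current neural models struggle to keep track of inconsistent states of the world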
Abstract
We propose a new dataset for evaluating question answering models with respect to their capacity to reason about beliefs. Our tasks are inspired by theory-of-mind experiments that examine whether children are able to reason about the beliefs of others, in particular when those beliefs differ from reality. We evaluate a number of recent neural models with memory augmentation. We find that all fail on our tasks, which require keeping track of inconsistent states of the world; moreover, the models' accuracy decreases notably when random sentences are introduced to the tasks at test.
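To make "keeping track of inconsistent states of the world" concrete, here is a minimal Python sketch of a false-belief scenario in the style of the classic Sally-Anne experiment. It is an illustration only, not the paper's dataset format or generation code: the `Story` class, its method names, and the characters and objects are all hypothetical. The sketch assumes an agent updates a belief only when present for an event.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    """Hypothetical tracker: true object locations plus each agent's beliefs."""
    present: set = field(default_factory=set)    # agents currently in the room
    reality: dict = field(default_factory=dict)  # object -> true location
    beliefs: dict = field(default_factory=dict)  # agent -> {object -> believed location}

    def enter(self, agent):
        self.present.add(agent)
        self.beliefs.setdefault(agent, {})

    def leave(self, agent):
        self.present.discard(agent)

    def move(self, agent, obj, location):
        # The world state always changes.
        self.reality[obj] = location
        # Assumption: only the mover and agents in the room witness the move.
        for witness in self.present | {agent}:
            self.beliefs.setdefault(witness, {})[obj] = location


story = Story()
story.enter("Sally")
story.enter("Anne")
story.move("Sally", "marble", "basket")
story.leave("Sally")
story.move("Anne", "marble", "box")  # Sally does not see this

print("Reality:", story.reality["marble"])
print("Sally believes:", story.beliefs["Sally"]["marble"])
print("Anne believes:", story.beliefs["Anne"]["marble"])
```

After the second move, the true location and Sally's belief disagree. A question such as "Where will Sally look for the marble?" can only be answered correctly by a system that stores both states rather than overwriting one with the other.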
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
