AGENT: A Benchmark for Core Psychological Reasoning

Tianmin Shu; Abhishek Bhandwaldar; Chuang Gan; Kevin A. Smith; Shari Liu; Dan Gutfreund; Elizabeth Spelke; Joshua B. Tenenbaum; Tomer D. Ullman

arXiv:2102.12321 · cs.AI · July 27, 2021 · 1 citation

TL;DR

This paper introduces AGENT, a benchmark dataset of procedurally generated 3D animations for evaluating whether machine agents capture the core psychological reasoning that underlies human intuitive psychology.

Contribution

It presents a new benchmark with a comprehensive dataset and evaluation protocol to assess AI models' ability to reason about human mental states and actions.

Findings

01. Models need to incorporate utility and physics knowledge to pass tests.

02. Current baselines show room for improvement in core psychological reasoning.

03. AGENT correlates well with human judgments on psychological reasoning tasks.

Abstract

For machine agents to successfully interact with humans in real-world settings, they will need to develop an understanding of human mental life. Intuitive psychology, the ability to reason about hidden mental variables that drive observable actions, comes naturally to people: even pre-verbal infants can tell agents from objects, expecting agents to act efficiently to achieve goals given constraints. Despite recent interest in machine agents that reason about other agents, it is not clear if such agents learn or hold the core psychology principles that drive human reasoning. Inspired by cognitive development studies on intuitive psychology, we present a benchmark consisting of a large dataset of procedurally generated 3D animations, AGENT (Action, Goal, Efficiency, coNstraint, uTility), structured around four scenarios (goal preferences, action efficiency, unobserved constraints, and…

Videos

AGENT: A Benchmark for Core Psychological Reasoning · SlidesLive

Taxonomy

Topics: Child and Animal Learning Development · Reinforcement Learning in Robotics · Social Robot Interaction and HRI