Phy-Q as a measure for physical reasoning intelligence

Cheng Xue; Vimukthini Pinto; Chathura Gamage; Ekaterina Nikonova; Peng Zhang; Jochen Renz

arXiv:2108.13696·cs.AI·January 30, 2023

TL;DR

This paper introduces Phy-Q, a new benchmark for measuring physical reasoning intelligence in AI agents through diverse scenarios, revealing current agents' significant performance gap compared to humans.

Contribution

The paper presents a novel testbed with physical scenarios and a scoring metric, enabling evaluation of physical reasoning and generalization in AI agents.

Findings

01. All tested agents perform below human levels.
02. Learning agents struggle with physical rule generalization.
03. Current agents show limited physical reasoning capabilities.

Abstract

Humans are well-versed in reasoning about the behaviors of physical objects and choosing actions accordingly to accomplish tasks, while it remains a major challenge for AI. To facilitate research addressing this problem, we propose a new testbed that requires an agent to reason about physical scenarios and take an action appropriately. Inspired by the physical knowledge acquired in infancy and the capabilities required for robots to operate in real-world environments, we identify 15 essential physical scenarios. We create a wide variety of distinct task templates, and we ensure all the task templates within the same scenario can be solved by using one specific strategic physical rule. By having such a design, we evaluate two distinct levels of generalization, namely the local generalization and the broad generalization. We conduct an extensive evaluation with human players, learning…
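The two generalization levels mentioned in the abstract can be pictured as two ways of holding out evaluation tasks. The sketch below is illustrative only and is not taken from the benchmark's code: it assumes that local generalization means testing on unseen tasks drawn from templates seen during training, and that broad generalization means testing on entirely unseen templates from a scenario seen during training. The function name `split_tasks`, the `(scenario, template, task_id)` tuple layout, and the `test_fraction` parameter are all hypothetical.

```python
import random
from collections import defaultdict


def split_tasks(tasks, mode, test_fraction=0.2, seed=0):
    """Split (scenario, template, task_id) tuples into train and test sets.

    mode="local": hold out individual tasks inside every template
                  (assumed reading of local generalization).
    mode="broad": hold out whole templates inside every scenario
                  (assumed reading of broad generalization).
    """
    if mode not in ("local", "broad"):
        raise ValueError("mode must be 'local' or 'broad'")

    rng = random.Random(seed)

    # Group tasks by the level at which the hold-out is made.
    groups = defaultdict(list)
    for task in tasks:
        scenario, template, _ = task
        key = (scenario, template) if mode == "local" else scenario
        groups[key].append(task)

    train, test = [], []
    for key in sorted(groups):
        members = groups[key]
        if mode == "local":
            # Each task is its own hold-out unit.
            units = [[t] for t in members]
        else:
            # Each template (with all of its tasks) is one hold-out unit.
            by_template = defaultdict(list)
            for t in members:
                by_template[t[1]].append(t)
            units = [by_template[k] for k in sorted(by_template)]

        rng.shuffle(units)
        n_test = max(1, round(len(units) * test_fraction))
        n_test = min(n_test, len(units) - 1)  # always keep one unit for training
        for i, unit in enumerate(units):
            (test if i < n_test else train).extend(unit)
    return train, test


if __name__ == "__main__":
    # Toy data: 2 scenarios, 3 templates each, 5 tasks per template.
    toy = [(s, t, i) for s in range(2) for t in range(3) for i in range(5)]
    for mode in ("local", "broad"):
        tr, te = split_tasks(toy, mode)
        print(mode, "train:", len(tr), "test:", len(te))
```

Under the local split every template appears on both sides of the split; under the broad split the test templates never appear in training, although their scenario (and therefore, per the abstract, the physical rule that solves them) does.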

Code & Models

Repositories

phy-q/benchmark (PyTorch, official)

Taxonomy

Topics: AI-based Problem Solving and Planning