Hi-Phy: A Benchmark for Hierarchical Physical Reasoning

Cheng Xue; Vimukthini Pinto; Chathura Gamage; Peng Zhang; Jochen Renz

arXiv:2106.09692 · cs.AI · August 31, 2021

TL;DR

Hi-Phy is a benchmark for measuring an agent's granular physical reasoning capabilities. Its tasks are generated in the video game Angry Birds and organized by a hierarchy of capabilities inspired by how humans acquire physical reasoning.

Contribution

The paper introduces a hierarchical physical reasoning benchmark that assesses specific reasoning capabilities, enabling detailed evaluation of AI agents' physical understanding.

Findings

01. Learning agents struggle with physical reasoning compared to heuristic agents and humans.

02. The benchmark reveals limitations in current AI agents' ability to generalize physical reasoning skills.

03. Humans outperform AI agents in complex physical reasoning tasks.

Abstract

Reasoning about the behaviour of physical objects is a key capability of agents operating in physical worlds. Humans are very experienced in physical reasoning while it remains a major challenge for AI. To facilitate research addressing this problem, several benchmarks have been proposed recently. However, these benchmarks do not enable us to measure an agent's granular physical reasoning capabilities when solving a complex reasoning task. In this paper, we propose a new benchmark for physical reasoning that allows us to test individual physical reasoning capabilities. Inspired by how humans acquire these capabilities, we propose a general hierarchy of physical reasoning capabilities with increasing complexity. Our benchmark tests capabilities according to this hierarchy through generated physical reasoning tasks in the video game Angry Birds. This benchmark enables us to conduct a…

Code & Models

Repositories

Cheng-Xue/Hi-Phy
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Topic Modeling