Hi-Phy: A Benchmark for Hierarchical Physical Reasoning
Cheng Xue, Vimukthini Pinto, Chathura Gamage, Peng Zhang, Jochen Renz

TL;DR
Hi-Phy is a benchmark for measuring granular physical reasoning capabilities in agents. Its tasks are set in the game Angry Birds and organized by a capability hierarchy inspired by how humans acquire physical reasoning.
Contribution
The paper introduces a hierarchical physical reasoning benchmark that assesses specific reasoning capabilities, enabling detailed evaluation of AI agents' physical understanding.
Findings
Learning agents struggle with physical reasoning compared to heuristic agents and humans.
The benchmark reveals limitations in current AI agents' ability to generalize physical reasoning skills.
Humans outperform AI agents in complex physical reasoning tasks.
Abstract
Reasoning about the behaviour of physical objects is a key capability of agents operating in physical worlds. Humans are very experienced in physical reasoning while it remains a major challenge for AI. To facilitate research addressing this problem, several benchmarks have been proposed recently. However, these benchmarks do not enable us to measure an agent's granular physical reasoning capabilities when solving a complex reasoning task. In this paper, we propose a new benchmark for physical reasoning that allows us to test individual physical reasoning capabilities. Inspired by how humans acquire these capabilities, we propose a general hierarchy of physical reasoning capabilities with increasing complexity. Our benchmark tests capabilities according to this hierarchy through generated physical reasoning tasks in the video game Angry Birds. This benchmark enables us to conduct a…
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Topic Modeling
