Gravity-Bench-v1: A Benchmark on Gravitational Physics Discovery for Agents
Nolan Koblischke, Hyunseok Jang, Kristen Menou, Mohamad Ali-Dib

TL;DR
Gravity-Bench-v1 is a comprehensive environment-based benchmark designed to evaluate AI agents' ability to discover and reason about gravitational physics, including out-of-distribution scenarios, through dynamic simulations and data analysis tasks.
Contribution
The paper introduces a benchmark environment for testing AI scientific discovery in gravitational physics, including out-of-distribution cases and an open-ended space of solutions.
Findings
Baseline AI agents find the benchmark challenging.
Out-of-distribution cases, with physics that deviates from the real world, test scientific generalization.
Reference solutions for each task calibrate AI performance against human expertise.
Abstract
Modern science emerged from reasoning over repeatedly-observed planetary motions. We present Gravity-Bench-v1, an environment-based benchmark that challenges AI agents on tasks that parallel this historical development. Gravity-Bench-v1 evaluates agents on the discovery of physics concealed within a dynamic environment, using rigorous gravitational dynamics simulations. Gravity-Bench includes out-of-distribution cases, i.e. with physics that deviates from the real world, to evaluate true scientific generalization capabilities. Agents must plan to collect data within an experimental budget and must perform a dynamic form of data analysis and reasoning to solve tasks efficiently. Our benchmark admits an open-ended space of solutions. Reference solutions for each task are provided to calibrate AI performance against human expertise. Technically at an upper-undergraduate level, our…
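As an illustration of what collecting data within an experimental budget could look like, here is a minimal Python sketch. It is not the benchmark's actual interface: the class `BudgetedOrbitEnv`, its `observe` method, and the function `estimate_total_mass` are hypothetical names, and the environment is a toy circular two-body orbit rather than the paper's simulations. The agent spends two observations of the relative position and applies Kepler's third law, M = ω²a³/G, to recover the total mass.

```python
import math

G = 6.674e-11  # gravitational constant in SI units (m^3 kg^-1 s^-2)


class BudgetedOrbitEnv:
    """Hypothetical toy environment: a circular two-body orbit that can
    only be observed a limited number of times."""

    def __init__(self, total_mass, separation, budget):
        self.budget = budget
        self._separation = separation
        # Kepler's third law gives the orbital period of the hidden system.
        self._period = 2 * math.pi * math.sqrt(separation**3 / (G * total_mass))

    def observe(self, t):
        """Return the relative position (x, y) at time t, spending one observation."""
        if self.budget <= 0:
            raise RuntimeError("observation budget exhausted")
        self.budget -= 1
        phase = 2 * math.pi * t / self._period
        return (self._separation * math.cos(phase),
                self._separation * math.sin(phase))


def estimate_total_mass(env, dt):
    """Estimate the total mass from two observations taken dt seconds apart.

    Assumes a circular, counter-clockwise orbit and dt shorter than one
    period, so the swept angle is unambiguous.
    """
    x0, y0 = env.observe(0.0)
    x1, y1 = env.observe(dt)
    a = math.hypot(x0, y0)  # orbital separation
    dtheta = (math.atan2(y1, x1) - math.atan2(y0, x0)) % (2 * math.pi)
    omega = dtheta / dt  # angular speed
    return omega**2 * a**3 / G


env = BudgetedOrbitEnv(total_mass=4e30, separation=1.5e11, budget=2)
estimate = estimate_total_mass(env, dt=86400.0)
print(f"estimated total mass: {estimate:.3e} kg, budget left: {env.budget}")
```

The point of the sketch is the trade-off the abstract describes: each call to `observe` is irreversible, so the agent has to choose its sampling times before it knows the orbital period.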
Taxonomy
Topics: Computability, Logic, AI Algorithms · Distributed and Parallel Computing Systems · Algorithms and Data Compression
