WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning
Delong Chen, Willy Chung, Yejin Bang, Ziwei Ji, Pascale Fung

TL;DR
WorldPrediction introduces a novel video-based benchmark for evaluating high-level world modeling and procedural planning in AI, emphasizing semantic and temporal actions, and highlights current models' limitations compared to human performance.
Contribution
The paper presents the first benchmark focusing on high-level, semantic, and temporal world modeling and planning, with a formal framework and extensive validation.
Findings
Current models achieve only 57% accuracy on the world modeling task (WorldPrediction-WM).
Models score 38% on the procedural planning task (WorldPrediction-PP).
Humans solve both tasks perfectly.
Abstract
Humans are known to have an internal "world model" that enables us to carry out action planning based on world states. AI agents need to have such a world model for action planning as well. It is not clear how current AI models, especially generative models, are able to learn such world models and carry out procedural planning in diverse environments. We introduce WorldPrediction, a video-based benchmark for evaluating world modeling and procedural planning capabilities of different AI models. In contrast to prior benchmarks that focus primarily on low-level world modeling and robotic motion planning, WorldPrediction is the first benchmark that emphasizes actions with temporal and semantic abstraction. Given initial and final world states, the task is to distinguish the proper action (WorldPrediction-WM) or the properly ordered sequence of actions (WorldPrediction-PP) from a set of…
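The task described in the abstract is discriminative: a model sees the initial and final world states and must select the correct candidate from a set. A minimal sketch of how such a multiple-choice task could be scored follows. It assumes the model under test yields one numeric score per candidate; the names `pick_candidate` and `accuracy` are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence


def pick_candidate(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate.

    `scores` holds one number per candidate action (WM) or per candidate
    action sequence (PP). How the scores are produced is left to the model
    under test; this is an assumption, not the benchmark's protocol.
    Ties resolve to the earliest index.
    """
    if not scores:
        raise ValueError("at least one candidate is required")
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Fraction of samples where the picked index matches the answer index."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one sample is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


if __name__ == "__main__":
    # Toy data: three samples, each with made-up candidate scores.
    toy_scores = [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05], [0.3, 0.3, 0.4]]
    toy_answers = [1, 0, 0]
    toy_predictions = [pick_candidate(s) for s in toy_scores]
    print(accuracy(toy_predictions, toy_answers))
```

The same scoring applies to both subtasks under this assumption: for WorldPrediction-WM each candidate is a single action, and for WorldPrediction-PP each candidate is an ordered sequence of actions.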
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · AI-based Problem Solving and Planning
Methods: Sparse Evolutionary Training · Focus
