WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors
Keming Wu, Yijing Cui, Wenhan Xue, Qijie Wang, Xuan Luo, Zhiyuan Feng, Zuhao Yang, Sudong Wang, Sicong Jiang, Haowei Zhu, Zihan Wang, Ping Nie, Wenhu Chen, Bin Wang

TL;DR
WorldReasonBench introduces a benchmark that tests whether video generators can reason about how an observed world evolves, checking the physical, social, logical, and informational consistency of the predicted future.
Contribution
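It presents a benchmark of 436 curated test cases with structured ground-truth QA annotations spanning four reasoning dimensions and 22 subcategories, together with a two-part, human-aligned evaluation methodology for assessing world reasoning in video generation models.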
Findings
Modern video generators often produce visually plausible videos that nonetheless fail to reason accurately about how the world should evolve.
The benchmark reveals persistent gaps in causality, dynamics, and information preservation in generated videos.
The evaluation toolkit and benchmark will be publicly released to advance research on world-aware video generation.
Abstract
Commercial video generation systems such as Seedance2.0 and Veo3.1 have rapidly improved, strengthening the view that video generators may be evolving into "world simulators." Yet the community still lacks a benchmark that directly tests whether a model can reason about how an observed world should evolve over time. We introduce WorldReasonBench, which reframes video generation evaluation as world-state prediction: given an initial state and an action, can a model generate a future video whose state evolution remains physically, socially, logically, and informationally consistent? WorldReasonBench contains 436 curated test cases with structured ground-truth QA annotations spanning four reasoning dimensions and 22 subcategories. We evaluate generated videos with a human-aligned two-part methodology: Process-aware Reasoning Verification uses structured QA and reasoning-phase diagnostics…
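The abstract describes scoring generated videos against structured ground-truth QA, but the scoring rule itself is cut off above. As a hedged illustration only, the sketch below assumes each test case carries an initial state, an action, and a list of (question, answer) pairs, and that some judge answers each question about the generated video; per-case QA accuracy is then averaged within each reasoning dimension. The `BenchCase` fields, the `judge` callable, the toy subcategory names, and case-insensitive exact matching are all hypothetical choices, not the paper's method.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class BenchCase:
    """Hypothetical record for one test case (field names are illustrative)."""
    case_id: str
    dimension: str       # one of the four reasoning dimensions
    subcategory: str     # one of the 22 subcategories
    initial_state: str   # description of the observed initial world state
    action: str          # action applied to that state
    qa: List[Tuple[str, str]] = field(default_factory=list)  # (question, ground-truth answer)

# Hypothetical judge: answers a question about a generated video.
Judge = Callable[[str, str], str]

def case_score(case: BenchCase, video: str, judge: Judge) -> float:
    """Fraction of QA pairs whose judged answer matches (case-insensitive exact match)."""
    if not case.qa:
        return 0.0
    hits = sum(
        judge(video, question).strip().lower() == answer.strip().lower()
        for question, answer in case.qa
    )
    return hits / len(case.qa)

def dimension_scores(cases: List[BenchCase], videos: Dict[str, str], judge: Judge) -> Dict[str, float]:
    """Mean per-case QA accuracy, grouped by reasoning dimension."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for case in cases:
        buckets[case.dimension].append(case_score(case, videos[case.case_id], judge))
    return {dim: sum(s) / len(s) for dim, s in buckets.items()}

# Toy usage: videos are identified by strings and the judge is a lookup table.
cases = [
    BenchCase("c1", "physical", "gravity", "a ball rests on a table edge", "the ball is pushed off",
              [("Does the ball fall?", "yes"), ("Does it end on the floor?", "yes")]),
    BenchCase("c2", "informational", "text persistence", "a sign reads OPEN", "the camera pans away and back",
              [("What does the sign read?", "OPEN")]),
]
videos = {"c1": "video_c1", "c2": "video_c2"}
judged = {
    ("video_c1", "Does the ball fall?"): "Yes",
    ("video_c1", "Does it end on the floor?"): "no",
    ("video_c2", "What does the sign read?"): "open",
}
scores = dimension_scores(cases, videos, lambda video, q: judged[(video, q)])
print(scores)  # {'physical': 0.5, 'informational': 1.0}
```

Exact matching is used only because it is the simplest rule to show; a real evaluation could instead rely on a model-based judge or semantic matching, which the text above does not specify.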
