Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning
Qin Zhang, Peiyu Jing, Hong-Xing Yu, Fangqiang Ding, Fan Nie, Weimin Wang, Yilun Du, James Zou, Jiajun Wu, Bing Shuai

TL;DR
Physion-Eval is a comprehensive benchmark that uses expert human reasoning to evaluate physical realism in generated videos, revealing significant physical glitches in current models and aiming to improve physics-grounded video generation.
Contribution
Introduces Physion-Eval, a large-scale dataset with expert annotations for diagnosing physical realism failures in generated videos, advancing evaluation methods beyond automated metrics.
Findings
83.3% of generated exocentric videos show physical glitches
93.5% of generated egocentric videos show physical glitches
Physion-Eval sets a new standard for physical realism assessment
Abstract
Video generation models are increasingly used as world simulators for storytelling, simulation, and embodied AI. As these models advance, a key question arises: do generated videos obey the physical laws of the real world? Existing evaluations largely rely on automated metrics or coarse human judgments such as preferences or rubric-based checks. While useful for assessing perceptual quality, these methods provide limited insight into when and why generated dynamics violate real-world physical constraints. We introduce Physion-Eval, a large-scale benchmark of expert human reasoning for diagnosing physical realism failures in videos generated by five state-of-the-art models across egocentric and exocentric views, containing 10,990 expert reasoning traces spanning 22 fine-grained physical categories. Each generated video is derived from a corresponding real-world reference video depicting…
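As a rough illustration of how per-view glitch rates like those reported in the findings could be computed from expert annotations, the sketch below aggregates reasoning traces by video. The record schema (`ReasoningTrace` and its fields), the category names, and the rule that a video counts as glitched when at least one of its traces flags a violation are all assumptions made for this example; they are not taken from the released dataset or the authors' code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningTrace:
    """One expert annotation (hypothetical schema, not the released format)."""
    video_id: str
    view: str         # "egocentric" or "exocentric"
    model: str        # name of the generating model
    category: str     # a fine-grained physical category (names here are made up)
    has_glitch: bool  # True if the expert identified a physical violation


def glitch_rate_by_view(traces):
    """Return, per view, the fraction of videos with at least one flagged glitch."""
    flagged = defaultdict(dict)  # view -> {video_id: any glitch seen so far}
    for t in traces:
        seen = flagged[t.view].get(t.video_id, False)
        flagged[t.view][t.video_id] = seen or t.has_glitch
    return {view: sum(v.values()) / len(v) for view, v in flagged.items()}


# Toy data for illustration only; these are not values from the benchmark.
toy_traces = [
    ReasoningTrace("v1", "exocentric", "model_a", "gravity", True),
    ReasoningTrace("v1", "exocentric", "model_a", "collision", False),
    ReasoningTrace("v2", "exocentric", "model_b", "gravity", False),
    ReasoningTrace("v3", "egocentric", "model_a", "fluid", True),
]
rates = glitch_rate_by_view(toy_traces)
```

Aggregating at the video level rather than the trace level keeps a video with several annotated categories from being counted more than once.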
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
