Quantitative Video World Model Evaluation for Geometric-Consistency
Jiaxin Wu, Yihao Pi, Yinling Zhang, Yuheng Li, Xueyan Zou

TL;DR
This paper introduces PDI-Bench, a quantitative framework for evaluating the geometric consistency of generated videos, addressing limitations of subjective assessments and revealing failure modes in state-of-the-art models.
Contribution
The authors develop an objective evaluation method for geometric coherence in generated videos. It comprises a new dataset (PDI-Dataset) and a set of projective-geometry residuals that measure scale-depth alignment, 3D motion consistency, and 3D structural rigidity.
Findings
PDI-Bench uncovers geometry-specific failure modes in current video generators.
The framework reveals limitations of perceptual metrics in detecting geometric inconsistencies.
Code and dataset are publicly available for benchmarking and further research.
Abstract
Generative video models are increasingly studied as implicit world models, yet evaluating whether they produce physically plausible 3D structure and motion remains challenging. Most existing video evaluation pipelines rely heavily on human judgment or learned graders, which can be subjective and weakly diagnostic for geometric failures. We introduce PDI-Bench (Perspective Distortion Index), a quantitative framework for auditing geometric coherence in generated videos. Given a generated clip, we obtain object-centric observations via segmentation and point tracking (e.g., SAM 2, MegaSaM, and CoTracker3), lift them to 3D world-space coordinates via monocular reconstruction, and compute a set of projective-geometry residuals capturing three failure dimensions: scale-depth alignment, 3D motion consistency, and 3D structural rigidity. To support systematic evaluation, we build PDI-Dataset,…
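To make the idea of a projective-geometry residual concrete, here is a minimal sketch of two of the three failure dimensions named in the abstract. It is not the paper's implementation: the function names, input layouts, and the exact normalizations are assumptions chosen for illustration. The scale-depth check relies on the pinhole relation s = f·S/Z, so for a rigid object seen with a fixed focal length the product of apparent size and depth should stay constant across frames. The rigidity check relies on the fact that pairwise distances between 3D points on a rigid body do not change over time.

```python
import numpy as np


def scale_depth_residual(pixel_sizes, depths):
    """Hypothetical scale-depth residual (illustrative, not the paper's formula).

    Under a pinhole camera with fixed focal length f, an object of physical
    size S at depth Z projects to s = f * S / Z, so s * Z should be constant.
    Returns the coefficient of variation of s_t * Z_t over frames.
    """
    s = np.asarray(pixel_sizes, dtype=float)
    z = np.asarray(depths, dtype=float)
    product = s * z
    return float(np.std(product) / np.mean(product))


def rigidity_residual(points):
    """Hypothetical rigidity residual (illustrative, not the paper's formula).

    points: array of shape (T, N, 3) holding N tracked 3D points over T frames.
    Assumes the N points are distinct, so every pairwise distance is non-zero.
    Returns the mean relative deviation of each pairwise distance from its
    temporal mean; a perfectly rigid object yields a value near 0.
    """
    p = np.asarray(points, dtype=float)
    diff = p[:, :, None, :] - p[:, None, :, :]      # (T, N, N, 3)
    dist = np.linalg.norm(diff, axis=-1)            # (T, N, N)
    i, j = np.triu_indices(p.shape[1], k=1)         # unique point pairs
    d = dist[:, i, j]                               # (T, P)
    mean_d = d.mean(axis=0)
    return float(np.mean(np.abs(d - mean_d) / mean_d))


if __name__ == "__main__":
    # Object receding from the camera: apparent size halves as depth doubles.
    print(scale_depth_residual([100, 50, 25], [2, 4, 8]))
    # Same depths, but the apparent size never shrinks (a geometric failure).
    print(scale_depth_residual([100, 100, 100], [2, 4, 8]))

    triangle = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
    moving = np.stack([triangle + t for t in range(3)])         # pure translation
    stretching = np.stack([triangle * (1 + t) for t in range(3)])
    print(rigidity_residual(moving), rigidity_residual(stretching))
```

In a real pipeline the sizes, depths, and 3D tracks would come from the segmentation, tracking, and monocular reconstruction stages described above, and errors in those stages would also feed into the residuals.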
