4DWorldBench: A Comprehensive Evaluation Framework for 3D/4D World Generation Models
Yiting Lu, Wei Luo, Peiyan Tu, Haoran Li, Hanxin Zhu, Zihao Yu, Xingrui Wang, Xinyi Chen, Xinge Peng, Xin Li, Zhibo Chen

TL;DR
This paper introduces 4DWorldBench, a comprehensive evaluation framework for 3D/4D world generation models that assesses perceptual quality, physical realism, and cross-modal coherence across various tasks and modalities.
Contribution
It presents a unified, adaptive benchmarking framework that extends traditional evaluation methods by integrating multiple modalities and using large language models as judges.
Findings
Preliminary human studies indicate that the benchmark's evaluations align more closely with subjective human judgments.
The benchmark enables systematic comparison of world-generation models.
Adaptive conditioning improves evaluation consistency across modalities.
Abstract
World Generation Models are emerging as a cornerstone of next-generation multimodal intelligence systems. Unlike traditional 2D visual generation, World Models aim to construct realistic, dynamic, and physically consistent 3D/4D worlds from images, videos, or text. These models must not only produce high-fidelity visual content but also maintain coherence across space, time, physics, and instruction control, enabling applications in virtual reality, autonomous driving, embodied intelligence, and content creation. However, prior benchmarks emphasize different evaluation dimensions and lack a unified assessment of world-realism capability. To systematically evaluate World Models, we introduce 4DWorldBench, which measures models across four key dimensions: Perceptual Quality, Condition-4D Alignment, Physical Realism, and 4D Consistency. The benchmark covers tasks such as…
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI
