4DWorldBench: A Comprehensive Evaluation Framework for 3D/4D World Generation Models

Yiting Lu; Wei Luo; Peiyan Tu; Haoran Li; Hanxin Zhu; Zihao Yu; Xingrui Wang; Xinyi Chen; Xinge Peng; Xin Li; Zhibo Chen

arXiv:2511.19836 · cs.CV · November 26, 2025

TL;DR

This paper introduces 4DWorldBench, a comprehensive evaluation framework for 3D/4D world generation models that assesses perceptual quality, physical realism, and cross-modal coherence across various tasks and modalities.

Contribution

It presents a unified, adaptive benchmarking framework that extends traditional evaluation methods by integrating multiple modalities and using large language models as judges.

Findings

01. Preliminary human studies show that the benchmark's evaluations align more closely with subjective human judgments.

02. The benchmark enables systematic comparison of world-generation models.

03. Adaptive conditioning improves evaluation consistency across modalities.

Abstract

World Generation Models are emerging as a cornerstone of next-generation multimodal intelligence systems. Unlike traditional 2D visual generation, World Models aim to construct realistic, dynamic, and physically consistent 3D/4D worlds from images, videos, or text. These models not only need to produce high-fidelity visual content but also maintain coherence across space, time, physics, and instruction control, enabling applications in virtual reality, autonomous driving, embodied intelligence, and content creation. However, prior benchmarks emphasize different evaluation dimensions and lack a unified assessment of world-realism capability. To systematically evaluate World Models, we introduce 4DWorldBench, which measures models across four key dimensions: Perceptual Quality, Condition-4D Alignment, Physical Realism, and 4D Consistency. The benchmark covers tasks such as…
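
As a rough illustration of the four dimensions named in the abstract, the sketch below stores one score per dimension and ranks models by a simple aggregate. Everything beyond the four dimension names is an assumption made for illustration: the class `DimensionScores`, the helper `compare`, the [0, 1] score range, and the unweighted mean are hypothetical and are not taken from the paper, which may score, weight, or report the dimensions differently.

```python
from dataclasses import dataclass, fields
from statistics import mean


@dataclass(frozen=True)
class DimensionScores:
    """Scores for one model, one field per dimension named in the abstract.

    The [0, 1] range is an assumption made for this sketch.
    """
    perceptual_quality: float
    condition_4d_alignment: float
    physical_realism: float
    consistency_4d: float

    def __post_init__(self):
        # Reject values outside the assumed [0, 1] range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{f.name} must be in [0, 1], got {value}")

    def overall(self) -> float:
        # Hypothetical aggregate: an unweighted mean of the four dimensions.
        return mean(getattr(self, f.name) for f in fields(self))


def compare(models: dict[str, DimensionScores]) -> list[str]:
    """Return model names sorted from highest to lowest hypothetical overall score."""
    return sorted(models, key=lambda name: models[name].overall(), reverse=True)
```

Keeping the four dimensions as separate fields means a per-dimension comparison stays available even if the aggregate is swapped for a different rule.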

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI