Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models

Meiqi Wu; Zhixin Cai; Fufangchen Zhao; Xiaokun Feng; Rujing Dang; Bingze Song; Ruitian Tian; Jiashu Zhu; Jiachen Lei; Hao Dou; Jing Tang; Lei Sun; Jiahong Wu; Xiangxiang Chu; Zeming Liu; Kaiqi Huang

arXiv:2603.22212 · cs.CV · March 24, 2026

TL;DR

Omni-WorldBench is a comprehensive benchmark for evaluating the interactive response capabilities of 4D world models. It addresses a critical gap in current evaluation methods by measuring how models respond to actions across space and time.

Contribution

The paper presents Omni-WorldBench, a new benchmark that combines a prompt suite with agent-based metrics to systematically assess interactive response in 4D world models.

Findings

1. Current models show limitations in their interactive response capabilities.
2. Extensive evaluations reveal gaps in the models' causal understanding.
3. The benchmark provides actionable insights for future improvements.

Abstract

Video-based world models have emerged along two dominant paradigms: video generation and 3D reconstruction. However, existing evaluation benchmarks either focus narrowly on visual fidelity and text-video alignment for generative models, or rely on static 3D reconstruction metrics that fundamentally neglect temporal dynamics. We argue that the future of world modeling lies in 4D generation, which jointly models spatial structure and temporal evolution. In this paradigm, the core capability is interactive response: the ability to faithfully reflect how interaction actions drive state transitions across space and time. Yet no existing benchmark systematically evaluates this critical dimension. To address this gap, we propose Omni-WorldBench, a comprehensive benchmark specifically designed to evaluate the interactive response capabilities of world models in 4D settings. Omni-WorldBench…

Taxonomy

Topics: Human Motion and Animation · Social Robot Interaction and HRI · Human Pose and Action Recognition