iWorld-Bench: A Benchmark for Interactive World Models with a Unified Action Generation Framework
Jianjie Fang, Yingshan Lei, Qin Wan, Ziyou Wang, Yuchao Huang, Yongyan Xu, Baining Zhao, Weichen Zhang, Chen Gao, Xinlei Chen, Yong Li

TL;DR
iWorld-Bench introduces a comprehensive benchmark with a large dataset and unified evaluation framework for assessing interactive world models' perception, memory, and action capabilities in diverse scenarios.
Contribution
The paper presents iWorld-Bench, a new large-scale dataset and a unified action generation framework for evaluating interactive world models across multiple tasks.
Findings
Evaluated 14 world models, revealing key limitations.
Constructed a dataset with 330k video clips and 4.9k test samples.
Provided insights for future development of interactive world models.
Abstract
Achieving Artificial General Intelligence (AGI) requires agents that learn and interact adaptively, with interactive world models providing scalable environments for perception, reasoning, and action. Yet current research still lacks large-scale datasets and unified benchmarks to evaluate their physical interaction capabilities. To address this, we propose iWorld-Bench, a comprehensive benchmark for training and testing world models on interaction-related abilities such as distance perception and memory. We construct a diverse dataset with 330k video clips and select 2.1k high-quality samples covering varied perspectives, weather, and scenes. As existing world models differ in interaction modalities, we introduce an Action Generation Framework to unify evaluation and design six task types, generating 4.9k test samples. These tasks jointly assess model performance across visual…
