iWorld-Bench: A Benchmark for Interactive World Models with a Unified Action Generation Framework

Jianjie Fang; Yingshan Lei; Qin Wan; Ziyou Wang; Yuchao Huang; Yongyan Xu; Baining Zhao; Weichen Zhang; Chen Gao; Xinlei Chen; Yong Li

arXiv:2605.03941 · cs.CV · May 7, 2026

TL;DR

iWorld-Bench introduces a comprehensive benchmark, comprising a large dataset and a unified evaluation framework, for assessing the perception, memory, and action capabilities of interactive world models across diverse scenarios.

Contribution

The paper presents iWorld-Bench, a new large-scale dataset and a unified action generation framework for evaluating interactive world models across multiple tasks.

Findings

1. Evaluated 14 world models, revealing key limitations.

2. Constructed a dataset of 330k video clips, from which 2.1k high-quality samples were selected and expanded into 4.9k test samples.

3. Provided insights for the future development of interactive world models.

Abstract

Achieving Artificial General Intelligence (AGI) requires agents that learn and interact adaptively, with interactive world models providing scalable environments for perception, reasoning, and action. Yet current research still lacks large-scale datasets and unified benchmarks to evaluate their physical interaction capabilities. To address this, we propose iWorld-Bench, a comprehensive benchmark for training and testing world models on interaction-related abilities such as distance perception and memory. We construct a diverse dataset with 330k video clips and select 2.1k high-quality samples covering varied perspectives, weather, and scenes. Because existing world models differ in their interaction modalities, we introduce an Action Generation Framework to unify evaluation, and we design six task types that together yield 4.9k test samples. These tasks jointly assess model performance across visual…

Code & Models

Repositories

GitHub

Datasets

EmbodiedCity/iWorld-Bench-Dataset (dataset · 1.2k downloads)