WorldMark: A Unified Benchmark Suite for Interactive Video World Models
Xiaojie Xu, Zhengyuan Lin, Kang He, Yukang Feng, Xiaofeng Mao, Yuanyang Yin, Kaipeng Zhang, Yongtao Ge

TL;DR
WorldMark is a comprehensive benchmark suite for interactive video world models, providing standardized evaluation conditions, a unified control interface, and an online arena for model comparison.
Contribution
It introduces a unified action-mapping layer, a hierarchical test suite, and a modular evaluation toolkit for fair comparison of interactive video models.
Findings
Enabled apples-to-apples comparison across six models on identical scenes.
Provided a diverse set of 500 evaluation cases covering various viewpoints and scene types.
Launched an online platform for live model comparison and benchmarking.
Abstract
Interactive video generation models such as Genie, YUME, HY-World, and Matrix-Game are advancing rapidly, yet every model is evaluated on its own benchmark with private scenes and trajectories, making fair cross-model comparison impossible. Existing public benchmarks offer useful metrics such as trajectory error, aesthetic scores, and VLM-based judgments. However, none supplies the standardized test conditions needed to make those metrics comparable across models with heterogeneous inputs: identical scenes, identical action sequences, and a unified control interface. We introduce WorldMark, the first benchmark that provides such a common playing field for interactive Image-to-Video world models. WorldMark contributes: (1) a unified action-mapping layer that translates a shared WASD-style action vocabulary into each model's native control format, enabling apples-to-apples comparison…
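The action-mapping idea can be illustrated with a minimal sketch. It assumes that the shared vocabulary consists only of the four movement keys W/A/S/D, and that each model exposes one of three native control formats: a per-step one-hot keyboard vector, a per-step camera translation, or a text description. These formats, the step size, and the adapter names (`model_onehot`, `model_translation`, `model_text`) are hypothetical illustrations. They are not the actual interfaces of Genie, YUME, HY-World, Matrix-Game, or WorldMark's own implementation.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Action(Enum):
    """Shared WASD-style vocabulary (assumption: movement keys only)."""
    W = "forward"
    A = "left"
    S = "backward"
    D = "right"


# Hypothetical native format 1: per-step one-hot keyboard vector ordered [W, A, S, D].
KEY_ORDER = [Action.W, Action.A, Action.S, Action.D]


def to_one_hot(actions: List[Action]) -> List[List[int]]:
    return [[1 if a is k else 0 for k in KEY_ORDER] for a in actions]


# Hypothetical native format 2: per-step camera translation (dx, dz); step size is assumed.
STEP = 1.0
TRANSLATION: Dict[Action, Tuple[float, float]] = {
    Action.W: (0.0, STEP),
    Action.S: (0.0, -STEP),
    Action.A: (-STEP, 0.0),
    Action.D: (STEP, 0.0),
}


def to_translation(actions: List[Action]) -> List[Tuple[float, float]]:
    return [TRANSLATION[a] for a in actions]


# Hypothetical native format 3: a text instruction, merging consecutive repeats of a key.
def to_text(actions: List[Action]) -> str:
    runs: List[list] = []
    for a in actions:
        if runs and runs[-1][0] is a:
            runs[-1][1] += 1  # extend the current run of identical keys
        else:
            runs.append([a, 1])  # start a new run
    return ", then ".join(
        f"move {a.value} for {n} step{'s' if n != 1 else ''}" for a, n in runs
    )


# Registry of per-model adapters (hypothetical model names).
ADAPTERS: Dict[str, Callable] = {
    "model_onehot": to_one_hot,
    "model_translation": to_translation,
    "model_text": to_text,
}


def map_actions(model_name: str, sequence: str):
    """Parse a shared key string such as 'WWD' and convert it to one model's format."""
    actions = [Action[c] for c in sequence.upper()]
    return ADAPTERS[model_name](actions)
```

For example, `map_actions("model_text", "WWD")` yields `"move forward for 2 steps, then move right for 1 step"`, and `map_actions("model_onehot", "WD")` yields `[[1, 0, 0, 0], [0, 0, 0, 1]]`. Under this design, the benchmark's action sequences stay fixed, and supporting a new model only requires registering one more adapter.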
