World Reasoning Arena

PAN Team Institute of Foundation Models: Qiyue Gao; Kun Zhou; Jiannan Xiang; Zihan Liu; Dequan Yang; Junrong Chen; Arif Ahmad; Cong Zeng; Ganesh Bannur; Xinqi Huang; Zheqi Liu; Yi Gu; Yichi Yang; Guangyi Liu; Zhiting Hu; Zhengzhong Liu; Eric Xing

arXiv:2603.25887 · cs.CV · March 30, 2026

TL;DR

WR-Arena is a comprehensive benchmark for world models that evaluates their ability to simulate, reason, and plan in complex environments. It addresses the narrow focus of existing benchmarks on next-state prediction and visual fidelity.

Contribution

The paper introduces WR-Arena, a diverse benchmark whose datasets and tasks evaluate world models along three dimensions: Action Simulation Fidelity, Long-horizon Forecast, and Simulative Reasoning and Planning.

Findings

01 Current models show a significant gap compared to human-level reasoning.

02 WR-Arena exposes weaknesses in existing world models across multiple dimensions.

03 The benchmark guides future development of more robust and capable world models.

Abstract

World models (WMs) are intended to serve as internal simulators of the real world that enable agents to understand, anticipate, and act upon complex environments. Existing WM benchmarks remain narrowly focused on next-state prediction and visual fidelity, overlooking the richer simulation capabilities required for intelligent behavior. To address this gap, we introduce WR-Arena, a comprehensive benchmark for evaluating WMs along three fundamental dimensions of next world simulation: (i) Action Simulation Fidelity, the ability to interpret and follow semantically meaningful, multi-step instructions and generate diverse counterfactual rollouts; (ii) Long-horizon Forecast, the ability to sustain accurate, coherent, and physically plausible simulations across extended interactions; and (iii) Simulative Reasoning and Planning, the ability to support goal-directed reasoning by simulating,…

Code & Models

Repositories

MBZUAI-IFM/WR-Arena (GitHub)
