WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

Zhen Li; Zian Meng; Shuwei Shi; Wenshuo Peng; Yuwei Wu; Bo Zheng; Chuanhao Li; Kaipeng Zhang

arXiv:2603.23497·cs.CV·March 25, 2026

Open Access

TL;DR

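WildWorld is a large-scale, richly annotated dataset collected from a photorealistic game. It is designed to advance dynamic world modeling with explicit states and actions, addressing two limitations of earlier datasets: narrow action spaces and actions tied directly to pixels rather than to an underlying state.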

Contribution

The paper introduces WildWorld, a comprehensive dataset with diverse actions and explicit state annotations, enabling better learning of structured world dynamics in visual environments.

Findings

01. Models struggle with semantically rich actions.

02. Maintaining long-horizon state consistency remains challenging.

03. The dataset facilitates research in state-aware video generation.

Abstract

Dynamical systems theory and reinforcement learning view world evolution as latent-state dynamics driven by actions, with visual observations providing partial information about the state. Recent video world models attempt to learn this action-conditioned dynamics from data. However, existing datasets rarely match the requirement: they typically lack diverse and semantically meaningful action spaces, and actions are directly tied to visual observations rather than mediated by underlying states. As a result, actions are often entangled with pixel-level changes, making it difficult for models to learn structured world dynamics and maintain consistent evolution over long horizons. In this paper, we propose WildWorld, a large-scale action-conditioned world modeling dataset with explicit state annotations, automatically collected from a photorealistic AAA action role-playing game (Monster…
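
The state-space view described in the opening sentence of the abstract can be written compactly. The block below is a generic sketch of that formulation, not notation taken from the paper: the symbols s_t (latent state), a_t (action), o_t (visual observation), and the distributions shown are assumptions introduced here for illustration.

```latex
% Latent-state dynamics driven by actions (assumed notation)
s_{t+1} \sim p(s_{t+1} \mid s_t, a_t)

% Observations reveal only partial information about the state
o_t \sim p(o_t \mid s_t)

% A model trained without state annotations must instead fit
% observation-to-observation dynamics directly
o_{t+1} \sim p(o_{t+1} \mid o_{\le t}, a_{\le t})
```

Under this reading, the entanglement the abstract describes arises when only the third relation can be learned: the action a_t is then tied to pixel-level changes in o_t rather than to a transition of s_t. Explicit state annotations would, in principle, let the first two relations be supervised separately.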

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Human Pose and Action Recognition