WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Wenqiang Sun; Haiyu Zhang; Haoyuan Wang; Junta Wu; Zehan Wang; Zhenwei Wang; Yunhong Wang; Jun Zhang; Tengfei Wang; Chunchao Guo

arXiv:2512.14614·cs.CV·December 17, 2025

Open Access

TL;DR

WorldPlay is a real-time streaming video diffusion model that maintains long-term geometric consistency for interactive world modeling. It resolves the trade-off between speed and memory with a dual action representation, a reconstituted context memory, and context forcing distillation.

Contribution

The paper presents novel methods for long-term geometric consistency and real-time performance in interactive world modeling, including a dual action representation, reconstituted context memory, and context forcing distillation.

Findings

01. Generates 720p streaming video at 24 FPS with high consistency
02. Outperforms existing methods in long-term geometric accuracy
03. Demonstrates strong generalization across diverse scenes

Abstract

This paper presents WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between speed and memory that limits current methods. WorldPlay draws power from three key innovations. 1) We use a Dual Action Representation to enable robust action control in response to the user's keyboard and mouse inputs. 2) To enforce long-term consistency, our Reconstituted Context Memory dynamically rebuilds context from past frames and uses temporal reframing to keep geometrically important but long-past frames accessible, effectively alleviating memory attenuation. 3) We also propose Context Forcing, a novel distillation method designed for memory-aware models. Aligning memory context between the teacher and student preserves the student's capacity to use long-range information, enabling real-time…
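The context-rebuilding idea in point 2 can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not say how geometric importance is measured or how temporal reframing is realized. The sketch assumes camera-position distance as a stand-in for geometric importance and assumes that reframing means assigning compact new indices to the selected frames. `FrameRecord`, `reconstitute_context`, and every parameter are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class FrameRecord:
    index: int       # original position of the frame in the stream
    position: tuple  # hypothetical camera position (x, y, z)


def reconstitute_context(history, current_position, n_recent=4, n_spatial=2):
    """Rebuild a small context from past frames (illustrative only).

    Keeps the n_recent latest frames plus the n_spatial older frames whose
    camera position is closest to the current one (an assumed proxy for
    geometric importance), then assigns compact new indices so that
    long-past frames sit next to recent ones (assumed form of reframing).
    Expects n_recent >= 1.
    """
    recent = history[-n_recent:]
    older = history[:-n_recent]
    # Rank older frames by distance to the current viewpoint.
    nearest = sorted(
        older, key=lambda f: math.dist(f.position, current_position)
    )[:n_spatial]
    # Restore chronological order before reindexing.
    selected = sorted(nearest + recent, key=lambda f: f.index)
    # Each entry is (reframed_index, original_index).
    return [(new_idx, f.index) for new_idx, f in enumerate(selected)]
```

For example, if the camera moves along a line for ten frames and then returns near its starting point, the two earliest frames are selected again and placed directly before the three most recent ones, even though they are far back in the stream.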

Code & Models

Models

tencent/HY-WorldPlay (Hugging Face model; 1.3k downloads; 328 likes)

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Advanced Vision and Imaging