Yume-1.5: A Text-Controlled Interactive World Generation Model

Xiaofeng Mao; Zhen Li; Chuanhao Li; Xiaojie Xu; Kaining Ying; Tong He; Jiangmiao Pang; Yu Qiao; Kaipeng Zhang

arXiv:2512.22096·cs.CV·December 29, 2025

TL;DR

Yume-1.5 introduces a real-time, text-controlled world generation framework that overcomes previous diffusion model limitations by integrating context compression, streaming acceleration, and interactive exploration.

Contribution

It presents a novel framework combining unified context compression, real-time streaming, and text control for interactive world generation from images or text.

Findings

01. Supports real-time, interactive world exploration

02. Achieves realistic world generation from minimal prompts

03. Enables text-controlled world event creation

Abstract

Recent approaches have demonstrated the promise of using diffusion models to generate interactive and explorable worlds. However, most of these methods face critical challenges such as excessively large parameter sizes, reliance on lengthy inference steps, and rapidly growing historical context, all of which severely limit real-time performance; they also lack text-controlled generation capabilities. To address these challenges, we propose Yume-1.5, a novel framework designed to generate realistic, interactive, and continuous worlds from a single image or text prompt. Yume-1.5 achieves this through a carefully designed framework that supports keyboard-based exploration of the generated worlds. The framework comprises three core components: (1) a long-video generation framework integrating unified context compression with linear attention; (2) a real-time streaming acceleration strategy powered by…
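
For readers unfamiliar with the term, the sketch below illustrates generic kernelized linear attention, the family of techniques the abstract's first component refers to. It is not Yume-1.5's implementation: the feature map (elu(x) + 1), the non-causal formulation, and the function names `feature_map`, `linear_attention`, and `quadratic_reference` are all assumptions chosen for illustration. The point it shows is that the key-value summary has a fixed size regardless of sequence length, which is why such attention is attractive when historical context keeps growing.

```python
import numpy as np


def feature_map(x):
    # Assumed positive feature map: elu(x) + 1.
    # For x > 0 this is x + 1; for x <= 0 it is exp(x).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v):
    # q, k: (n, d); v: (n, d_v). Non-causal, single head (illustrative only).
    q_f = feature_map(q)
    k_f = feature_map(k)
    kv = k_f.T @ v          # (d, d_v) summary, size independent of n
    z = k_f.sum(axis=0)     # (d,) normalizer summary
    numerator = q_f @ kv    # (n, d_v)
    denominator = q_f @ z   # (n,)
    return numerator / denominator[:, None]


def quadratic_reference(q, k, v):
    # Same computation written with the explicit (n, n) weight matrix,
    # used here only to check the linear form against it.
    q_f = feature_map(q)
    k_f = feature_map(k)
    weights = q_f @ k_f.T
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((6, 4))
    k = rng.standard_normal((6, 4))
    v = rng.standard_normal((6, 3))
    out = linear_attention(q, k, v)
    print(out.shape)
    print(np.allclose(out, quadratic_reference(q, k, v)))
```

The two functions compute the same result; the linear form simply reorders the matrix products so that cost grows linearly, rather than quadratically, with the number of tokens.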

Code & Models

Models

stdstu123/Yume-5B-720P (Hugging Face model · 216 downloads · 91 likes)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation