HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Team HY-World; Chenjie Cao; Xuhui Zuo; Zhenwei Wang; Yisu Zhang; Junta Wu; Zhenyang Liu; Yuning Gong; Yang Liu; Bo Yuan; Chao Zhang; Coopers Li; Dongyuan Guo; Fan Yang; Haiyu Zhang; Hang Cao; Jianchen Zhu; Jiaxin Lin; Jie Xiao; Jihong Zhang; Junlin Yu; Lei Wang; Lifu Wang; Lilin Wang; Linus; Minghui Chen; Peng He; Penghao Zhao; Qi Chen; Rui Chen; Rui Shao; Sicong Liu; Wangchen Qin; Xiaochuan Niu; Xiang Yuan; Yi Sun; Yifei Tang; Yifu Sun; Yihang Lian; Yonghao Tan; Yuhong Liu; Yuyang Yin; Zhiyuan Min; Tengfei Wang; Chunchao Guo

arXiv:2604.14268·cs.CV·April 17, 2026

TL;DR

HY-World 2.0 is a multi-modal framework that generates, reconstructs, and simulates detailed 3D worlds from diverse inputs (text prompts, single-view images, multi-view images, and videos), advancing the state of the art in open-source 3D scene modeling.

Contribution

HY-World 2.0 introduces new and upgraded modules for multi-modal 3D world understanding, generation, and rendering, along with a flexible platform that supports interactive exploration.

Findings

01 Achieves state-of-the-art performance on multiple benchmarks.

02 Produces high-fidelity, navigable 3D scenes from various input modalities.

03 Provides open-source code and models for reproducibility.

Abstract

We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input modalities, including text prompts, single-view images, multi-view images, and videos, and produces 3D world representations. With text or single-view image inputs, the model performs world generation, synthesizing high-fidelity, navigable 3D Gaussian Splatting (3DGS) scenes. This is achieved through a four-stage method: a) Panorama Generation with HY-Pano 2.0, b) Trajectory Planning with WorldNav, c) World Expansion with WorldStereo 2.0, and d) World Composition with WorldMirror 2.0. Specifically, we introduce key innovations to enhance panorama fidelity, enable 3D scene understanding and planning, and upgrade WorldStereo, our keyframe-based view generation model with consistent memory. We also upgrade WorldMirror, a feed-forward model for…
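The four-stage generation method named in the abstract can be pictured as a simple sequential pipeline. The sketch below is only an illustration of that ordering: the stage and module names come from the abstract, while `WorldState`, `make_stage`, and `generate_world` are hypothetical placeholders and not the HY-World 2.0 API. Each stage here merely records that it ran; the real modules would produce panoramas, camera trajectories, generated views, and a 3DGS scene.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Stage and module names are taken from the abstract. Everything else in
# this file is a hypothetical placeholder, not the released HY-World 2.0 code.
STAGES: List[Tuple[str, str]] = [
    ("Panorama Generation", "HY-Pano 2.0"),
    ("Trajectory Planning", "WorldNav"),
    ("World Expansion", "WorldStereo 2.0"),
    ("World Composition", "WorldMirror 2.0"),
]


@dataclass
class WorldState:
    """Hypothetical container passed from one stage to the next."""
    prompt: str
    history: List[str] = field(default_factory=list)


def make_stage(step: str, module: str) -> Callable[[WorldState], WorldState]:
    """Build a stand-in stage that only logs its own name."""
    def run(state: WorldState) -> WorldState:
        # Return a new state so earlier states are never mutated.
        return WorldState(state.prompt, state.history + [f"{step} ({module})"])
    return run


def generate_world(prompt: str) -> WorldState:
    """Run the four stand-in stages in the order listed in the abstract."""
    state = WorldState(prompt)
    for step, module in STAGES:
        state = make_stage(step, module)(state)
    return state


if __name__ == "__main__":
    result = generate_world("a quiet mountain village")
    for line in result.history:
        print(line)
```

Keeping the stages as an ordered list makes the dependency explicit: each stage consumes the output of the previous one, which matches the a) to d) ordering in the abstract.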

Code & Models

Repositories

tencent-hunyuan/HY-World-2.0
github

Models

tencent/HY-World-2.0
Hugging Face model · 2.8k downloads · ♡ 655
