HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels

HunyuanWorld Team; Zhenwei Wang; Yuhao Liu; Junta Wu; Zixiao Gu; Haoyuan Wang; Xuhui Zuo; Tianyu Huang; Wenhuan Li; Sheng Zhang; Yihang Lian; Yulin Tsai; Lifu Wang; Sicong Liu; Puhua Jiang; Xianghui Yang; Dongyuan Guo; Yixuan Tang; Xinyue Mao; Jiaao Yu; Junlin Yu; Jihong Zhang; Meng Chen; Liang Dong; Yiwen Jia; Chao Zhang; Yonghao Tan; Hao Zhang; Zheng Ye; Peng He; Runzhou Wu; Minghui Chen; Zhan Li; Wangchen Qin; Lei Wang; Yifu Sun; Lin Niu; Xiang Yuan; Xiaofeng Yang; Yingping He; Jie Xiao; Yangyu Tao; Jianchen Zhu; Jinbao Xue; Kai Liu; Chongqing Zhao; Xinming Wu; Tian Liu; Peng Chen; Di Wang; Yuhong Liu; Linus; Jie Jiang; Tengfei Wang; Chunchao Guo

arXiv:2507.21809 · cs.CV · August 14, 2025

TL;DR

HunyuanWorld 1.0 introduces a novel framework that generates immersive, explorable 3D worlds from text and images by combining panoramic proxies with layered mesh representations, enabling high-quality, interactive virtual environments.

Contribution

HunyuanWorld 1.0 presents a semantically layered 3D mesh representation that uses panoramic images as world proxies, achieving state-of-the-art 3D world generation from text and image inputs.

Findings

1. Achieves state-of-the-art results in 3D world coherence and interactivity.
2. Supports seamless mesh export for graphics pipelines.
3. Enables diverse applications in VR, gaming, and content creation.

Abstract

Creating immersive and playable 3D worlds from texts or images remains a fundamental challenge in computer vision and graphics. Existing world generation approaches typically fall into two categories: video-based methods that offer rich diversity but lack 3D consistency and rendering efficiency, and 3D-based methods that provide geometric consistency but struggle with limited training data and memory-inefficient representations. To address these limitations, we present HunyuanWorld 1.0, a novel framework that combines the best of both worlds for generating immersive, explorable, and interactive 3D scenes from text and image conditions. Our approach features three key advantages: 1) 360° immersive experiences via panoramic world proxies; 2) mesh export capabilities for seamless compatibility with existing computer graphics pipelines; 3) disentangled object representations for…
