Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation

Tianyu Huang; Wangguandong Zheng; Tengfei Wang; Yuhao Liu; Zhenwei Wang; Junta Wu; Jie Jiang; Hui Li; Rynson W.H. Lau; Wangmeng Zuo; Chunchao Guo

arXiv:2506.04225·cs.CV·June 5, 2025



TL;DR

Voyager is a video diffusion framework that generates long-range, world-consistent 3D scenes from a single image and a user-defined camera path, producing explorable virtual environments with high visual and geometric fidelity.

Contribution

The paper presents Voyager, an end-to-end system for 3D scene generation and reconstruction. It removes the need for separate 3D reconstruction pipelines (e.g., structure-from-motion or multi-view stereo) and keeps generated frames consistent with one another.

Findings

01. Improves visual quality over existing methods

02. Achieves high geometric accuracy in scene reconstruction

03. Enables large-scale, diverse 3D scene generation

Abstract

Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text or images, creating long-range, 3D-consistent, explorable 3D scenes remains a complex and challenging problem. In this work, we present Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image with a user-defined camera path. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames, eliminating the need for 3D reconstruction pipelines (e.g., structure-from-motion or multi-view stereo). Our method integrates three key components: 1) World-Consistent Video Diffusion: A unified architecture that jointly generates aligned…
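To make "user-defined camera path" and "point-cloud sequences" concrete, the following is a minimal NumPy sketch, not Voyager's actual interface. It assumes a pinhole camera with OpenCV-style axes (+x right, +y down, +z forward), a hypothetical helper `dolly_path` that interpolates camera positions while looking at a fixed target, and a hypothetical helper `backproject` that lifts a per-pixel depth map into world-space points. The availability of a depth map per frame is an assumption made for illustration; the truncated abstract above does not spell out the representation.

```python
import numpy as np

def look_at_pose(eye, target, up=(0.0, 1.0, 0.0)):
    """4x4 camera-to-world matrix; camera axes are +x right, +y down, +z forward.
    Assumes the viewing direction is not parallel to `up`."""
    eye = np.asarray(eye, dtype=float)
    forward = np.asarray(target, dtype=float) - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=float))
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)  # completes a right-handed frame
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1], pose[:3, 2], pose[:3, 3] = right, down, forward, eye
    return pose

def dolly_path(start, end, target, n_frames):
    """Hypothetical camera path: move linearly from `start` to `end`,
    always looking at `target`. Returns an array of shape (n_frames, 4, 4)."""
    eyes = np.linspace(np.asarray(start, float), np.asarray(end, float), n_frames)
    return np.stack([look_at_pose(e, target) for e in eyes])

def backproject(depth, K, cam_to_world):
    """Lift an (H, W) z-depth map to world-space points of shape (H*W, 3).
    `depth` is measured along the camera's forward axis; K is the 3x3 intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T            # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)      # scale each ray by its depth
    return pts_cam @ cam_to_world[:3, :3].T + cam_to_world[:3, 3]

if __name__ == "__main__":
    poses = dolly_path(start=[0, 0, 0], end=[0, 0, 1], target=[0, 0, 5], n_frames=4)
    K = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
    depth = np.full((2, 2), 2.0)               # toy constant-depth frame
    cloud = backproject(depth, K, poses[0])
    print(poses.shape, cloud.shape)
```

Because every frame's points are expressed in one shared world frame, clouds from different poses along the path can be concatenated directly, which is the sense in which aligned depth makes a separate structure-from-motion or multi-view stereo step unnecessary in this sketch.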


Code & Models

Models

tencent/HunyuanWorld-Voyager (Hugging Face model · 114 downloads · 361 likes)


Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis

Methods: Diffusion