WorldExplorer: Towards Generating Fully Navigable 3D Scenes

Manuel-Andreas Schneider; Lukas Höllein; Matthias Nießner

arXiv:2506.01799·cs.CV·September 17, 2025

TL;DR

WorldExplorer is a novel method that generates fully navigable 3D scenes from text by iteratively creating videos along trajectories, ensuring visual consistency and enabling realistic exploration.

Contribution

It introduces an autoregressive video trajectory generation approach with scene memory and collision detection for high-quality, explorable 3D scene synthesis from text prompts.
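The contribution names collision detection but does not describe its mechanism here. The following is a minimal sketch only. It assumes that a depth map is available for the current view. It also assumes that a forward camera step is shortened so the camera stops a fixed margin before the nearest surface in the central region of that view. The names `center_min_depth` and `safe_step_length`, the `frac` and `margin` parameters, and the depth-based rule itself are hypothetical illustrations, not the paper's procedure.

```python
def center_min_depth(depth: list[list[float]], frac: float = 0.5) -> float:
    """Smallest depth inside a centered window covering `frac` of each image side."""
    h, w = len(depth), len(depth[0])
    dh, dw = max(1, int(h * frac)), max(1, int(w * frac))
    r0, c0 = (h - dh) // 2, (w - dw) // 2
    return min(depth[r][c] for r in range(r0, r0 + dh) for c in range(c0, c0 + dw))


def safe_step_length(depth: list[list[float]], requested: float, margin: float = 0.5) -> float:
    """Hypothetical collision rule: never move closer than `margin` to the surface ahead."""
    free_space = center_min_depth(depth) - margin
    # Clamp to [0, requested]: stop entirely if the surface is already within the margin.
    return max(0.0, min(requested, free_space))
```

With an unobstructed 4x4 depth map (all values 5.0), a requested 1.0-unit step is kept unchanged. If one central pixel reads 1.2, the same step is shortened to about 0.7 units. If a central pixel is already inside the margin, the step becomes 0.0.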

Findings

1. Produces stable, high-quality 3D scenes under large camera motions.
2. Enables realistic, unrestricted exploration of generated environments.
3. Fuses multi-view videos into unified 3D representations.

Abstract

Generating 3D worlds from text is a highly anticipated goal in computer vision. Existing works are limited in the degree of exploration they allow inside a scene, i.e., they produce stretched-out and noisy artifacts when moving beyond central or panoramic perspectives. To this end, we propose WorldExplorer, a novel method based on autoregressive video trajectory generation, which builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints. We initialize our scenes by creating multi-view consistent images corresponding to a 360-degree panorama. Then, we expand the scene by leveraging video diffusion models in an iterative scene generation pipeline. Concretely, we generate multiple videos along short, predefined trajectories that explore the scene in depth, including motion around objects. Our novel scene memory conditions each video on the most relevant…
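The abstract describes an autoregressive loop. It starts from panorama views, then repeatedly generates short videos along predefined trajectories, each conditioned on the most relevant frames already stored in the scene memory. The passage is truncated before it says how relevance is measured, so this sketch assumes a simple pose-proximity score. Several elements are hypothetical: `Frame`, `pose_distance`, `select_memory`, `forward_trajectory`, and `explore`, along with the eight-view panorama, the straight forward trajectories, and the yaw weighting. The video diffusion model is replaced by a placeholder that only records camera poses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    position: tuple[float, float, float]  # camera center (x, y, z)
    yaw_deg: float                        # heading around the vertical axis


def panorama_poses(num_views: int = 8) -> list[Frame]:
    """Initial views evenly spaced around a 360-degree panorama at the origin."""
    step = 360.0 / num_views
    return [Frame((0.0, 0.0, 0.0), i * step) for i in range(num_views)]


def pose_distance(a: Frame, b: Frame, yaw_weight: float = 0.01) -> float:
    """Hypothetical relevance score: position distance plus weighted yaw gap."""
    d_pos = math.dist(a.position, b.position)
    # Wrap the yaw difference into [-180, 180) before taking its magnitude.
    d_yaw = abs((a.yaw_deg - b.yaw_deg + 180.0) % 360.0 - 180.0)
    return d_pos + yaw_weight * d_yaw


def select_memory(memory: list[Frame], target: Frame, k: int = 2) -> list[Frame]:
    """Return the k stored frames closest to the target pose (stable on ties)."""
    return sorted(memory, key=lambda f: pose_distance(f, target))[:k]


def forward_trajectory(start: Frame, length: float, steps: int) -> list[Frame]:
    """Hypothetical short, predefined trajectory: walk straight along the heading."""
    yaw = math.radians(start.yaw_deg)
    x0, y0, z0 = start.position
    return [
        Frame((x0 + math.sin(yaw) * length * t / steps, y0,
               z0 + math.cos(yaw) * length * t / steps), start.yaw_deg)
        for t in range(1, steps + 1)
    ]


def explore(num_views: int = 8, length: float = 1.0, steps: int = 4, k: int = 2):
    """Autoregressive loop: each new trajectory is conditioned on retrieved memory."""
    memory = panorama_poses(num_views)
    contexts = []
    for start in memory[:num_views]:  # slice copies, so extending memory is safe
        trajectory = forward_trajectory(start, length, steps)
        context = select_memory(memory, trajectory[0], k)
        contexts.append(context)
        # Placeholder: a video diffusion model would render frames for `trajectory`
        # conditioned on the images of `context`; here we only store the poses.
        memory.extend(trajectory)
    return memory, contexts
```

With the defaults, `explore()` stores 40 poses: 8 panorama views plus 4 new poses for each of the 8 trajectories. The first trajectory moves straight ahead from the 0-degree view and is conditioned on the 0-degree and 45-degree panorama views. Retrieving by pose keeps the conditioning set small and local. This is one plausible way to keep newly generated content consistent with what was already generated nearby.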