WonderWorld: Interactive 3D Scene Generation from a Single Image

Hong-Xing Yu; Haoyi Duan; Charles Herrmann; William T. Freeman; Jiajun Wu

arXiv:2406.09394 · cs.CV · March 26, 2025 · 2 cites

Open Access

TL;DR

WonderWorld is an interactive framework that generates 3D scenes from a single image quickly enough for users to create and explore them in real time, while keeping the generated scenes coherent.

Contribution

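The paper introduces Fast Layered Gaussian Surfels (FLAGS), a scene representation that is generated from a single view and uses a geometry-based initialization to significantly reduce optimization time, and a guided depth diffusion method for generating coherent geometry so that scenes can be connected.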

Findings

01 Generates connected 3D scenes in less than 10 seconds.

02 Enables real-time interactive scene creation and exploration.

03 Achieves high coherence and diversity in generated scenes.

Abstract

We present WonderWorld, a novel framework for interactive 3D scene generation that enables users to interactively specify scene contents and layout and see the created scenes in low latency. The major challenge lies in achieving fast generation of 3D scenes. Existing scene generation approaches fall short of speed as they often require (1) progressively generating many views and depth maps, and (2) time-consuming optimization of the scene geometry representations. We introduce the Fast Layered Gaussian Surfels (FLAGS) as our scene representation and an algorithm to generate it from a single view. Our approach does not need multiple views, and it leverages a geometry-based initialization that significantly reduces optimization time. Another challenge is generating coherent geometry that allows all scenes to be connected. We introduce the guided depth diffusion that allows partial…
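The abstract does not spell out how the geometry-based initialization works. As a rough illustration only, one common way to seed surface elements from a single view is to back-project each pixel of an estimated depth map through a pinhole camera and size each element by its pixel footprint at that depth. The sketch below assumes exactly that; the names `Surfel` and `init_surfels_from_depth`, the intrinsics `fx, fy, cx, cy`, and the footprint-based scale rule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Surfel:
    # Hypothetical container; the paper's FLAGS elements carry more attributes.
    position: tuple  # (x, y, z) in camera coordinates
    scale: tuple     # (sx, sy): approximate size of one pixel at depth z


def init_surfels_from_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map into one surfel per valid pixel.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), all in pixel units. Pixels with non-positive depth
    are treated as invalid and skipped.
    """
    surfels = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            # Standard pinhole back-projection of pixel (u, v) at depth z.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # One pixel spans roughly z / f world units at depth z.
            surfels.append(Surfel((x, y, z), (z / fx, z / fy)))
    return surfels


# Toy 2x2 depth map with one invalid pixel.
depth = [[2.0, 2.0],
         [4.0, 0.0]]
surfels = init_surfels_from_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Starting from positions and scales derived directly from depth, rather than from random values, is the kind of initialization that can leave an optimizer with less work to do, which is consistent with the speed-up the abstract describes.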

Taxonomy

Topics: Human Motion and Animation · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis

Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings