FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting

Yikun Ma; Dandan Zhan; Zhi Jin

arXiv:2405.05768·cs.CV·May 10, 2024

Open Access

TL;DR

FastScene enables rapid, high-quality 3D indoor scene generation from text prompts by leveraging panoramic depth estimation and Gaussian Splatting, significantly reducing generation time while maintaining scene consistency.

Contribution

The paper introduces a novel fast framework for 3D scene generation from text, combining panoramic depth, view synthesis, inpainting, and Gaussian Splatting for improved speed and quality.

Findings

1. FastScene generates scenes in 15 minutes, at least one hour faster than previous methods.

2. It achieves higher scene quality and consistency compared to existing approaches.

3. Experimental results validate its effectiveness and efficiency.

Abstract

Text-driven 3D indoor scene generation holds broad applications, ranging from gaming and smart homes to AR/VR applications. Fast and high-fidelity scene generation is paramount for ensuring user-friendly experiences. However, existing methods are characterized by lengthy generation processes or necessitate the intricate manual specification of motion parameters, which introduces inconvenience for users. Furthermore, these methods often rely on narrow-field viewpoint iterative generations, compromising global consistency and overall scene quality. To address these issues, we propose FastScene, a framework for fast and higher-quality 3D scene generation, while maintaining the scene consistency. Specifically, given a text prompt, we generate a panorama and estimate its depth, since the panorama encompasses information about the entire scene and exhibits explicit geometric constraints. To…
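
The "explicit geometric constraints" of a panorama mentioned in the abstract can be illustrated with a minimal sketch: in an equirectangular image, every pixel maps to a known longitude and latitude, so a per-pixel depth map lifts directly to a 3D point cloud. This is not the authors' implementation. The function name `panorama_depth_to_points`, the axis convention (y up, z forward), and the assumption that depth stores distance along each viewing ray are all illustrative choices.

```python
import numpy as np

def panorama_depth_to_points(depth):
    """Back-project an equirectangular depth map of shape (H, W) to 3D points.

    Assumption (hypothetical convention): depth holds the distance along each
    viewing ray, with y pointing up and z pointing forward.
    """
    h, w = depth.shape
    # Normalized pixel centers in [0, 1).
    u = (np.arange(w) + 0.5) / w
    v = (np.arange(h) + 0.5) / h
    # Columns span longitude [-pi, pi); rows span latitude from +pi/2 to -pi/2.
    lon = (u - 0.5) * 2.0 * np.pi
    lat = (0.5 - v) * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # Unit ray direction for every pixel on the sphere.
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
    # Scale each unit ray by its depth to obtain the 3D point.
    return dirs * depth[..., None]

# Usage: a constant-depth panorama lifts to points on a sphere of that radius.
points = panorama_depth_to_points(np.full((4, 8), 2.5))
```

Because a single panorama covers the full sphere of viewing directions, one depth map yields a point cloud for the whole room, which is the property the abstract contrasts with narrow-field iterative generation.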

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Motion and Animation · Video Analysis and Summarization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Inpainting