Compositional 3D Scene Generation using Locally Conditioned Diffusion

Ryan Po; Gordon Wetzstein

arXiv:2303.12218 · cs.CV · March 24, 2023 · 1 citation

Open Access

TL;DR

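This paper presents locally conditioned diffusion, a method for compositional 3D scene generation from text prompts and bounding boxes. It gives control over individual scene parts, keeps the transitions between them seamless, and produces scenes at higher fidelity than relevant baselines, whereas existing text-to-3D approaches are limited to generating single objects.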

Contribution

The paper introduces locally conditioned diffusion for scene-level 3D generation, allowing control over semantic parts and improving fidelity over prior approaches that are limited to object-level generation.

Findings

01 Higher-fidelity scene generation compared to baselines
02 Effective control over semantic parts with text prompts and bounding boxes
03 Seamless transitions between scene components

Abstract

Designing complex 3D scenes has been a tedious, manual process requiring domain expertise. Emerging text-to-3D generative models show great promise for making this task more intuitive, but existing approaches are limited to object-level generation. We introduce locally conditioned diffusion as an approach to compositional scene diffusion, providing control over semantic parts using text prompts and bounding boxes while ensuring seamless transitions between these parts. We demonstrate a score distillation sampling (SDS)-based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines.
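The abstract does not spell out how prompts and bounding boxes are combined. The NumPy sketch below illustrates one plausible reading under stated assumptions, not the authors' code. It assumes that each text prompt's noise prediction applies only inside the image region covered by its bounding box, that these masked predictions are summed into one estimate, and that this estimate drives a standard SDS update with residual w(t) * (eps_hat - eps). The names `partition_masks`, `locally_conditioned_eps`, `sds_grad`, and `toy_eps_model` are hypothetical. So are the use of 2D pixel boxes instead of projected 3D boxes, the background-first prompt ordering, the override-on-overlap rule, the constant weight `w`, and the toy noise schedule. A real pipeline would render the scene from a sampled camera and call a pretrained text-to-image diffusion model.

```python
import numpy as np


def partition_masks(height, width, boxes):
    """Hypothetical helper: turn 2D pixel boxes (x0, y0, x1, y1) into masks that
    partition the image. Index 0 is background; later boxes override earlier
    ones where they overlap, so the masks sum to exactly 1 at every pixel."""
    labels = np.zeros((height, width), dtype=np.int64)
    for i, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        labels[y0:y1, x0:x1] = i
    return [(labels == k).astype(np.float32) for k in range(len(boxes) + 1)]


def locally_conditioned_eps(x_t, t, prompts, masks, eps_model):
    """Composite per-prompt noise predictions with spatial masks (assumed form):
    eps_hat = sum_k m_k * eps_model(x_t, t, y_k), with x_t shaped (C, H, W)."""
    eps_hat = np.zeros_like(x_t)
    for prompt, mask in zip(prompts, masks):
        eps_hat += mask[None] * eps_model(x_t, t, prompt)
    return eps_hat


def sds_grad(x0, t, alphas_cumprod, prompts, masks, eps_model, rng, w=1.0):
    """SDS-style gradient with respect to a rendered image x0: w * (eps_hat - eps),
    where x_t is x0 noised to timestep t. A constant weight w is an assumption."""
    a_bar = float(alphas_cumprod[t])
    eps = rng.standard_normal(x0.shape).astype(x0.dtype)
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    eps_hat = locally_conditioned_eps(x_t, t, prompts, masks, eps_model)
    return w * (eps_hat - eps)


def toy_eps_model(x_t, t, prompt):
    # Hypothetical stand-in for a pretrained text-conditioned denoiser:
    # it predicts a constant equal to the prompt length.
    return np.full_like(x_t, float(len(prompt)))


prompts = ["a grassy field", "a red barn", "a tree"]  # background prompt first
masks = partition_masks(8, 8, boxes=[(0, 0, 4, 8), (4, 4, 8, 8)])
x_t = np.zeros((3, 8, 8), dtype=np.float32)
eps_hat = locally_conditioned_eps(x_t, 10, prompts, masks, toy_eps_model)
print(eps_hat[0, 0, 0], eps_hat[0, 0, 6], eps_hat[0, 6, 6])  # 10.0 14.0 6.0

rng = np.random.default_rng(0)
alphas_cumprod = np.linspace(0.999, 0.01, 1000)  # toy noise schedule (assumption)
x0 = np.zeros((3, 8, 8), dtype=np.float32)       # stand-in for a rendered view
grad = sds_grad(x0, 500, alphas_cumprod, prompts, masks, toy_eps_model, rng)
print(grad.shape)  # (3, 8, 8)
```

Each pixel takes its noise prediction from exactly one prompt, which is how a single denoiser call per prompt can be stitched into one scene-level update. In an SDS pipeline, `grad` would be backpropagated through the differentiable renderer into the 3D scene parameters.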

Taxonomy

Topics: Image Processing and 3D Reconstruction · Human Motion and Animation · Computer Graphics and Visualization Techniques