CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout

Haotian Bai; Yuanhuiyi Lyu; Lutao Jiang; Sijia Li; Haonan Lu; Xiaodong Lin; Lin Wang

arXiv:2303.13843 · cs.CV · September 25, 2024 · 5 cites

Open Access

TL;DR

CompoNeRF introduces a novel framework that combines editable 3D scene layouts with guidance mechanisms to improve multi-object scene generation from text, achieving higher fidelity and consistency.

Contribution

The paper presents CompoNeRF, a new approach integrating scene layout and dual-level guidance to enhance multi-object 3D scene synthesis from text prompts.

Findings

01. Achieves up to 54% improvement in multi-view CLIP score.

02. Significantly improves semantic accuracy and multi-view consistency.

03. Enables flexible scene editing and recomposition.
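
As a rough illustration of the first finding, a multi-view CLIP score can be read as the text-image cosine similarity averaged over several rendered views of a scene. The sketch below assumes that reading and assumes the embeddings have already been computed elsewhere; it does not load a CLIP model, and the function names `cosine`, `multi_view_score`, and `relative_improvement` are hypothetical, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_view_score(text_emb, view_embs):
    """Average text-image similarity over the rendered views (assumed definition)."""
    return sum(cosine(text_emb, v) for v in view_embs) / len(view_embs)


def relative_improvement(baseline, ours):
    """Relative gain of `ours` over `baseline`, as a fraction of the baseline."""
    return (ours - baseline) / baseline


# Toy embeddings, purely illustrative (real CLIP embeddings have hundreds of dimensions).
text = [1.0, 0.0]
views = [[1.0, 0.0], [0.0, 1.0]]
score = multi_view_score(text, views)
```

Under this reading, a reported percentage improvement would correspond to `relative_improvement(baseline_score, new_score)` multiplied by 100; whether the paper computes its figure exactly this way is not stated in this summary.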

Abstract

Text-to-3D form plays a crucial role in creating editable 3D scenes for AR/VR. Recent advances have shown promise in merging neural radiance fields (NeRFs) with pre-trained diffusion models for text-to-3D object generation. However, one enduring challenge is their inadequate capability to accurately parse and regenerate consistent multi-object environments. Specifically, these models encounter difficulties in accurately representing the quantity and style prompted by multi-object texts, often resulting in a collapse of rendering fidelity that fails to match the semantic intricacies. Moreover, amalgamating these elements into a coherent 3D scene is a substantial challenge, stemming from the generic distribution inherent in diffusion models. To tackle the issue of 'guidance collapse' and further enhance scene consistency, we propose a novel framework, dubbed CompoNeRF, by integrating an…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Generative Adversarial Networks and Image Synthesis

Methods: Contrastive Language-Image Pre-training · Diffusion