CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout
Haotian Bai, Yuanhuiyi Lyu, Lutao Jiang, Sijia Li, Haonan Lu, Xiaodong Lin, Lin Wang

TL;DR
CompoNeRF introduces a framework that combines an editable 3D scene layout with dual-level guidance to improve multi-object 3D scene generation from text, achieving higher rendering fidelity and multi-view consistency.
Contribution
The paper presents CompoNeRF, a new approach integrating scene layout and dual-level guidance to enhance multi-object 3D scene synthesis from text prompts.
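The summary does not say how CompoNeRF composes individual objects into one scene. A common way to realize layout-based composition in NeRF is sketched below under our own assumptions: each object is a local radiance field placed in the scene by an axis-aligned layout box, densities of overlapping objects are summed, colors are blended by density, and standard NeRF alpha compositing renders a ray. The names `LayoutBox`, `query_scene`, and `render_ray` are hypothetical, the per-object fields are stand-in callables rather than trained networks, and the text/diffusion guidance that would optimize those fields is not shown.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical local field: maps a point in the object's normalized frame
# [-1, 1]^3 to (density, (r, g, b)). A trained NeRF MLP would go here.
LocalField = Callable[[Vec3], Tuple[float, Vec3]]


@dataclass
class LayoutBox:
    """Axis-aligned box placing one object in the scene (assumed layout format)."""
    center: Vec3
    half_size: Vec3
    field: LocalField

    def to_local(self, p: Vec3) -> Vec3:
        # Map a world-space point into the box's normalized [-1, 1]^3 frame.
        return tuple((p[i] - self.center[i]) / self.half_size[i] for i in range(3))


def query_scene(p: Vec3, boxes: Sequence[LayoutBox]) -> Tuple[float, Vec3]:
    """Sum densities of all boxes containing p; blend their colors by density."""
    total_sigma = 0.0
    rgb = [0.0, 0.0, 0.0]
    for box in boxes:
        q = box.to_local(p)
        if all(abs(c) <= 1.0 for c in q):  # p lies inside this object's box
            sigma, color = box.field(q)
            total_sigma += sigma
            for i in range(3):
                rgb[i] += sigma * color[i]
    if total_sigma > 0.0:
        rgb = [c / total_sigma for c in rgb]
    return total_sigma, tuple(rgb)


def render_ray(origin: Vec3, direction: Vec3, boxes: Sequence[LayoutBox],
               near: float = 0.0, far: float = 4.0, n_samples: int = 64) -> Vec3:
    """Alpha-composite the composed scene field along one ray (midpoint samples)."""
    delta = (far - near) / n_samples
    transmittance = 1.0
    out = [0.0, 0.0, 0.0]
    for k in range(n_samples):
        t = near + (k + 0.5) * delta
        p = tuple(origin[i] + t * direction[i] for i in range(3))
        sigma, color = query_scene(p, boxes)
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for i in range(3):
            out[i] += weight * color[i]
        transmittance *= 1.0 - alpha
    return tuple(out)


# Usage: a red object in front of a blue one along the +z axis.
red = LayoutBox((0.0, 0.0, 1.0), (0.5, 0.5, 0.5), lambda q: (5.0, (1.0, 0.0, 0.0)))
blue = LayoutBox((0.0, 0.0, 3.0), (0.5, 0.5, 0.5), lambda q: (5.0, (0.0, 0.0, 1.0)))
pixel = render_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), [red, blue])
```

Because each object keeps its own local field and the layout only supplies box transforms, moving or swapping a box re-composes the scene without retraining the other objects, which is one plausible reading of the "editable layout" idea.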
Findings
Achieves up to 54% improvement in multi-view CLIP score.
Significantly improves semantic accuracy and multi-view consistency.
Enables flexible scene editing and recomposition.
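The page does not define the multi-view CLIP score. A minimal sketch, assuming it is the mean cosine similarity between the prompt's text embedding and the image embeddings of several rendered views, is shown below together with the relative-improvement arithmetic behind a figure such as "54%". The embeddings here are toy vectors standing in for CLIP encoder outputs, and `multiview_clip_score` and `relative_improvement` are hypothetical helper names.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def multiview_clip_score(text_emb: Sequence[float],
                         view_embs: Sequence[Sequence[float]]) -> float:
    """Mean text-image cosine similarity over all rendered views (assumed definition)."""
    if not view_embs:
        raise ValueError("need at least one rendered view")
    return sum(cosine(text_emb, v) for v in view_embs) / len(view_embs)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`; 0.54 corresponds to +54%."""
    return (new - baseline) / baseline


# Usage: one view matches the prompt, the other is unrelated to it.
score = multiview_clip_score((1.0, 0.0), [(1.0, 0.0), (0.0, 1.0)])
gain = relative_improvement(0.77, 0.5)
```

Averaging over views means a scene that looks right from one viewpoint but breaks from others scores lower than a consistent one, which is why a multi-view variant is a natural metric for consistency.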
Abstract
Text-to-3D generation plays a crucial role in creating editable 3D scenes for AR/VR. Recent advances have shown promise in combining neural radiance fields (NeRFs) with pre-trained diffusion models for text-to-3D object generation. However, these methods still struggle to accurately parse and regenerate consistent multi-object environments. Specifically, they have difficulty representing the quantity and style of objects specified in multi-object prompts, often causing the rendering fidelity to collapse so that the output fails to capture the prompt's semantic details. Moreover, assembling these elements into a coherent 3D scene is a substantial challenge, stemming from the generic distribution inherent in diffusion models. To tackle this 'guidance collapse' and further enhance scene consistency, we propose a novel framework, dubbed CompoNeRF, by integrating an…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Generative Adversarial Networks and Image Synthesis
Methods: Contrastive Language-Image Pre-training · Diffusion
