MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models
Shaoheng Fang, Chaohui Yu, Fan Wang, Qixing Huang

TL;DR
MVRoom is a novel multi-view diffusion-based pipeline for controllable 3D indoor scene synthesis, leveraging a two-stage process with layout-aware mechanisms to ensure multi-view consistency and support text-to-scene generation.
Contribution
The paper introduces MVRoom, a two-stage multi-view diffusion framework conditioned on a coarse 3D layout. It enforces multi-view consistency through a layout-aware epipolar attention mechanism, enabling controllable, high-fidelity 3D indoor scene generation.
Findings
Outperforms state-of-the-art methods quantitatively and qualitatively.
Supports recursive generation of scenes with varying complexity.
Effectiveness of key components validated through ablation studies.
Abstract
We introduce MVRoom, a controllable novel view synthesis (NVS) pipeline for 3D indoor scenes that uses multi-view diffusion conditioned on a coarse 3D layout. MVRoom employs a two-stage design in which the 3D layout is used throughout to enforce multi-view consistency. The first stage employs novel representations to effectively bridge the 3D layout and consistent image-based condition signals for multi-view generation. The second stage performs image-conditioned multi-view generation, incorporating a layout-aware epipolar attention mechanism to enhance multi-view consistency during the diffusion process. Additionally, we introduce an iterative framework that generates 3D scenes with varying numbers of objects and scene complexities by recursively performing multi-view generation (MVRoom), supporting text-to-scene generation. Experimental results demonstrate that our approach achieves…
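The abstract names an epipolar attention mechanism but does not spell out its form. As a rough illustration of the generic idea only, the NumPy sketch below builds a boolean mask that lets each pixel in one view attend only to pixels near its epipolar line in a second view. Everything here is an assumption for illustration: the function names (`skew`, `fundamental_matrix`, `epipolar_mask`), the pose convention (`X2 = R @ X1 + t`), and the pixel threshold are hypothetical, and the sketch does not model how MVRoom makes the attention layout-aware.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K1, K2, R, t):
    """Fundamental matrix for views related by X2 = R @ X1 + t (assumed convention)."""
    E = skew(t) @ R                                   # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_mask(F, height, width, threshold=1.5):
    """Return an (H*W, H*W) boolean mask.

    Entry [i, j] is True when pixel j of view 2 lies within `threshold`
    pixels of the epipolar line induced by pixel i of view 1.
    Pixels are flattened in row-major order.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    pix = np.stack([xs.ravel(), ys.ravel(),
                    np.ones(height * width)]).astype(float)  # (3, N) homogeneous
    lines = F @ pix                                   # column i: line for query pixel i
    num = np.abs(lines.T @ pix)                       # |l_i . x_j| for all pairs
    den = np.linalg.norm(lines[:2], axis=0)[:, None] + 1e-12
    return num / den <= threshold                     # point-to-line distance test
```

In an attention layer, a mask like this would typically be applied by setting the logits of disallowed query-key pairs to a large negative value before the softmax. Note that the dense (N, N) mask is only practical at low feature-map resolutions.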
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · 3D Shape Modeling and Analysis
