MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models

Shaoheng Fang; Chaohui Yu; Fan Wang; Qixing Huang

arXiv:2512.04248 · cs.CV · December 5, 2025

TL;DR

MVRoom is a two-stage multi-view diffusion pipeline for controllable 3D indoor scene synthesis. It conditions generation on a coarse 3D layout to keep the generated views consistent, and an iterative extension supports text-to-scene generation.

Contribution

The paper introduces MVRoom, a two-stage diffusion framework that uses a coarse 3D layout and layout-aware epipolar attention to enforce multi-view consistency, enabling controllable, high-fidelity 3D indoor scene generation.

Findings

01. Outperforms state-of-the-art methods quantitatively and qualitatively.

02. Supports recursive generation of scenes with varying complexity.

03. Effectiveness of key components validated through ablation studies.

Abstract

We introduce MVRoom, a controllable novel view synthesis (NVS) pipeline for 3D indoor scenes that uses multi-view diffusion conditioned on a coarse 3D layout. MVRoom employs a two-stage design in which the 3D layout is used throughout to enforce multi-view consistency. The first stage employs novel representations to effectively bridge the 3D layout and consistent image-based condition signals for multi-view generation. The second stage performs image-conditioned multi-view generation, incorporating a layout-aware epipolar attention mechanism to enhance multi-view consistency during the diffusion process. Additionally, we introduce an iterative framework that generates 3D scenes with varying numbers of objects and scene complexities by recursively performing multi-view generation (MVRoom), supporting text-to-scene generation. Experimental results demonstrate that our approach achieves…
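
The abstract names an epipolar attention mechanism but this excerpt does not describe it. As a rough illustration of the general idea only, the sketch below builds a plain epipolar attention mask between two views with NumPy: each query pixel in view 1 may attend only to key pixels in view 2 that lie near its epipolar line. This is standard two-view geometry, not MVRoom's layout-aware variant, and every name here (`skew`, `fundamental_matrix`, `epipolar_mask`, `threshold`) is a hypothetical choice made for this example. It assumes pinhole cameras with known intrinsics and a relative pose satisfying X2 = R X1 + t.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def fundamental_matrix(K1, K2, R, t):
    """F such that x2^T F x1 = 0 for corresponding homogeneous pixels.

    Assumes a point X1 in camera-1 coordinates maps to camera 2 as
    X2 = R @ X1 + t (an assumption of this sketch).
    """
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)


def pixel_grid(h, w):
    """Homogeneous pixel coordinates (x, y, 1) in row-major order, shape (h*w, 3)."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1).astype(float)


def epipolar_mask(K1, K2, R, t, h, w, threshold=1.0):
    """Boolean mask of shape (h*w, h*w).

    Entry [i, j] is True when pixel j of view 2 lies within `threshold`
    pixels of the epipolar line induced by pixel i of view 1.
    """
    F = fundamental_matrix(K1, K2, R, t)
    pix = pixel_grid(h, w)
    lines = pix @ F.T  # row i is the line F @ x1_i in view 2
    # Normalise by the line's (a, b) part to get point-to-line distance in pixels.
    norms = np.maximum(np.linalg.norm(lines[:, :2], axis=1, keepdims=True), 1e-12)
    dist = np.abs(lines @ pix.T) / norms
    return dist <= threshold
```

In an attention layer, a mask like this would typically be applied by setting the logits of disallowed query/key pairs to a large negative value before the softmax. For a camera that translates purely along its x-axis with no rotation, the epipolar lines are image rows, so each pixel is restricted to keys in the same row of the other view.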

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · 3D Shape Modeling and Analysis