SemLayoutDiff: Semantic Layout Generation with Diffusion Model for Indoor Scene Synthesis

Xiaohao Sun; Divyam Goel; Angel X. Chang

arXiv:2508.18597 · cs.GR · September 9, 2025

TL;DR

SemLayoutDiff is a diffusion-based model that synthesizes diverse, realistic 3D indoor scenes by explicitly conditioning on architectural constraints and generating coherent furniture layouts.

Contribution

It introduces a scene layout representation that pairs a top-down semantic map with per-object attributes, together with a categorical diffusion model that can be conditioned on room masks, improving scene realism and diversity.

Findings

01 Outperforms previous methods on the 3D-FRONT dataset

02 Produces spatially coherent and realistic scenes

03 Accounts for architectural elements such as doors and windows, so that furniture does not obstruct them
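
As a rough illustration of the third finding, the sketch below checks whether a piece of furniture covers any grid cell reserved for a door or window. It is not taken from the paper: the grid layout, the inclusive box format `(row0, col0, row1, col1)`, and the helper names `cells_of` and `blocks_opening` are all assumptions made for this example.

```python
def cells_of(box):
    """Return the grid cells covered by an axis-aligned box.

    The box is (row0, col0, row1, col1) with inclusive bounds; this
    format is a hypothetical choice for the example.
    """
    r0, c0, r1, c1 = box
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


def blocks_opening(furniture_box, opening_cells):
    """True if the furniture footprint overlaps any reserved opening cell."""
    return bool(cells_of(furniture_box) & opening_cells)


# Hypothetical clearance cells in front of a door on a small top-down grid.
door_clearance = {(3, 1), (3, 2)}

bed = (0, 0, 1, 2)        # rows 0-1, columns 0-2
wardrobe = (2, 2, 3, 3)   # rows 2-3, columns 2-3

print(blocks_opening(bed, door_clearance))
print(blocks_opening(wardrobe, door_clearance))
```

A check like this only validates a finished layout. The abstract describes a model that takes doors and windows into account during generation, which is a stronger property than filtering afterwards.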

Abstract

We present SemLayoutDiff, a unified model for synthesizing diverse 3D indoor scenes across multiple room types. The model introduces a scene layout representation combining a top-down semantic map and attributes for each object. Unlike prior approaches, which cannot condition on architectural constraints, SemLayoutDiff employs a categorical diffusion model capable of conditioning scene synthesis explicitly on room masks. It first generates a coherent semantic map, followed by a cross-attention-based network to predict furniture placements that respect the synthesized layout. Our method also accounts for architectural elements such as doors and windows, ensuring that generated furniture arrangements remain practical and unobstructed. Experiments on the 3D-FRONT dataset show that SemLayoutDiff produces spatially coherent, realistic, and varied scenes, outperforming previous methods.
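
To make the idea of a categorical diffusion over a mask-conditioned semantic map more concrete, here is a minimal sketch of one forward corruption step. It is an assumption-laden illustration, not the authors' implementation: the label set, the uniform resampling rule, the `beta` noise rate, and the function name `noise_step` are all hypothetical, and the paper may use a different noise schedule or transition kernel.

```python
import random

# Hypothetical label set; the categories actually used in the paper may differ.
LABELS = ["void", "floor", "bed", "wardrobe", "door", "window"]
VOID = 0


def noise_step(sem_map, room_mask, beta, rng):
    """Apply one forward step of a uniform categorical corruption.

    Each cell inside the room mask keeps its label with probability
    1 - beta and is otherwise resampled uniformly from the non-void
    labels. Cells outside the mask act as the conditioning signal and
    are always set to VOID.
    """
    room_labels = range(1, len(LABELS))
    noised = []
    for row, mask_row in zip(sem_map, room_mask):
        new_row = []
        for label, inside in zip(row, mask_row):
            if not inside:
                new_row.append(VOID)
            elif rng.random() < beta:
                new_row.append(rng.choice(room_labels))
            else:
                new_row.append(label)
        noised.append(new_row)
    return noised


# A 4x4 top-down grid with a 2x2 room in the middle.
room_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
sem_map = [
    [0, 0, 0, 0],
    [0, 2, 1, 0],
    [0, 1, 3, 0],
    [0, 0, 0, 0],
]

noised = noise_step(sem_map, room_mask, beta=0.5, rng=random.Random(0))
for row in noised:
    print(" ".join(LABELS[label] for label in row))
```

In a trained model, a denoising network would learn to reverse steps like this one while seeing the room mask, so sampled maps stay inside the given floor plan. The second stage described in the abstract, which predicts per-object attributes with cross-attention, is not covered by this sketch.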
