Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning

Xingjian Ran; Yixuan Li; Linning Xu; Mulin Yu; Bo Dai

arXiv:2506.05341 · cs.CV · October 24, 2025

TL;DR

This paper presents DirectLayout, a novel framework that generates 3D indoor scene layouts directly from text descriptions by leveraging large language models and spatial reasoning, improving flexibility and alignment with user instructions.

Contribution

It introduces a three-stage process for text-to-3D layout generation using LLMs, Chain-of-Thought reasoning, and iterative alignment, addressing dataset limitations and enhancing scene plausibility.

Findings

01 Achieves high semantic consistency in generated layouts

02 Demonstrates strong generalization across diverse scenes

03 Ensures physically plausible object placements

Abstract

Realistic 3D indoor scene synthesis is vital for embodied AI and digital content creation. It can be naturally divided into two subtasks: object generation and layout generation. While recent generative models have significantly advanced object-level quality and controllability, layout generation remains challenging due to limited datasets. Existing methods either overfit to these datasets or rely on predefined constraints to optimize numerical layouts, which sacrifices flexibility. As a result, they fail to generate scenes that are both open-vocabulary and aligned with fine-grained user instructions. We introduce DirectLayout, a framework that directly generates numerical 3D layouts from text descriptions using the generalizable spatial reasoning of large language models (LLMs). DirectLayout decomposes the generation into three stages: producing a Bird's-Eye View (BEV) layout, lifting it into…
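To make "numerical layout" and "lifting a BEV layout" concrete, here is a minimal Python sketch. It is not the authors' implementation: the names BEVBox, Box3D, lift_to_3d, and footprints_overlap, the field choices, and the units are all assumptions made for illustration. The abstract is truncated after the lifting stage, so the later stages are not sketched.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BEVBox:
    """Hypothetical top-down footprint of one object (meters, degrees)."""
    name: str
    x: float          # footprint center along the room's x axis
    y: float          # footprint center along the room's y axis
    width: float      # extent along x
    depth: float      # extent along y
    yaw_deg: float = 0.0


@dataclass(frozen=True)
class Box3D:
    """Hypothetical 3D box obtained by giving a footprint a height."""
    name: str
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]
    yaw_deg: float


def lift_to_3d(box: BEVBox, height: float, base_z: float = 0.0) -> Box3D:
    """Lift a BEV footprint into 3D by adding a height and a base elevation."""
    return Box3D(
        name=box.name,
        center=(box.x, box.y, base_z + height / 2),
        size=(box.width, box.depth, height),
        yaw_deg=box.yaw_deg,
    )


def footprints_overlap(a: BEVBox, b: BEVBox) -> bool:
    """Axis-aligned overlap test on footprints (assumption: yaw is ignored)."""
    return (
        abs(a.x - b.x) < (a.width + b.width) / 2
        and abs(a.y - b.y) < (a.depth + b.depth) / 2
    )


# Illustrative values only; they do not come from the paper.
bed = BEVBox("bed", x=1.0, y=1.5, width=2.0, depth=1.6)
nightstand = BEVBox("nightstand", x=2.3, y=1.5, width=0.5, depth=0.5)

bed_3d = lift_to_3d(bed, height=0.5)
collides = footprints_overlap(bed, nightstand)
```

Representing each object as a handful of numbers is what allows a layout to be emitted directly as text and then checked with simple geometry, such as the footprint overlap test above.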

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation