SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation

Song Tang; Kaiyong Zhao; Yuliang Li; Qingsong Yan; Penglei Sun; Junyi Zou; Qiang Wang; Xiaowen Chu

arXiv:2604.27555 · cs.AI · May 1, 2026

TL;DR

SpatialGrammar is a domain-specific language, paired with two systems built on it, for generating collision-free 3D indoor scenes from natural language, with improved spatial reasoning and physical plausibility.

Contribution

The paper presents SpatialGrammar, a new scene representation, together with two systems built on it, SG-Agent and SG-Mini, for verifiable, high-quality 3D indoor scene generation from language.

Findings

01 SG-Agent improves spatial fidelity and physical plausibility.

02 SG-Mini performs competitively with larger models on single-shot tasks.

03 The approach reduces spatial errors and collisions in generated scenes.

Abstract

Automatically generating interactive 3D indoor scenes from natural language is crucial for virtual reality, gaming, and embodied AI. However, existing LLM-based approaches often suffer from spatial errors and collisions, in part because common scene representations (raw coordinates or verbose code) make it difficult for models to reason about 3D spatial relationships and physical constraints. We propose SpatialGrammar, a domain-specific language that represents gravity-aligned indoor layouts as bird's-eye-view (BEV) grid placements with deterministic compilation to valid 3D geometry, enabling verifiable constraint checking. Building on this representation, we develop (1) SG-Agent, a closed-loop system that uses compiler feedback to iteratively refine scenes and enforce collision constraints, and (2) SG-Mini, a 104M-parameter model trained entirely on compiler-validated synthetic data. Across 159 test scenes…
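To make the idea of "grid placements with deterministic compilation" and "verifiable constraint checking" concrete, here is a minimal Python sketch. It is not SpatialGrammar's actual syntax or compiler, which this page does not show. The `Placement` record, the 0.5 m cell size, the axis-aligned box model, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations

CELL_M = 0.5  # assumed grid resolution in metres per cell (not from the paper)


@dataclass(frozen=True)
class Placement:
    """Hypothetical BEV placement: a footprint in grid cells plus a height."""
    name: str
    col: int         # first cell along x
    row: int         # first cell along y
    width: int       # footprint size in cells along x
    depth: int       # footprint size in cells along y
    height_m: float  # object height in metres


@dataclass(frozen=True)
class Box:
    """Axis-aligned 3D box produced by compilation."""
    name: str
    min_xyz: tuple
    max_xyz: tuple


def compile_placement(p):
    """Deterministically map a grid placement to a box resting on the floor."""
    x0, y0 = p.col * CELL_M, p.row * CELL_M
    x1, y1 = (p.col + p.width) * CELL_M, (p.row + p.depth) * CELL_M
    return Box(p.name, (x0, y0, 0.0), (x1, y1, p.height_m))


def overlaps(a, b):
    """True if two boxes share interior volume; touching faces are allowed."""
    return all(
        a.min_xyz[i] < b.max_xyz[i] and b.min_xyz[i] < a.max_xyz[i]
        for i in range(3)
    )


def check_scene(placements, room_cols, room_rows):
    """Return a list of diagnostics; an empty list means the scene passes."""
    errors = []
    for p in placements:
        inside = (
            p.col >= 0 and p.row >= 0
            and p.col + p.width <= room_cols
            and p.row + p.depth <= room_rows
        )
        if not inside:
            errors.append(f"{p.name}: out of bounds")
    boxes = [compile_placement(p) for p in placements]
    for a, b in combinations(boxes, 2):
        if overlaps(a, b):
            errors.append(f"{a.name} collides with {b.name}")
    return errors


scene = [
    Placement("bed", 0, 0, 4, 3, 0.6),
    Placement("desk", 3, 2, 3, 2, 0.75),  # footprint overlaps the bed
    Placement("lamp", 7, 0, 1, 1, 1.5),
]
print(check_scene(scene, room_cols=8, room_rows=6))
```

Because the check is purely geometric and deterministic, the same layout always yields the same diagnostics. In a closed-loop setup of the kind the abstract describes for SG-Agent, messages like these could plausibly be fed back to the model as compiler feedback, though the actual feedback format is not given here.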
