SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation
Song Tang, Kaiyong Zhao, Yuliang Li, Qingsong Yan, Penglei Sun, Junyi Zou, Qiang Wang, Xiaowen Chu

TL;DR
SpatialGrammar introduces a domain-specific language and systems for generating accurate, collision-free 3D indoor scenes from natural language, improving spatial reasoning and physical plausibility.
Contribution
It presents SpatialGrammar, a new scene representation, and two systems built on it, SG-Agent and SG-Mini, for verifiable, high-quality 3D indoor scene generation from language.
Findings
SG-Agent improves spatial fidelity and physical plausibility.
SG-Mini performs competitively with larger models on single-shot tasks.
The approach reduces spatial errors and collisions in generated scenes.
Abstract
Automatically generating interactive 3D indoor scenes from natural language is crucial for virtual reality, gaming, and embodied AI. However, existing LLM-based approaches often suffer from spatial errors and collisions, in part because common scene representations (raw coordinates or verbose code) make it difficult for models to reason about 3D spatial relationships and physical constraints. We propose SpatialGrammar, a domain-specific language that represents gravity-aligned indoor layouts as bird's-eye-view (BEV) grid placements with deterministic compilation to valid 3D geometry, enabling verifiable constraint checking. Building on this representation, we develop (1) SG-Agent, a closed-loop system that uses compiler feedback to iteratively refine scenes and enforce collision constraints, and (2) SG-Mini, a 104M-parameter model trained entirely on compiler-validated synthetic data. Across 159 test scenes…
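To make the idea of grid placements that compile deterministically to checkable 3D geometry concrete, here is a minimal Python sketch. It is not the SpatialGrammar syntax or compiler, which the abstract does not specify. The names Placement, Box, compile_placement, and check_scene, the 0.5 m cell size, and the use of axis-aligned boxes are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations

CELL = 0.5  # assumed grid cell size in metres (hypothetical, not from the paper)


@dataclass(frozen=True)
class Placement:
    """Hypothetical BEV grid placement: an object footprint on the floor grid."""
    name: str
    col: int       # grid column of the footprint's minimum corner
    row: int       # grid row of the footprint's minimum corner
    width: int     # footprint size in cells along x
    depth: int     # footprint size in cells along y
    height: float  # object height in metres


@dataclass(frozen=True)
class Box:
    """Axis-aligned 3D box resting on the floor (z = 0)."""
    name: str
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]


def compile_placement(p: Placement) -> Box:
    """Deterministically map a grid placement to a 3D box."""
    lo = (p.col * CELL, p.row * CELL, 0.0)
    hi = ((p.col + p.width) * CELL, (p.row + p.depth) * CELL, p.height)
    return Box(p.name, lo, hi)


def overlaps(a: Box, b: Box) -> bool:
    """True if the boxes share interior volume; touching faces do not count."""
    return all(a.lo[i] < b.hi[i] and b.lo[i] < a.hi[i] for i in range(3))


def check_scene(placements: list[Placement], cols: int, rows: int) -> list[str]:
    """Return human-readable constraint violations (empty list means valid)."""
    errors = []
    for p in placements:
        if p.col < 0 or p.row < 0 or p.col + p.width > cols or p.row + p.depth > rows:
            errors.append(f"{p.name}: out of bounds")
    boxes = [compile_placement(p) for p in placements]
    for a, b in combinations(boxes, 2):
        if overlaps(a, b):
            errors.append(f"{a.name} collides with {b.name}")
    return errors


if __name__ == "__main__":
    scene = [
        Placement("bed", col=0, row=0, width=4, depth=3, height=0.6),
        Placement("desk", col=3, row=2, width=2, depth=1, height=0.75),
    ]
    for message in check_scene(scene, cols=8, rows=6):
        print(message)
```

In a closed-loop setting like the one the abstract describes for SG-Agent, messages of this kind could be fed back to the model as the signal for the next refinement round. How the actual system formats its compiler feedback is not stated in the abstract.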
