ReSpace: Text-Driven Autoregressive 3D Indoor Scene Synthesis and Editing
Martin JJ. Bucher, Iro Armeni

TL;DR
ReSpace is an autoregressive framework that synthesizes and edits 3D indoor scenes from natural-language instructions. It represents room boundaries explicitly, supports object manipulation, and reports state-of-the-art results on object addition.
Contribution
ReSpace introduces a compact scene representation with explicit room boundaries, together with a language model fine-tuned for object addition. Combined, these enable scene synthesis and editing driven by natural language.
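To make the idea of a compact, structured scene with explicit room boundaries more concrete, here is a minimal Python sketch. The class names, field names, and JSON layout (`Scene`, `SceneObject`, `bounds_floor`, `desc`, and so on) are assumptions chosen for illustration and are not taken from the paper. The sketch only shows how a floor polygon and free-text object descriptions could be serialized into a short string that a language model extends one object at a time.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SceneObject:
    # Hypothetical fields; the paper's actual schema may differ.
    desc: str            # free-text description instead of a one-hot class label
    pos: tuple           # (x, y, z) centroid in metres, y pointing up
    size: tuple          # (width, height, depth) in metres
    rot_y: float = 0.0   # rotation about the vertical axis, in radians


@dataclass
class Scene:
    room_type: str
    bounds_floor: list   # ordered (x, z) vertices of the floor polygon
    height: float        # ceiling height in metres
    objects: list = field(default_factory=list)

    def to_prompt(self) -> str:
        """Serialize the scene to compact JSON suitable as model input."""
        return json.dumps(asdict(self), separators=(",", ":"))


def add_object(scene: Scene, obj: SceneObject) -> Scene:
    """Return a new scene with one more object; the input scene is unchanged."""
    return Scene(scene.room_type, scene.bounds_floor, scene.height,
                 scene.objects + [obj])


# An L-shaped room, which a plain rectangle could not describe.
room = Scene("bedroom", [(0, 0), (4, 0), (4, 2), (2, 2), (2, 3), (0, 3)], 2.6)
room_with_bed = add_object(
    room, SceneObject("wooden double bed", (1.0, 0.25, 1.5), (1.6, 0.5, 2.0))
)
print(room_with_bed.to_prompt())
```

Because the boundary is stored as an ordered vertex list rather than a rendered floor plan, non-rectangular rooms need no special handling, and the objects are referenced by description only, so any asset library could be used to realize them.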
Findings
Surpasses the state of the art on object addition
Achieves superior human-perceived quality in scene synthesis
Introduces a voxelization-based evaluation metric
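The summary does not describe how the voxelization-based metric is computed, so the following Python sketch is only one plausible reading, under stated assumptions: objects are approximated by axis-aligned boxes (no rotation, no mesh geometry), the floor is a polygon in the x-z plane, and the score counts voxels that fall outside the floor polygon and voxels shared by more than one object. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def voxelize_box(center, size, voxel):
    """Integer indices of voxels whose centres lie inside an axis-aligned box."""
    ranges = []
    for c, s in zip(center, size):
        lo, hi = c - s / 2, c + s / 2
        # Voxel i has its centre at (i + 0.5) * voxel.
        start = math.ceil(lo / voxel - 0.5)
        stop = math.floor(hi / voxel - 0.5)
        ranges.append(range(start, stop + 1))
    return set(product(*ranges))


def point_in_polygon(x, z, poly):
    """Ray-casting test for a point against a simple polygon of (x, z) vertices."""
    inside = False
    for i in range(len(poly)):
        x1, z1 = poly[i]
        x2, z2 = poly[(i + 1) % len(poly)]
        if (z1 > z) != (z2 > z):
            x_cross = x1 + (z - z1) * (x2 - x1) / (z2 - z1)
            if x < x_cross:
                inside = not inside
    return inside


def voxel_violations(boxes, floor_poly, voxel=0.1):
    """Return (out_of_bounds_voxels, overlapping_voxels) for a list of boxes.

    Each box is a (center, size) pair with center = (x, y, z) and
    size = (width, height, depth). This is an illustrative simplification,
    not the metric defined in the paper.
    """
    occupancy = Counter()
    out_of_bounds = 0
    for center, size in boxes:
        voxels = voxelize_box(center, size, voxel)
        occupancy.update(voxels)
        for ix, _, iz in voxels:
            if not point_in_polygon((ix + 0.5) * voxel, (iz + 0.5) * voxel, floor_poly):
                out_of_bounds += 1
    overlapping = sum(1 for count in occupancy.values() if count > 1)
    return out_of_bounds, overlapping


floor = [(0, 0), (4, 0), (4, 4), (0, 4)]
box_a = ((1, 1, 1), (2, 2, 2))   # fully inside the room
box_b = ((2, 1, 1), (2, 2, 2))   # intersects box_a
box_c = ((4, 1, 1), (2, 2, 2))   # straddles the wall at x = 4
print(voxel_violations([box_a, box_b], floor, voxel=0.5))
print(voxel_violations([box_a, box_c], floor, voxel=0.5))
```

Counting voxels rather than testing bounding boxes pairwise gives a violation measure that scales with how much volume is misplaced, which is one reason a voxel-based score can be more informative than a binary collision flag.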
Abstract
Scene synthesis and editing has emerged as a promising direction in computer graphics. Current trained approaches for 3D indoor scene generation either oversimplify object semantics through one-hot class encodings (e.g., 'chair' or 'table'), require masked diffusion for editing, ignore room boundaries, or rely on floor plan renderings that fail to capture complex layouts. LLM-based methods enable richer semantics via natural language, but lack editing functionality, are limited to rectangular layouts, or rely on weak spatial reasoning from implicit world models. We introduce ReSpace, a generative framework for autoregressive text-driven 3D indoor scene synthesis and editing. Our approach features a compact structured scene representation with explicit room boundaries that enables asset-agnostic deployment and frames scene manipulation as a next-token prediction task, supporting object…
Taxonomy
Methods: Diffusion
