GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis

Antonio Ruiz; Tao Wu; Andrew Melnik; Qing Cheng; Xuqin Wang; Lu Liu; Yongliang Wang; Yanfeng Zhang; Helge Ritter

arXiv:2511.14884 · cs.CV · November 20, 2025

Open Access

TL;DR

GeoSceneGraph is a method for text-guided 3D indoor scene synthesis that leverages scene graph structures and geometric symmetries, achieving performance comparable to relationship-based methods without relying on predefined relationship annotations.

Contribution

The paper introduces an approach based on equivariant graph neural networks conditioned on text, enabling realistic scene synthesis without ground-truth relationship annotations.
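
To make "equivariant" concrete, here is a minimal NumPy sketch of one message-passing update in the style of E(n)-equivariant graph neural networks. It is not the paper's architecture: the function names, the layer shapes, the fully connected graph, and the use of a fixed vector `cond` as a stand-in for a text embedding are all assumptions made for illustration. The point it shows is that if messages depend only on invariant quantities (node features, squared distances, the conditioning vector), then rotating and translating the input object positions rotates and translates the output positions in the same way.

```python
import numpy as np


def init_params(d, c, k, rng):
    """Random weights for the sketch (hypothetical shapes, not from the paper)."""
    return {
        "W_e": rng.normal(scale=0.1, size=(2 * d + 1 + c, k)),  # edge message weights
        "W_x": rng.normal(scale=0.1, size=(k, 1)),              # scalar weight per edge
        "W_h": rng.normal(scale=0.1, size=(d + k, d)),          # node feature update
    }


def egnn_layer(x, h, cond, params):
    """One EGNN-style update on a fully connected graph.

    x:    (n, 3) object positions (equivariant quantity)
    h:    (n, d) object features (invariant quantity)
    cond: (c,)   conditioning vector, standing in for a text embedding
    """
    n, d = h.shape
    diff = x[:, None, :] - x[None, :, :]                  # (n, n, 3) relative positions
    dist2 = np.sum(diff ** 2, axis=-1, keepdims=True)     # (n, n, 1) invariant distances

    h_i = np.broadcast_to(h[:, None, :], (n, n, d))
    h_j = np.broadcast_to(h[None, :, :], (n, n, d))
    c_ij = np.broadcast_to(cond, (n, n, cond.shape[0]))

    # Messages are built only from invariant inputs, so they are invariant too.
    edge_in = np.concatenate([h_i, h_j, dist2, c_ij], axis=-1)
    m = np.tanh(edge_in @ params["W_e"])                  # (n, n, k)

    mask = 1.0 - np.eye(n)[:, :, None]                    # drop self-edges
    w = (m @ params["W_x"]) * mask                        # (n, n, 1) invariant scalars

    # Positions move along relative vectors scaled by invariant weights.
    x_new = x + np.sum(diff * w, axis=1) / max(n - 1, 1)

    # Features are updated from aggregated invariant messages.
    m_agg = np.sum(m * mask, axis=1)                      # (n, k)
    h_new = h + np.tanh(np.concatenate([h, m_agg], axis=-1) @ params["W_h"])
    return x_new, h_new


def random_orthogonal(rng):
    """Random 3x3 orthogonal matrix (rotation or reflection) via QR."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q
```

A quick check of the property: apply the layer to positions `x` and to transformed positions `x @ R + t`; the second output positions should equal the first output positions transformed by the same `R` and `t`, while the features should be unchanged.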

Findings

01 Achieves comparable performance to relationship-based methods

02 Effectively leverages scene graph structures and geometric symmetries

03 Demonstrates strong results in 3D scene synthesis from text prompts

Abstract

Methods that synthesize indoor 3D scenes from text prompts have wide-ranging applications in film production, interior design, video games, virtual reality, and synthetic data generation for training embodied agents. Existing approaches typically either train generative models from scratch or leverage vision-language models (VLMs). While VLMs achieve strong performance, particularly for complex or open-ended prompts, smaller task-specific models remain necessary for deployment on resource-constrained devices such as extended reality (XR) glasses or mobile phones. However, many generative approaches that train from scratch overlook the inherent graph structure of indoor scenes, which can limit scene coherence and realism. Conversely, methods that incorporate scene graphs either demand a user-provided semantic graph, which is generally inconvenient and restrictive, or rely on ground-truth…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications