Open-World 3D Scene Graph Generation for Retrieval-Augmented Reasoning

Fei Yu; Quan Deng; Shengeng Tang; Yuehua Li; Lechao Cheng

arXiv:2511.05894 · cs.CV · November 11, 2025

TL;DR

This paper introduces a novel framework for open-world 3D scene graph generation that combines vision-language models and retrieval-based reasoning to enable flexible, interactive, and generalizable scene understanding beyond fixed vocabularies.

Contribution

The paper presents a unified approach that integrates dynamic scene graph generation with retrieval-augmented reasoning, supporting open-vocabulary, multimodal exploration of 3D scenes.

Findings

01. Achieves robust generalization across diverse 3D environments.

02. Outperforms existing methods on 3DSSG and Replica benchmarks.

03. Supports multiple tasks like question answering and task planning.

Abstract

Understanding 3D scenes in open-world settings poses fundamental challenges for vision and robotics, particularly due to the limitations of closed-vocabulary supervision and static annotations. To address this, we propose a unified framework for Open-World 3D Scene Graph Generation with Retrieval-Augmented Reasoning, which enables generalizable and interactive 3D scene understanding. Our method integrates Vision-Language Models (VLMs) with retrieval-based reasoning to support multimodal exploration and language-guided interaction. The framework comprises two key components: (1) a dynamic scene graph generation module that detects objects and infers semantic relationships without fixed label sets, and (2) a retrieval-augmented reasoning pipeline that encodes scene graphs into a vector database to support text/image-conditioned queries. We evaluate our method on 3DSSG and Replica…
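
The second component, encoding a scene graph into a vector store and querying it with text, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Triple` and `SceneGraphIndex` names, the in-memory list standing in for a vector database, and the bag-of-words `embed` function (a stand-in for a real VLM or text encoder) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One scene graph edge: subject --predicate--> object (free-form labels)."""
    subject: str
    predicate: str
    obj: str

    def as_text(self) -> str:
        return f"{self.subject} {self.predicate} {self.obj}"


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a placeholder for a learned encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SceneGraphIndex:
    """Hypothetical in-memory stand-in for a vector database of triples."""

    def __init__(self) -> None:
        self._entries: list[tuple[Triple, Counter]] = []

    def add(self, triple: Triple) -> None:
        # Store each triple alongside the embedding of its text form.
        self._entries.append((triple, embed(triple.as_text())))

    def query(self, text: str, k: int = 2) -> list[Triple]:
        # Rank stored triples by similarity to the query embedding.
        q = embed(text)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [triple for triple, _ in ranked[:k]]


index = SceneGraphIndex()
for t in [
    Triple("mug", "on top of", "wooden desk"),
    Triple("office chair", "next to", "wooden desk"),
    Triple("potted plant", "near", "window"),
]:
    index.add(t)

top = index.query("what is on the desk", k=1)
print(top[0].as_text())
```

Because the labels are free-form strings rather than entries in a fixed class list, new object or relation names can be added without changing the index. In a real system the retrieved triples would presumably be passed to a language model as context for answering the query.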

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Robotics and Sensor-Based Localization