OGScene3D: Incremental Open-Vocabulary 3D Gaussian Scene Graph Mapping for Scene Understanding

Siting Zhu; Ziyun Lu; Guangming Wang; Chenguang Huang; Yongbo Chen; I-Ming Chen; Wolfram Burgard; Hesheng Wang

arXiv:2603.16301 · cs.RO · March 19, 2026

Open Access

TL;DR

OGScene3D is an incremental open-vocabulary 3D scene understanding system aimed at robotic exploration. It constructs accurate, globally consistent scene graphs using a confidence-based Gaussian semantic representation, hierarchical optimization, and temporal memory.

Contribution

The paper presents OGScene3D, a system for incremental, open-vocabulary 3D scene graph mapping that combines confidence-based semantic modeling with dynamic graph construction. Unlike methods that need a pre-built complete 3D semantic map, it builds the scene graph while the environment is still being explored.

Findings

01

Effective semantic mapping and scene graph construction demonstrated on datasets and real-world scenes.

02

Improved semantic consistency through hierarchical optimization and temporal memory.

03

Robust open-vocabulary scene understanding in incremental exploration scenarios.

Abstract

Open-vocabulary scene understanding is crucial for robotic applications, enabling robots to comprehend complex 3D environmental contexts and supporting various downstream tasks such as navigation and manipulation. However, existing methods require pre-built complete 3D semantic maps to construct scene graphs for scene understanding, which limits their applicability in robotic scenarios where environments are explored incrementally. To address this challenge, we propose OGScene3D, an open-vocabulary scene understanding system that achieves accurate 3D semantic mapping and scene graph construction incrementally. Our system employs a confidence-based Gaussian semantic representation that jointly models semantic predictions and their reliability, enabling robust scene modeling. Building on this representation, we introduce a hierarchical 3D semantic optimization strategy that achieves…

Taxonomy

Topics: Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization