FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding

Chenlu Zhan; Yufei Zhang; Gaoang Wang; Hongwei Wang

arXiv:2506.13629·cs.CV·July 29, 2025

Open Access

TL;DR

FreeQ-Graph introduces a novel approach for free-form semantic querying in 3D scenes by constructing a complete scene graph and aligning it with accurate semantic labels without relying on predefined vocabularies, enabling advanced reasoning and querying.

Contribution

The paper presents a method that generates a complete 3D scene graph guided by an LLM and an LVLM, aligns semantic labels without training data, and enables free-form querying with scene-level reasoning.

Findings

01. Outperforms existing methods in 3D semantic grounding and segmentation.

02. Excels in complex free-form semantic queries and relational reasoning.

03. Validated on 6 datasets demonstrating robustness and accuracy.

Abstract

Semantic querying in complex 3D scenes through free-form language presents a significant challenge. Existing 3D scene understanding methods use large-scale training data and CLIP to align text queries with 3D semantic features. However, their reliance on predefined vocabulary priors from training data hinders free-form semantic querying. In addition, recent advanced methods rely on LLMs for scene understanding but lack comprehensive 3D scene-level information and often overlook potential inconsistencies in LLM-generated outputs. In our paper, we propose FreeQ-Graph, which enables Free-form Querying with a semantic consistent scene Graph for 3D scene understanding. The core idea is to encode free-form queries from a complete and accurate 3D scene graph without predefined vocabularies, and to align them with 3D consistent semantic labels, which is accomplished through three key steps. We…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction

Methods: ALIGN · Contrastive Language-Image Pre-training