Sparse3DPR: Training-Free 3D Hierarchical Scene Parsing and Task-Adaptive Subgraph Reasoning from Sparse RGB Views
Haida Feng, Hao Wei, Zewen Xu, Haolin Wang, Chade Li, Yihong Wu

TL;DR
Sparse3DPR is a training-free framework for 3D scene understanding from sparse RGB views. It leverages pre-trained LLMs and uses hierarchical scene graphs with task-adaptive subgraph extraction to improve both reasoning accuracy and efficiency.
Contribution
It introduces a hierarchical plane-enhanced scene graph and task-adaptive subgraph extraction, enabling open-vocabulary reasoning and dynamic noise filtering without training.
Findings
28.7% EM@1 improvement over ConceptGraphs
78.2% speedup compared to ConceptGraphs
Achieves performance comparable to training-based methods on ScanQA
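EM@1 (exact match at top-1) counts a question as answered correctly only when the single top-ranked predicted answer exactly matches one of the reference answers. The minimal sketch below assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and the articles "a", "an", "the"). The helper names `normalize` and `em_at_1` are hypothetical, and the official ScanQA evaluation script may normalize answers differently.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (assumed SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def em_at_1(predictions, references):
    """Fraction of questions whose top-1 prediction exactly matches any reference answer.

    predictions: one ranked list of candidate answers per question.
    references:  one list of acceptable ground-truth answers per question.
    """
    if not predictions:
        return 0.0
    hits = 0
    for ranked, refs in zip(predictions, references):
        top1 = normalize(ranked[0]) if ranked else ""
        if top1 in {normalize(r) for r in refs}:
            hits += 1
    return hits / len(predictions)


preds = [["brown chair", "chair"], ["on the table"], ["two"]]
refs = [["brown chair"], ["on table", "on desk"], ["3"]]
print(em_at_1(preds, refs))  # 2 of 3 top-1 answers match after normalization
```

Only the first candidate in each ranked list is scored, which is what distinguishes EM@1 from looser top-k variants.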
Abstract
Recently, large language models (LLMs) have been widely explored for 3D scene understanding. Among these approaches, training-free methods are gaining attention because they offer greater flexibility and generalization than training-based methods. However, they typically struggle with accuracy and efficiency in practical deployment. To address these problems, we propose Sparse3DPR, a novel training-free framework for open-ended scene understanding that leverages the reasoning capabilities of pre-trained LLMs and requires only sparse-view RGB inputs. Specifically, we introduce a hierarchical plane-enhanced scene graph that is open-vocabulary and adopts dominant planar structures as spatial anchors, which enables clearer reasoning chains and more reliable high-level inferences. Furthermore, we design a task-adaptive subgraph extraction method to filter query-irrelevant information dynamically, reducing…
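To make the idea of planes as spatial anchors and query-driven noise filtering concrete, here is a toy sketch under stated assumptions; it is not the authors' implementation. It assumes a three-level hierarchy (room, plane, object) in which each object points to its supporting plane, and it uses whole-word label matching as a stand-in relevance test (the paper's actual extraction criterion is not described here). The classes `Node` and `SceneGraph` and the functions `extract_subgraph` and `to_prompt` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    node_id: str
    label: str
    kind: str                      # hypothetical levels: "room", "plane", or "object"
    parent: Optional[str] = None   # spatial anchor: object -> supporting plane -> room


class SceneGraph:
    """Toy hierarchical graph in which dominant planes anchor the objects they support."""

    def __init__(self, nodes: List[Node]):
        self.nodes: Dict[str, Node] = {n.node_id: n for n in nodes}

    def ancestors(self, node_id: str) -> List[str]:
        """Follow parent links up to the root (e.g. object -> plane -> room)."""
        chain = []
        parent = self.nodes[node_id].parent
        while parent is not None:
            chain.append(parent)
            parent = self.nodes[parent].parent
        return chain

    def extract_subgraph(self, query: str) -> List[Node]:
        """Keep objects named in the query plus their anchor chain; drop everything else.

        Whole-word label matching is an assumed stand-in for the real relevance test.
        """
        q = query.lower()
        seeds = [
            n.node_id
            for n in self.nodes.values()
            if n.kind == "object" and re.search(rf"\b{re.escape(n.label.lower())}\b", q)
        ]
        keep = set(seeds)
        for node_id in seeds:
            keep.update(self.ancestors(node_id))
        # Preserve the graph's insertion order so the serialized prompt is stable.
        return [n for n in self.nodes.values() if n.node_id in keep]


def to_prompt(nodes: List[Node]) -> str:
    """Serialize the kept nodes as plain-text lines for an LLM prompt."""
    lines = []
    for n in nodes:
        anchor = f" [anchor: {n.parent}]" if n.parent else ""
        lines.append(f"{n.kind} {n.node_id}: {n.label}{anchor}")
    return "\n".join(lines)


graph = SceneGraph([
    Node("room0", "living room", "room"),
    Node("plane_floor", "floor", "plane", parent="room0"),
    Node("plane_table", "table top", "plane", parent="room0"),
    Node("obj_sofa", "sofa", "object", parent="plane_floor"),
    Node("obj_mug", "mug", "object", parent="plane_table"),
    Node("obj_lamp", "lamp", "object", parent="plane_table"),
    Node("obj_rug", "rug", "object", parent="plane_floor"),
])

print(to_prompt(graph.extract_subgraph("What color is the mug?")))
```

For the mug question, the sofa, rug, lamp, and floor plane are filtered out, so the prompt carries only the mug, the table-top plane that anchors it, and the room. A shorter prompt of this kind is one plausible source of the efficiency gains the abstract describes; a real system would also need a fallback for queries that match no object.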
Taxonomy
Topics: Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis · Advanced Neural Network Applications
