SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding
Hongpei Zheng, Shijie Li, Yanran Li, Hujun Yin

TL;DR
This paper introduces SpatialReasoner, an active perception framework for large-scale 3D scene understanding, and a new dataset H$^2$U3D, enabling efficient exploration and question answering in multi-floor house environments.
Contribution
It presents a novel active perception approach that uses spatial tools for autonomous scene exploration and introduces a comprehensive 3D dataset for house-scale scene understanding.
Findings
SpatialReasoner achieves state-of-the-art results on H$^2$U3D.
It requires significantly fewer images than baseline methods.
The coarse-to-fine exploration strategy improves efficiency and accuracy.
Abstract
Spatial reasoning in large-scale 3D environments remains challenging for current vision-language models, which are typically constrained to room-scale scenarios. We introduce H$^2$U3D (Holistic House Understanding in 3D), a 3D visual question answering dataset designed for house-scale scene understanding. H$^2$U3D features multi-floor environments spanning up to three floors and 10-20 rooms, covering more than 300 m$^2$. Through an automated annotation pipeline, it constructs hierarchical coarse-to-fine visual representations and generates diverse question-answer pairs with chain-of-thought annotations. We further propose SpatialReasoner, an active perception framework that autonomously invokes spatial tools to explore 3D scenes based on textual queries. SpatialReasoner is trained through a two-stage strategy: a supervised cold start followed by reinforcement learning with an adaptive…
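To make the coarse-to-fine idea concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a house is stored as a floor → room → view hierarchy in which every node carries a short text description, and it substitutes a simple word-overlap score for the vision-language model. The names `Node`, `relevance`, `explore`, and `count_leaves` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a hypothetical floor -> room -> view hierarchy."""
    name: str
    description: str  # stand-in for a rendered image or its caption
    children: list["Node"] = field(default_factory=list)


def words(text: str) -> set[str]:
    """Lowercase bag of words, ignoring question marks."""
    return set(text.lower().replace("?", "").split())


def relevance(query: str, node: Node) -> int:
    """Toy scorer (assumption): number of words shared with the query."""
    return len(words(query) & words(node.description))


def explore(query: str, root: Node) -> tuple[Node, list[str]]:
    """Greedy descent: inspect the children of the current node,
    keep the most relevant one, and repeat until a leaf view is reached."""
    viewed: list[str] = []
    node = root
    while node.children:
        viewed.extend(child.name for child in node.children)
        node = max(node.children, key=lambda c: relevance(query, c))
    return node, viewed


def count_leaves(node: Node) -> int:
    """Number of fine-grained views an exhaustive baseline would inspect."""
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)


house = Node("house", "two-floor house", [
    Node("floor_1", "ground floor kitchen living room", [
        Node("kitchen", "kitchen with stove and fridge", [
            Node("kitchen/stove", "stove under a hood"),
            Node("kitchen/fridge", "fridge beside a door"),
        ]),
        Node("living_room", "living room with sofa and tv", [
            Node("living_room/sofa", "sofa near the window"),
            Node("living_room/tv", "tv on the wall"),
        ]),
    ]),
    Node("floor_2", "upper floor bedroom bathroom", [
        Node("bedroom", "bedroom with bed and desk", [
            Node("bedroom/bed", "bed with pillows"),
            Node("bedroom/desk", "desk with a lamp"),
        ]),
        Node("bathroom", "bathroom with sink and mirror", [
            Node("bathroom/sink", "sink under a mirror"),
            Node("bathroom/shower", "shower with glass door"),
        ]),
    ]),
])

target, viewed = explore("What color is the sofa in the living room?", house)
print(target.name, len(viewed), count_leaves(house))
```

In this toy house the greedy descent inspects two floor overviews, two room overviews, and two views, compared with eight fine-grained views for an exhaustive pass. The gap widens as the hierarchy grows, which is the intuition behind needing fewer images. The actual framework selects tools with a trained model rather than a fixed greedy rule.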
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling
