3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection
Junyu Luo, Jiahui Fu, Xianghao Kong, Chen Gao, Haibing Ren, Hao Shen, Huaxia Xia, Si Liu

TL;DR
This paper introduces a novel single-stage 3D visual grounding method that directly locates objects in point clouds using language guidance, improving accuracy over traditional two-stage approaches.
Contribution
The paper proposes 3D-SPS, a single-stage framework with new modules for language-guided keypoint sampling and progressive target mining, bridging detection and matching in 3D grounding.
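The idea of progressively narrowing a point set toward language-relevant points can be illustrated with a toy sketch. This is not the authors' implementation: the function name `progressive_select`, the cosine-similarity scoring, and the fixed `keep_sizes` schedule are all assumptions made for illustration, and the paper's modules are learned networks that this listing does not describe in detail.

```python
import numpy as np

def progressive_select(point_feats, text_feat, keep_sizes):
    """Toy sketch (hypothetical, not the 3D-SPS modules).

    point_feats: (N, D) array of per-point features.
    text_feat:   (D,) array standing in for a language embedding.
    keep_sizes:  decreasing list of how many points to keep per stage.
    Returns the indices (into point_feats) of the surviving points.
    """
    idx = np.arange(point_feats.shape[0])
    t = text_feat / np.linalg.norm(text_feat)
    for k in keep_sizes:
        feats = point_feats[idx]
        feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        scores = feats @ t                 # cosine similarity to the text
        top = np.argsort(-scores)[:k]      # keep the k highest-scoring points
        idx = idx[top]
        # A learned model would refine the features between stages;
        # this sketch reuses the same features, so stages only shrink the set.
    return idx

rng = np.random.default_rng(0)
point_feats = rng.normal(size=(100, 8))
text_feat = point_feats[7].copy()          # make point 7 the best match
selected = progressive_select(point_feats, text_feat, [50, 10, 1])
```

Because the sketch never updates features between stages, the result matches a single top-k selection; the staged loop only shows where a cross-modal refinement step would sit.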
Findings
Achieves state-of-the-art results on ScanRefer and Nr3D/Sr3D datasets.
Effectively focuses on language-relevant and target points through proposed modules.
Outperforms previous two-stage methods in accuracy and efficiency.
Abstract
3D visual grounding aims to locate the referred target object in 3D point cloud scenes according to a free-form language description. Previous methods mostly follow a two-stage paradigm, i.e., language-irrelevant detection and cross-modal matching, which is limited by the isolated architecture. In such a paradigm, the detector needs to sample keypoints from raw point clouds due to the inherent properties of 3D point clouds (irregular and large-scale), to generate the corresponding object proposal for each keypoint. However, sparse proposals may leave out the target in detection, while dense proposals may confuse the matching model. Moreover, the language-irrelevant detection stage can only sample a small proportion of keypoints on the target, deteriorating the target prediction. In this paper, we propose a 3D Single-Stage Referred Point Progressive Selection (3D-SPS) method, which…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Human Pose and Action Recognition
