3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

Junyu Luo; Jiahui Fu; Xianghao Kong; Chen Gao; Haibing Ren; Hao Shen; Huaxia Xia; Si Liu

arXiv:2204.06272 · cs.CV · October 13, 2023 · 5 cites

TL;DR

This paper introduces a novel single-stage 3D visual grounding method that directly locates objects in point clouds using language guidance, improving accuracy over traditional two-stage approaches.

Contribution

The paper proposes 3D-SPS, a single-stage framework with new modules for language-guided keypoint sampling and progressive target mining, bridging detection and matching in 3D grounding.

Findings

01. Achieves state-of-the-art results on the ScanRefer and Nr3D/Sr3D datasets.

02. Effectively focuses on language-relevant and target points through the proposed modules.

03. Outperforms previous two-stage methods in accuracy and efficiency.

Abstract

3D visual grounding aims to locate the referred target object in 3D point cloud scenes according to a free-form language description. Previous methods mostly follow a two-stage paradigm, i.e., language-irrelevant detection and cross-modal matching, which is limited by the isolated architecture. In such a paradigm, the detector needs to sample keypoints from raw point clouds due to the inherent properties of 3D point clouds (irregular and large-scale), to generate the corresponding object proposal for each keypoint. However, sparse proposals may leave out the target in detection, while dense proposals may confuse the matching model. Moreover, the language-irrelevant detection stage can only sample a small proportion of keypoints on the target, deteriorating the target prediction. In this paper, we propose a 3D Single-Stage Referred Point Progressive Selection (3D-SPS) method, which…
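The sampling limitation described in the abstract can be illustrated with a small sketch. It assumes farthest point sampling as the language-irrelevant sampler (a common choice in point-cloud detectors; the abstract does not name one), and the toy scene, `make_toy_scene`, and all sizes are hypothetical. This is not the authors' implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k point indices, each as far as possible from those already chosen.

    The sampler sees geometry only; it knows nothing about the language query.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # Squared distance from each point to its nearest chosen keypoint so far.
    nearest = [sq_dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = sq_dist(p, points[nxt])
            if d < nearest[i]:
                nearest[i] = d
    return chosen


def make_toy_scene(seed=0, n_background=1000, n_target=20):
    """Hypothetical scene: a 10 x 10 x 3 room plus a small 0.5-unit target object."""
    rng = random.Random(seed)
    background = [
        (rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 3))
        for _ in range(n_background)
    ]
    target = [
        (rng.uniform(4.0, 4.5), rng.uniform(4.0, 4.5), rng.uniform(0.0, 0.5))
        for _ in range(n_target)
    ]
    # Target points occupy the indices from n_background onward.
    return background + target, n_background


points, first_target = make_toy_scene()
keypoints = farthest_point_sampling(points, k=32)
on_target = sum(1 for i in keypoints if i >= first_target)
print(f"{on_target} of {len(keypoints)} keypoints lie on the target")
```

Because the sampler spreads keypoints evenly over the whole scene, a small target tends to receive only a few of them, which is the motivation the abstract gives for selecting keypoints under language guidance instead.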

Code & Models

Repositories

fjhzhixi/3d-sps (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Human Pose and Action Recognition