3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
Changli Wu, Yiwei Ma, Qi Chen, Haowei Wang, Gen Luo, Jiayi Ji, Xiaoshuai Sun

TL;DR
The paper introduces 3D-STMN, an end-to-end model for 3D referring expression segmentation that significantly improves accuracy and inference speed by directly matching superpoints with text and utilizing dependency-driven semantic understanding.
Contribution
The novel 3D-STMN model combines superpoint-text matching with dependency-driven interaction, enabling efficient and accurate 3D segmentation guided by natural language expressions.
Findings
Achieves an mIoU gain of 11.7 points on the ScanRefer benchmark
Runs inference 95.7 times faster than traditional two-stage methods
Sets new performance standards in 3D referring expression segmentation
Abstract
In 3D Referring Expression Segmentation (3D-RES), the earlier approach adopts a two-stage paradigm, extracting segmentation proposals and then matching them with referring expressions. However, this conventional paradigm encounters significant challenges, most notably in terms of the generation of lackluster initial proposals and a pronounced deceleration in inference speed. Recognizing these limitations, we introduce an innovative end-to-end Superpoint-Text Matching Network (3D-STMN) that is enriched by dependency-driven insights. One of the keystones of our model is the Superpoint-Text Matching (STM) mechanism. Unlike traditional methods that navigate through instance proposals, STM directly correlates linguistic indications with their respective superpoints, clusters of semantically related points. This architectural decision empowers our model to efficiently harness cross-modal…
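The matching idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes per-point features and a single pooled text embedding are already available, uses mean pooling and a sigmoid over a dot product as stand-ins for the learned modules, and omits the dependency-driven interaction entirely. All function names, the threshold, and the example values are hypothetical.

```python
import math

def pool_superpoints(point_feats, superpoint_ids):
    """Average per-point features within each superpoint (assumed pooling)."""
    sums, counts = {}, {}
    for feat, sp in zip(point_feats, superpoint_ids):
        acc = sums.setdefault(sp, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[sp] = counts.get(sp, 0) + 1
    return {sp: [v / counts[sp] for v in acc] for sp, acc in sums.items()}

def match_superpoints(sp_feats, text_emb, threshold=0.5):
    """Score each superpoint against the text embedding and threshold it."""
    selected = {}
    for sp, feat in sp_feats.items():
        logit = sum(a * b for a, b in zip(feat, text_emb))
        score = 1.0 / (1.0 + math.exp(-logit))  # sigmoid of the dot product
        selected[sp] = score > threshold
    return selected

def point_mask(superpoint_ids, selected):
    """Broadcast the superpoint-level decision back to every point."""
    return [selected[sp] for sp in superpoint_ids]

# Hypothetical toy scene: four points grouped into two superpoints.
point_feats = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [-1.0, 0.5]]
superpoint_ids = [0, 0, 1, 1]
text_emb = [2.0, -1.0]  # stand-in for an encoded referring expression

sp_feats = pool_superpoints(point_feats, superpoint_ids)
selected = match_superpoints(sp_feats, text_emb)
mask = point_mask(superpoint_ids, selected)
print(mask)
```

Because the decision is made once per superpoint rather than once per instance proposal, a single pass produces the segmentation mask, with no separate proposal-generation stage.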
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling
