LESS: Label-Efficient and Single-Stage Referring 3D Segmentation

Xuexun Liu, Xiaoxu Xu, Jinlong Li, Qiudan Zhang, Xu Wang, Nicu Sebe, and Lin Ma

arXiv:2410.13294 · cs.CV · October 29, 2024

TL;DR

LESS introduces a single-stage, label-efficient approach for referring 3D segmentation that aligns points with textual queries using binary supervision, reducing labeling effort and improving performance.

Contribution

The paper proposes a novel single-stage pipeline with cross-modal alignment and regularization techniques, achieving state-of-the-art results with less supervision.

Findings

01. Achieves 3.7% higher mIoU than previous methods on the ScanRefer dataset.

02. Uses only binary masks for supervision, reducing labeling effort.

03. Outperforms previous two-stage methods in accuracy.

Abstract

Referring 3D Segmentation is a visual-language task that segments all points of the object described by a query sentence from a 3D point cloud. Previous works follow a two-stage paradigm, first conducting language-agnostic instance segmentation and then matching the result with the given text query. However, the semantic concepts from the text query and the visual cues interact separately during training, and both instance and semantic labels are required for each object, which is time-consuming and labor-intensive. To mitigate these issues, we propose a novel Referring 3D Segmentation pipeline, Label-Efficient and Single-Stage, dubbed LESS, which is supervised only by efficient binary masks. Specifically, we design a Point-Word Cross-Modal Alignment module for aligning the fine-grained features of points with textual embeddings. Query Mask Predictor module and Query-Sentence…
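To make the reported metric concrete, the following is a minimal sketch of how mIoU is commonly computed for per-point binary masks, where each query yields one predicted mask and one ground-truth mask over the same points. The function names `binary_iou` and `mean_iou` are hypothetical, and the treatment of two empty masks as a perfect match is an assumption; the paper's exact evaluation protocol is not given on this page.

```python
def binary_iou(pred, target):
    """IoU between two per-point binary masks (sequences of 0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must cover the same set of points")
    # Points labeled as the referred object in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Points labeled as the referred object in at least one mask.
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(pairs):
    """Average IoU over (pred, target) mask pairs, one pair per text query."""
    scores = [binary_iou(pred, target) for pred, target in pairs]
    if not scores:
        raise ValueError("at least one (pred, target) pair is required")
    return sum(scores) / len(scores)
```

For example, `binary_iou([1, 1, 0, 0], [1, 0, 1, 0])` has one shared point out of three labeled points, giving an IoU of one third. Averaging per-query scores in this way means every query contributes equally, regardless of how many points its object covers.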

Code & Models

Repositories

mellody11/less (PyTorch, official)

Videos

LESS: Label-Efficient and Single-Stage Referring 3D Segmentation · SlidesLive

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Image Retrieval and Classification Techniques