IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation

Qi Chen; Changli Wu; Jiayi Ji; Yiwei Ma; Danni Yang; Xiaoshuai Sun

arXiv:2501.04995 · cs.CV · January 10, 2025

TL;DR

This paper introduces IPDN, a novel network that enhances 3D referring expression segmentation by integrating multi-view images and task-driven prompts, effectively addressing feature and intent ambiguities.

Contribution

The paper proposes the Multi-view Semantic Embedding (MSE) module and the Prompt-Aware Decoder to improve reasoning in 3D-RES; the resulting network outperforms existing methods.

Findings

01

IPDN outperforms the state of the art by 1.9 and 4.2 mIoU points on the 3D-RES and 3D-GRES tasks, respectively.

02

Multi-view semantic embedding improves spatial information retention.

03

Prompt-aware decoding enhances task-specific guidance.
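
The mIoU figure in finding 01 is the mean, over test expressions, of the intersection-over-union between the predicted and ground-truth point masks. The following is a minimal sketch of that metric in plain Python. The function names `mask_iou` and `mean_iou` are hypothetical and not taken from the paper's repository, and the handling of two empty masks is an assumption.

```python
def mask_iou(pred, gt):
    """IoU between two per-point masks (truthy = point belongs to the target)."""
    if len(pred) != len(gt):
        raise ValueError("masks must cover the same set of points")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(pairs):
    """Average IoU over (pred, gt) mask pairs, one pair per expression."""
    ious = [mask_iou(pred, gt) for pred, gt in pairs]
    return sum(ious) / len(ious)


pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # IoU = 1 / 2
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # IoU = 2 / 2
]
print(mean_iou(pairs))
```

A gain of 1.9 points therefore means the average IoU, expressed as a percentage, rises by 1.9.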

Abstract

3D Referring Expression Segmentation (3D-RES) aims to segment point cloud scenes based on a given expression. However, existing 3D-RES approaches face two major challenges: feature ambiguity and intent ambiguity. Feature ambiguity arises from information loss or distortion during point cloud acquisition due to limitations such as lighting and viewpoint. Intent ambiguity refers to the model's equal treatment of all queries during the decoding process, lacking top-down task-specific guidance. In this paper, we introduce an Image enhanced Prompt Decoding Network (IPDN), which leverages multi-view images and task-driven information to enhance the model's reasoning capabilities. To address feature ambiguity, we propose the Multi-view Semantic Embedding (MSE) module, which injects multi-view 2D image information into the 3D scene and compensates for potential spatial information loss. To…
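
To make the idea of injecting multi-view 2D image information into a 3D scene concrete, here is a minimal sketch of a generic back-projection step: each 3D point is projected into every view with a pinhole camera, and the image features at the hit pixels are averaged. This is not the paper's MSE module, whose details are not given in the text above. The names `project` and `lift_image_features`, the `(fx, fy, cx, cy)` intrinsics tuple, and the dict layout of a view are all assumptions made for illustration, and occlusion is ignored.

```python
def project(point, K, cam_from_world):
    """Project a world-space point to pixel coordinates (pinhole model).

    K is (fx, fy, cx, cy); cam_from_world is (R, t) with R as 3 rows.
    Returns (u, v), or None if the point lies behind the camera.
    """
    R, t = cam_from_world
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    depth = xc[2]
    if depth <= 0:
        return None
    fx, fy, cx, cy = K
    return fx * xc[0] / depth + cx, fy * xc[1] / depth + cy


def lift_image_features(points, views):
    """Average 2D features over all views in which each point projects in-bounds.

    Each view is a dict with keys "K", "pose" and "feat" (H x W x C nested lists).
    Points seen by no view get None. Occlusion is not checked (simplification).
    """
    lifted = []
    for p in points:
        acc, n = None, 0
        for view in views:
            uv = project(p, view["K"], view["pose"])
            if uv is None:
                continue
            u, v = int(round(uv[0])), int(round(uv[1]))
            feat = view["feat"]
            if 0 <= v < len(feat) and 0 <= u < len(feat[0]):
                f = feat[v][u]
                acc = list(f) if acc is None else [a + b for a, b in zip(acc, f)]
                n += 1
        lifted.append(None if n == 0 else [a / n for a in acc])
    return lifted
```

In a real pipeline the lifted features would be fused with the point-cloud backbone features, and a depth test would typically be added so that occluded points do not pick up features from surfaces in front of them.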

Code & Models

Repositories

80chen86/ipdn (PyTorch, official)

Videos

IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation · Underline

Taxonomy

Topics: Brain Tumor Detection and Classification · Medical Imaging Techniques and Applications · Advanced Neural Network Applications