COM3D: Leveraging Cross-View Correspondence and Cross-Modal Mining for 3D Retrieval

Hao Wu; Ruochong LI; Hao Wang; Hui Xiong

arXiv:2405.04103 · cs.CV · May 8, 2024

TL;DR

COM3D combines cross-view correspondence with cross-modal mining to improve retrieval between 3D shapes and text. It enriches 3D features with a scene representation transformer and trains the cross-modal matching with semi-hard negative mining, reporting state-of-the-art performance.

Contribution

COM3D makes a first attempt to exploit cross-view correspondence and cross-modal mining for 3D-text retrieval: it enriches 3D features with a scene representation transformer and optimizes matching with semi-hard negative mining.

Findings

01 Achieves state-of-the-art results on the Text2Shape dataset.

02 Demonstrates improved retrieval accuracy over previous methods.

03 Validates the effectiveness of the cross-view and cross-modal strategies.

Abstract

In this paper, we investigate an open research task of cross-modal retrieval between 3D shapes and textual descriptions. Previous approaches mainly rely on point cloud encoders for feature extraction, which may ignore key inherent features of 3D shapes, including depth, spatial hierarchy, geometric continuity, etc. To address this issue, we propose COM3D, making the first attempt to exploit the cross-view correspondence and cross-modal mining to enhance the retrieval performance. Notably, we augment the 3D features through a scene representation transformer, to generate cross-view correspondence features of 3D shapes, which enrich the inherent features and enhance their compatibility with text matching. Furthermore, we propose to optimize the cross-modal matching process based on the semi-hard negative example mining method, in an attempt to improve the learning efficiency. Extensive…
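The abstract names semi-hard negative mining but does not give its formulation. The sketch below illustrates the commonly used definition from metric learning, not necessarily the authors' exact loss: for a text anchor, a semi-hard negative is a non-matching shape that is less similar than the matching shape, yet still within a margin of it. The function names, the similarity-matrix layout (matching pairs on the diagonal), and the margin value of 0.2 are all assumptions made for illustration.

```python
def mine_semi_hard(sim_row, pos_index, margin=0.2):
    """Return the index of the hardest semi-hard negative, or None.

    sim_row[j] is the similarity between one text and shape j
    (hypothetical layout); sim_row[pos_index] is the matching shape.
    A negative is semi-hard when pos - margin < sim < pos.
    """
    pos = sim_row[pos_index]
    best = None
    for j, s in enumerate(sim_row):
        if j == pos_index:
            continue
        if pos - margin < s < pos:
            # Among semi-hard candidates, keep the most similar one.
            if best is None or s > sim_row[best]:
                best = j
    return best


def semi_hard_triplet_loss(sim, margin=0.2):
    """Average triplet loss over a batch, using only semi-hard negatives.

    sim[i][j] is the similarity of text i to shape j; the diagonal holds
    the matching pairs (an assumption of this sketch). Anchors with no
    semi-hard negative contribute zero.
    """
    total = 0.0
    for i, row in enumerate(sim):
        j = mine_semi_hard(row, i, margin)
        if j is not None:
            # Positive by construction, since row[j] > row[i] - margin.
            total += margin - row[i] + row[j]
    return total / len(sim)


# Toy batch of 3 text-shape pairs with made-up similarity scores.
toy_sim = [
    [0.90, 0.80, 0.30],  # shape 1 is semi-hard for text 0
    [0.95, 0.90, 0.10],  # shape 0 is harder than the positive, so skipped
    [0.20, 0.10, 0.80],  # both negatives are easy, so skipped
]
toy_loss = semi_hard_triplet_loss(toy_sim)
```

In this toy batch only the first anchor has a semi-hard negative. Negatives that already score above the positive, as in the second row, are deliberately excluded, which is what distinguishes semi-hard mining from hardest-negative mining.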

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Image Retrieval and Classification Techniques