Contrastive masked auto-encoders based self-supervised hashing for 2D image and 3D point cloud cross-modal retrieval

Rukai Wei; Heng Cui; Yu Liu; Yufeng Hou; Yanzhao Xie; Ke Zhou

arXiv:2408.05711·cs.CV·August 13, 2024

TL;DR

This paper introduces CMAH, a self-supervised hashing method using contrastive masked autoencoders to improve cross-modal retrieval between 2D images and 3D point clouds by effectively bridging the modality gap.

Contribution

The paper proposes a contrastive masked autoencoder framework for cross-modal hashing that captures multi-modal semantics without labels and narrows the modality gap between 2D images and 3D point clouds.

Findings

1. CMAH outperforms baseline methods on three datasets.
2. Effective reduction of modality gap through contrastive learning.
3. Improved discriminability of hash codes.

Abstract

Implementing cross-modal hashing between 2D images and 3D point-cloud data is a growing concern in real-world retrieval systems. Simply applying existing cross-modal approaches to this new task fails to adequately capture latent multi-modal semantics and effectively bridge the modality gap between 2D and 3D. To address these issues without relying on hand-crafted labels, we propose contrastive masked autoencoders based self-supervised hashing (CMAH) for retrieval between images and point-cloud data. We start by contrasting 2D-3D pairs and explicitly constraining them into a joint Hamming space. This contrastive learning process ensures robust discriminability for the generated hash codes and effectively reduces the modality gap. Moreover, we utilize multi-modal auto-encoders to enhance the model's understanding of multi-modal semantics. By completing the masked image/point-cloud data…
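The contrastive step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss, so a symmetric InfoNCE loss over tanh-relaxed codes is assumed here, and every name (`relax`, `binarize`, `info_nce`, `cross_modal_contrastive_loss`), the temperature value, and the toy features are hypothetical. The masked-reconstruction branch is left out because its description is truncated in the source.

```python
import math

def relax(features):
    """Assumed continuous relaxation: tanh squashes each value into (-1, 1)."""
    return [[math.tanh(v) for v in row] for row in features]

def binarize(codes):
    """Turn relaxed codes into +1/-1 hash codes by taking the sign."""
    return [[1 if v >= 0 else -1 for v in row] for row in codes]

def hamming(a, b):
    """Number of positions where two +1/-1 codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def cosine(a, b):
    """Cosine similarity between two real-valued vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def info_nce(anchors, candidates, temperature=0.2):
    """Mean cross-entropy where candidates[i] is the positive for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - logits[i]
    return total / len(anchors)

def cross_modal_contrastive_loss(img_feats, pc_feats, temperature=0.2):
    """Symmetric image-to-point-cloud and point-cloud-to-image loss (assumed form)."""
    img_codes, pc_codes = relax(img_feats), relax(pc_feats)
    return 0.5 * (info_nce(img_codes, pc_codes, temperature)
                  + info_nce(pc_codes, img_codes, temperature))

# Toy encoder outputs (hypothetical): row i of each modality describes the same object.
img_feats = [[2.0, -1.5, 0.5, -2.0], [-1.0, 2.0, -0.5, 1.5]]
pc_aligned = [[1.8, -1.2, 0.7, -1.9], [-1.2, 1.7, -0.4, 1.6]]
pc_swapped = [pc_aligned[1], pc_aligned[0]]  # deliberately mismatched pairs

loss_aligned = cross_modal_contrastive_loss(img_feats, pc_aligned)
loss_swapped = cross_modal_contrastive_loss(img_feats, pc_swapped)

img_hash = binarize(relax(img_feats))
pc_hash = binarize(relax(pc_aligned))
```

In this toy setup the loss is lower for correctly paired rows than for swapped ones, and after binarization a matching image/point-cloud pair sits closer in Hamming distance than a non-matching pair. That is the behavior the abstract attributes to contrasting 2D-3D pairs in a joint Hamming space.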

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Image Retrieval and Classification Techniques

Methods: Contrastive Learning