Deep Reversible Consistency Learning for Cross-modal Retrieval

Ruitao Pu; Yang Qin; Dezhong Peng; Xiaomin Song; Huiming Zheng

arXiv:2501.05686·cs.CV·January 13, 2025

TL;DR

This paper introduces Deep Reversible Consistency Learning (DRCL), a novel approach for cross-modal retrieval that enhances semantic alignment and flexibility by recasting modality-invariant representations guided by labels.

Contribution

The proposed DRCL method innovatively combines selective prior learning and reversible semantic consistency to improve cross-modal retrieval performance.

Findings

01. DRCL outperforms 15 state-of-the-art methods on five datasets.

02. The feature augmentation mechanism increases diversity and robustness.

03. Extensive experiments validate the effectiveness of DRCL.

Abstract

Cross-modal retrieval (CMR) typically involves learning common representations to directly measure similarities between multimodal samples. Most existing CMR methods assume that multimodal samples come in pairs and employ joint training to learn common representations, limiting the flexibility of CMR. Although some methods adopt independent training strategies for each modality to improve flexibility in CMR, they use randomly initialized orthogonal matrices to guide representation learning. This is suboptimal because it assumes inter-class samples are independent of each other, limiting the potential of semantic alignments between sample representations and ground-truth labels. To address these issues, we propose a novel method termed Deep Reversible Consistency Learning (DRCL) for cross-modal retrieval. DRCL includes two core modules, i.e., Selective Prior Learning (SPL) and…
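
The abstract's point about randomly initialized orthogonal matrices can be illustrated with a minimal sketch. This is not DRCL, nor the code of any specific prior method. It only assumes that each row of such a matrix acts as a fixed anchor for one class, and the names `random_orthogonal_priors` and `predict` are hypothetical. Because the rows are orthonormal, every pair of distinct classes has similarity zero, so no pair of classes can be modeled as more related than another. This is the independence assumption the abstract describes as suboptimal.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def random_orthogonal_priors(num_classes, dim, seed=0):
    """Return num_classes orthonormal vectors of length dim.

    Hypothetical helper: builds the rows by Gram-Schmidt on random
    Gaussian draws. Real methods may construct the matrix differently.
    """
    if num_classes > dim:
        raise ValueError("need dim >= num_classes for orthogonal rows")
    rng = random.Random(seed)
    rows = []
    while len(rows) < num_classes:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the rows accepted so far.
        for r in rows:
            proj = dot(v, r)
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-8:  # skip degenerate draws
            rows.append([a / norm for a in v])
    return rows


def predict(representation, priors):
    """Index of the class anchor most similar to the representation."""
    scores = [dot(representation, p) for p in priors]
    return max(range(len(priors)), key=scores.__getitem__)


priors = random_orthogonal_priors(num_classes=4, dim=8)

# Gram matrix of the anchors: ones on the diagonal, zeros elsewhere,
# so all distinct class pairs are treated as equally unrelated.
gram = [[dot(p, q) for q in priors] for p in priors]
```

Since the anchors are fixed and shared, an image encoder and a text encoder could each be trained against them separately, which is the flexibility the abstract attributes to independent training. The cost is that the anchors carry no information about which classes are semantically close.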

Code & Models

Repositories

perquisite/drcl
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: ADaptive gradient method with the OPTimal convergence rate · Semi-Pseudo-Label