Graph Pattern Loss based Diversified Attention Network for Cross-Modal Retrieval

Xueying Chen; Rong Zhang; Yibing Zhan

arXiv:2106.13552 · cs.CV · June 28, 2021

Open Access

TL;DR

This paper introduces GPLDAN, an unsupervised cross-modal retrieval model that uses diversified attention and a novel graph pattern loss to analyze and leverage correlations among multi-modal representations, improving retrieval performance.

Contribution

The paper proposes a new Graph Pattern Loss and diversified attention mechanism for unsupervised cross-modal retrieval, enhancing correlation analysis among representations.

Findings

01

GPLDAN outperforms state-of-the-art methods on four datasets.

02

The graph pattern loss effectively captures correlations among different representations.

03

Diversified attention improves the discrimination of multi-modal features.

Abstract

Cross-modal retrieval aims to provide a flexible retrieval experience by combining multimedia data such as images, video, text, and audio. A core task of unsupervised approaches is to mine the correlations among different object representations so as to achieve satisfactory retrieval performance without requiring expensive labels. In this paper, we propose a Graph Pattern Loss based Diversified Attention Network (GPLDAN) for unsupervised cross-modal retrieval that deeply analyzes correlations among representations. First, we propose a diversified attention feature projector that considers the interaction between different representations to generate multiple representations of an instance. Then, we design a novel graph pattern loss to explore the correlations among different representations; in this graph, all possible distances between different representations are considered. In addition, a modality…
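
To make the phrase "all possible distances between different representations" concrete, here is a minimal sketch in plain Python. It is not the paper's loss: the abstract does not give the formula, so the Euclidean metric, the node names (`img_0`, `txt_1`, ...), and the toy objective `mean_edge_distance` are all assumptions chosen for illustration. The sketch only shows the idea of treating each representation of an instance as a node of a complete graph and enumerating every edge.

```python
import itertools
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def all_pairwise_distances(representations):
    """Return {(name_i, name_j): distance} for every unordered pair of nodes.

    `representations` maps a hypothetical node name to a feature vector.
    Treating each representation as a graph node and each pair as an edge
    is an assumption made for illustration, not the paper's definition.
    """
    return {
        (a, b): euclidean(representations[a], representations[b])
        for a, b in itertools.combinations(sorted(representations), 2)
    }


def mean_edge_distance(representations):
    """Toy objective (assumed): average length over all edges of the graph."""
    edges = all_pairwise_distances(representations)
    return sum(edges.values()) / len(edges)


# Two hypothetical image views and two hypothetical text views of one instance.
reps = {
    "img_0": [0.0, 0.0],
    "img_1": [3.0, 4.0],
    "txt_0": [0.0, 0.0],
    "txt_1": [3.0, 4.0],
}
edges = all_pairwise_distances(reps)
```

With four representations the complete graph has six edges, covering both within-modality pairs (such as `img_0` and `img_1`) and cross-modality pairs (such as `img_0` and `txt_0`). How the actual graph pattern loss weights or constrains these edges is specified in the full paper, not in this summary.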

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques