Learning Relation Alignment for Calibrated Cross-modal Retrieval

Shuhuai Ren; Junyang Lin; Guangxiang Zhao; Rui Men; An Yang; Jingren Zhou; Xu Sun; Hongxia Yang

arXiv:2105.13868 · cs.CL · June 8, 2021 · 1 citation

TL;DR

This paper introduces a relation alignment method for cross-modal retrieval that improves semantic consistency between image and text by calibrating intra-modal relations, leading to better retrieval performance.

Contribution

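The paper proposes a novel metric, Intra-modal Self-attention Distance (ISD), and a regularized training method, Inter-modal Alignment on Intra-modal Self-attentions (IAIS), to align linguistic and visual relations and improve cross-modal retrieval models.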

Findings

01. Significant performance improvements on the Flickr30k and MS COCO datasets.

02. The ISD metric effectively measures relation consistency.

03. The IAIS method enhances model interpretability and accuracy.

Abstract

Despite the achievements of large-scale multimodal pre-training approaches, cross-modal retrieval, e.g., image-text retrieval, remains a challenging task. To bridge the semantic gap between the two modalities, previous studies mainly focus on word-region alignment at the object level, lacking the matching between the linguistic relation among the words and the visual relation among the regions. The neglect of such relation consistency impairs the contextualized representation of image-text pairs and hinders the model performance and the interpretability. In this paper, we first propose a novel metric, Intra-modal Self-attention Distance (ISD), to quantify the relation consistency by measuring the semantic distance between linguistic and visual relations. In response, we present Inter-modal Alignment on Intra-modal Self-attentions (IAIS), a regularized training method to optimize the ISD…
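
The abstract is cut off before it defines ISD, so the following is only a rough sketch of the general idea, not the paper's formulation. It assumes a single joint self-attention matrix over text tokens followed by image regions, splits it into intra-modal and cross-modal blocks, maps the visual relations into word space through the cross-modal blocks, and compares the result with the linguistic relations using a symmetric KL divergence. The function names (`split_blocks`, `relation_distance`), the projection rule, and the toy matrices are all hypothetical choices made for illustration.

```python
import math

def matmul(a, b):
    # Plain list-of-lists matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def row_normalize(m, eps=1e-12):
    # Rescale each row to sum to (approximately) 1.
    return [[v / (sum(row) + eps) for v in row] for row in m]

def split_blocks(attn, n_text):
    # Assumed layout: the first n_text rows/columns are words, the rest are regions.
    s_ll = [row[:n_text] for row in attn[:n_text]]   # word -> word
    s_lv = [row[n_text:] for row in attn[:n_text]]   # word -> region
    s_vl = [row[:n_text] for row in attn[n_text:]]   # region -> word
    s_vv = [row[n_text:] for row in attn[n_text:]]   # region -> region
    return s_ll, s_lv, s_vl, s_vv

def sym_kl(p, q, eps=1e-12):
    # Symmetric KL divergence, averaged over rows.
    total = 0.0
    for p_row, q_row in zip(p, q):
        for a, b in zip(p_row, q_row):
            a, b = a + eps, b + eps
            total += a * math.log(a / b) + b * math.log(b / a)
    return total / (2 * len(p))

def relation_distance(attn, n_text):
    # Hypothetical stand-in for an ISD-style score (an assumption, not the paper's definition).
    s_ll, s_lv, s_vl, s_vv = split_blocks(attn, n_text)
    linguistic = row_normalize(s_ll)
    # Route word -> region -> region -> word to express visual relations in word space.
    projected = matmul(matmul(row_normalize(s_lv), row_normalize(s_vv)), row_normalize(s_vl))
    return sym_kl(linguistic, row_normalize(projected))

# Toy 4x4 attention maps: 2 words followed by 2 regions.
aligned = [
    [0.3, 0.1, 0.6, 0.0],
    [0.1, 0.3, 0.0, 0.6],
    [0.6, 0.0, 0.3, 0.1],
    [0.0, 0.6, 0.1, 0.3],
]
mismatched = [
    [0.3, 0.1, 0.6, 0.0],
    [0.1, 0.3, 0.0, 0.6],
    [0.6, 0.0, 0.1, 0.3],
    [0.0, 0.6, 0.3, 0.1],
]

print(relation_distance(aligned, 2), relation_distance(mismatched, 2))
```

In the `aligned` example the region-to-region pattern mirrors the word-to-word pattern, so the score is close to zero; in `mismatched` the visual relations are swapped and the score grows. A training regularizer in the spirit of IAIS would add such a distance to the task loss, though the exact loss used by the authors is not given in the text above.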

Code & Models

Repositories

lancopku/IAIS
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning