Learning Dual Semantic Relations with Graph Attention for Image-Text Matching

Keyu Wen; Xiaodong Gu; Qingrong Cheng

arXiv:2010.11550 · cs.CV · October 23, 2020

TL;DR

This paper introduces DSRAN, a novel graph attention-based network that enhances multi-level semantic relations between regions and global concepts to improve image-text matching accuracy.

Contribution

It proposes a dual semantic relations attention network with separate and joint modules for multi-level relation learning, advancing cross-modal representation alignment.
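
As a rough illustration of the separate-then-joint idea, the sketch below runs one attention pass over region features alone, then a second pass over the regions together with a global feature. It is a minimal pure-Python sketch under stated assumptions, not the DSRAN implementation: the function names (`graph_attention`, `dual_relation_sketch`), the parameter-free dot-product attention, the fully connected graph, and the mean pooling are all hypothetical choices made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def graph_attention(nodes):
    """One parameter-free attention pass over a fully connected graph.

    Each node becomes a softmax-weighted average of all nodes, with
    weights from scaled dot-product similarity. Learned projections,
    which real graph attention layers use, are omitted here.
    """
    scale = math.sqrt(len(nodes[0]))
    updated = []
    for query in nodes:
        weights = softmax([dot(query, key) / scale for key in nodes])
        updated.append([
            sum(w * node[d] for w, node in zip(weights, nodes))
            for d in range(len(query))
        ])
    return updated


def dual_relation_sketch(regions, global_feat):
    # "Separate" step (assumed): relate regions to one another.
    regions = graph_attention(regions)
    # "Joint" step (assumed): relate regions and the global feature together.
    joint = graph_attention(regions + [global_feat])
    # Mean-pool the joint nodes into one image embedding (assumed pooling).
    dim = len(global_feat)
    return [sum(node[d] for node in joint) / len(joint) for d in range(dim)]


# Toy inputs: three 3-dimensional region features and one global feature.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
global_feat = [0.5, 0.5, 0.5]
embedding = dual_relation_sketch(regions, global_feat)
```

Because every attention output is a weighted average of its inputs, the resulting embedding stays within the value range of the input features. In a trained model, the projections and pooling would be learned rather than fixed.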

Findings

01. Outperforms previous methods on MS-COCO and Flickr30K datasets.

02. Effectively learns hierarchical semantic relations for better image-text matching.

03. Demonstrates significant improvement in matching accuracy.

Abstract

Image-Text Matching is one major task in cross-modal information processing. The main challenge is to learn the unified visual and textual representations. Previous methods that perform well on this task primarily focus on not only the alignment between region features in images and the corresponding words in sentences, but also the alignment between relations of regions and relational words. However, the lack of joint learning of regional features and global features will cause the regional features to lose contact with the global context, leading to the mismatch with those non-object words which have global meanings in some sentences. In this work, in order to alleviate this issue, it is necessary to enhance the relations between regions and the relations between regional and global concepts to obtain a more accurate visual representation so as to be better correlated to the…

Code & Models

Repositories

kywen1119/DSRAN (PyTorch, official)
