IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval

Hui Chen; Guiguang Ding; Xudong Liu; Zijia Lin; Ji Liu; Jungong Han

arXiv:2003.03772 · cs.CV · March 10, 2020 · 29 cites

TL;DR

IMRAM is an iterative, memory-augmented method for fine-grained cross-modal image-text retrieval. It refines image-text alignments over multiple steps and reports state-of-the-art results on Flickr8K, Flickr30K, and MS COCO.

Contribution

The paper proposes IMRAM, a novel iterative matching framework with recurrent attention memory that better captures complex semantic correspondences between images and texts.

Findings

01. Achieves state-of-the-art performance on the Flickr8K, Flickr30K, and MS COCO datasets.

02. Effectively refines alignments through multiple iterative steps.

03. Demonstrates practical applicability on a business advertisement dataset.

Abstract

Enabling bi-directional retrieval of images and texts is important for understanding the correspondence between vision and language. Existing methods leverage the attention mechanism to explore such correspondence in a fine-grained manner. However, most of them consider all semantics equally and thus align them uniformly, regardless of their diverse complexities. In fact, semantics are diverse (i.e. involving different kinds of semantic concepts), and humans usually follow a latent structure to combine them into understandable languages. It may be difficult to optimally capture such sophisticated correspondences in existing methods. In this paper, to address such a deficiency, we propose an Iterative Matching with Recurrent Attention Memory (IMRAM) method, in which correspondences between images and texts are captured with multiple steps of alignments. Specifically, we introduce an…
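The idea of capturing correspondences "with multiple steps of alignments" can be illustrated with a minimal sketch. This is not the authors' implementation (see the official repository for that). It assumes word and region features are plain vectors, uses softmax attention over cosine similarities, and replaces any learned memory update with a fixed blending gate. The names `attend`, `iterative_match`, `steps`, `gate`, and `temperature` are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    # Cosine similarity; zero vectors are treated as having norm 1.
    nu = math.sqrt(dot(u, u)) or 1.0
    nv = math.sqrt(dot(v, v)) or 1.0
    return dot(u, v) / (nu * nv)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, contexts, temperature=9.0):
    """Return the attention-weighted average of `contexts` for one query."""
    weights = softmax([temperature * cosine(query, c) for c in contexts])
    return [sum(w * c[d] for w, c in zip(weights, contexts))
            for d in range(len(query))]


def iterative_match(queries, contexts, steps=3, gate=0.5):
    """Accumulate a matching score over several alignment steps.

    Assumption: a fixed `gate` blends each query with its attended
    context between steps. A learned update would normally go here.
    """
    current = [list(q) for q in queries]
    score = 0.0
    for _ in range(steps):
        attended = [attend(q, contexts) for q in current]
        # Score each original query against its attended context.
        score += sum(cosine(q, a)
                     for q, a in zip(queries, attended)) / len(queries)
        # Carry alignment information forward to the next step.
        current = [[gate * qi + (1.0 - gate) * ai for qi, ai in zip(q, a)]
                   for q, a in zip(current, attended)]
    return score


words = [[1.0, 0.0], [0.0, 1.0]]
matching_regions = [[1.0, 0.0], [0.0, 1.0]]
mismatched_regions = [[-1.0, 0.0], [0.0, -1.0]]
print(iterative_match(words, matching_regions))
print(iterative_match(words, mismatched_regions))
```

In this toy setup the matching image should score higher than the mismatched one. Scoring the original queries rather than the blended ones is a choice made for this sketch, so that the score does not drift toward whatever the queries were last blended with.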

Code & Models

Repositories

HuiChen24/IMRAM (PyTorch, official)

Videos

IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval (YouTube)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning