Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching

Tianlang Chen; Jiebo Luo

arXiv:2002.08510·cs.CV·February 21, 2020·1 cites

TL;DR

This paper introduces a dual path recurrent neural network that reorders image objects based on related words and uses high-level features with attention mechanisms to improve image-text matching accuracy.

Contribution

The paper proposes a novel DP-RNN model that processes images and texts symmetrically, capturing semantic relations between objects for better matching performance.

Findings

01 Achieves state-of-the-art results on the Flickr30K dataset.

02 Demonstrates competitive performance on the MS-COCO dataset.

03 Validates the effectiveness of high-level object features and attention mechanisms.

Abstract

Existing image-text matching approaches typically infer the similarity of an image-text pair by capturing and aggregating the affinities between the text and each independent object of the image. However, they ignore the connections between the objects that are semantically related. These objects may collectively determine whether the image corresponds to a text or not. To address this problem, we propose a Dual Path Recurrent Neural Network (DP-RNN) which processes images and sentences symmetrically by recurrent neural networks (RNN). In particular, given an input image-text pair, our model reorders the image objects based on the positions of their most related words in the text. In the same way as extracting the hidden features from word embeddings, the model leverages RNN to extract high-level object features from the reordered object inputs. We validate that the high-level object…
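The reordering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that object and word features already live in a shared embedding space, that "most related" means highest cosine similarity, and that ties keep the original object order. The names `cosine` and `reorder_objects` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def reorder_objects(object_feats, word_feats):
    """Sort object indices by the text position of each object's most similar word.

    Assumption: relatedness is cosine similarity in a shared space.
    Returns (order, best_word), where best_word[i] is the word index
    matched to object i and order lists object indices in the new sequence.
    """
    best_word = []
    for obj in object_feats:
        sims = [cosine(obj, w) for w in word_feats]
        # Index of the most similar word; the first one wins on ties.
        best_word.append(max(range(len(sims)), key=sims.__getitem__))
    # sorted() is stable, so objects matched to the same word keep their order.
    order = sorted(range(len(object_feats)), key=lambda i: best_word[i])
    return order, best_word


# Toy example: three words and three detected objects (hypothetical features).
words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
objects = [[0.1, 0.2, 0.9], [0.9, 0.1, 0.0], [0.0, 0.8, 0.1]]
order, best_word = reorder_objects(objects, words)
print(order)      # object indices in word order
print(best_word)  # matched word index per object
```

In this sketch, the reordered object sequence would then be fed to a recurrent network in the same way the word embeddings are, which is the symmetry the abstract describes.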

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning