Learning Deep Structure-Preserving Image-Text Embeddings

Liwei Wang; Yin Li; Svetlana Lazebnik

arXiv:1511.06078·cs.CV·April 15, 2016·55 cites

Learning Deep Structure-Preserving Image-Text Embeddings

Liwei Wang, Yin Li, Svetlana Lazebnik

PDF

Open Access

TL;DR

This paper introduces a deep neural network approach for learning joint image-text embeddings that preserve structure, significantly improving retrieval accuracy and achieving state-of-the-art results on multiple datasets.

Contribution

The paper presents a novel two-branch neural network with a combined ranking and neighborhood preservation loss for joint image-text embedding learning.

Findings

01

Achieves new state-of-the-art on Flickr30K and MSCOCO datasets.

02

Significant improvements in image-to-text and text-to-image retrieval accuracy.

03

Shows potential for phrase localization tasks.

Abstract

This paper proposes a method for learning joint embeddings of images and text using a two-branch neural network with multiple layers of linear projections followed by nonlinearities. The network is trained using a large margin objective that combines cross-view ranking constraints with within-view neighborhood structure preservation constraints inspired by metric learning literature. Extensive experiments show that our approach gains significant improvements in accuracy for image-to-text and text-to-image retrieval. Our method achieves new state-of-the-art results on the Flickr30K and MSCOCO image-sentence datasets and shows promise on the new task of phrase localization on the Flickr30K Entities dataset.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMultimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques