Conditional Image-Text Embedding Networks

Bryan A. Plummer; Paige Kordas; M. Hadi Kiapour; Shuai Zheng; Robinson Piramuthu; Svetlana Lazebnik

arXiv:1711.08389 · cs.CV · July 31, 2018

TL;DR

This paper introduces a novel end-to-end model for grounding phrases in images by learning multiple text-conditioned embeddings with automatic concept assignment, improving performance across several datasets.

Contribution

It proposes a concept weight branch that automatically assigns phrases to embeddings, which simplifies what each individual embedding must represent and improves grounding accuracy.

Findings

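01

Improved grounding performance by 4%, 3%, and 4% on Flickr30K Entities, ReferIt Game, and Visual Genome, respectively.

02

Verified the approach through experiments on all three phrase grounding datasets.

03

Outperformed a strong region-phrase embedding baseline on each dataset.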

Abstract

This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies the representation requirements for individual embeddings and allows the underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers. Comprehensive experiments verify the effectiveness of our approach across three phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where we obtain improvements of 4%, 3%, and 4%, respectively, in grounding performance over a strong region-phrase embedding baseline.
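
The mechanism described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class name ConditionalEmbedding, the layer sizes, the random untrained weights, the softmax over concept scores, and the dot-product scoring are all assumptions made for illustration. It shows only the general idea of a shared region layer feeding several concept-specific layers, whose outputs are mixed by weights predicted from the phrase.

```python
import math
import random


def linear(weights, x):
    """Apply a bias-free dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax; outputs are positive and sum to 1."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ConditionalEmbedding:
    """Hypothetical toy model with random, untrained weights."""

    def __init__(self, region_dim, phrase_dim, embed_dim, num_concepts, seed=0):
        rng = random.Random(seed)
        # Shared layer applied to every region before the concept layers.
        self.shared = random_matrix(rng, embed_dim, region_dim)
        # One concept-specific layer per embedding.
        self.concept_layers = [
            random_matrix(rng, embed_dim, embed_dim) for _ in range(num_concepts)
        ]
        # Projects the phrase into the same space as the region embedding.
        self.phrase_proj = random_matrix(rng, embed_dim, phrase_dim)
        # Concept weight branch: phrase features -> one score per concept.
        self.concept_branch = random_matrix(rng, num_concepts, phrase_dim)

    def concept_weights(self, phrase):
        """Soft assignment of a phrase to the concept embeddings."""
        return softmax(linear(self.concept_branch, phrase))

    def score(self, region, phrase):
        """Region-phrase compatibility under the phrase's concept weights."""
        shared = relu(linear(self.shared, region))
        per_concept = [linear(layer, shared) for layer in self.concept_layers]
        weights = self.concept_weights(phrase)
        # Mix the concept-specific region embeddings using the phrase weights.
        region_embed = [
            sum(w * emb[d] for w, emb in zip(weights, per_concept))
            for d in range(len(per_concept[0]))
        ]
        phrase_embed = linear(self.phrase_proj, phrase)
        return sum(r * p for r, p in zip(region_embed, phrase_embed))


# Toy usage: ground one phrase against four candidate regions.
model = ConditionalEmbedding(region_dim=6, phrase_dim=5, embed_dim=4, num_concepts=3)
rng = random.Random(1)
phrase = [rng.uniform(-1, 1) for _ in range(5)]
regions = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(4)]

weights = model.concept_weights(phrase)
scores = [model.score(region, phrase) for region in regions]
best_region = max(range(len(regions)), key=lambda i: scores[i])
```

Because the concept weights come from the phrase rather than from a predefined lookup, the assignment of phrases to embeddings can be learned jointly with the embeddings. In this sketch the weights are random, so best_region only demonstrates the data flow and carries no meaning.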

Code & Models

Repositories

BryanPlummer/cite (TensorFlow, official implementation)