Modeling Context in Referring Expressions

Licheng Yu; Patrick Poirson; Shan Yang; Alexander C. Berg; Tamara L. Berg

arXiv:1608.00272 · cs.CV · August 11, 2016 · 54 cites

Open Access · 4 repos · 10 models · 1 dataset

TL;DR

This paper improves the generation and comprehension of referring expressions in images by incorporating visual context (comparison with other objects in the image) and by jointly generating expressions for all objects of a category, leading to significant performance gains on three datasets.

Contribution

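The paper introduces methods that compare a target object visually with other objects in the same image and that generate expressions jointly for all objects of a particular category, improving both referring expression generation and comprehension.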

Findings

01

Visual comparison improves model performance

02

Jointly generating expressions for all objects of a category improves expression quality

03

The methods outperform previous approaches on three datasets: RefCOCO, RefCOCO+, and RefCOCOg

Abstract

Humans refer to objects in their environments all the time, especially in dialogue with other people. We explore generating and comprehending natural language referring expressions for objects in images. In particular, we focus on incorporating better measures of visual context into referring expression models and find that visual comparison to other objects within an image helps improve performance significantly. We also develop methods to tie the language generation process together, so that we generate expressions for all objects of a particular category jointly. Evaluation on three recent datasets (RefCOCO, RefCOCO+, and RefCOCOg) shows the advantages of our methods for both referring expression generation and comprehension.
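The abstract's central idea, that comparing a target object with other objects in the same image helps, can be illustrated with a small feature-construction sketch. It assumes each object already has an appearance feature vector (for example from a CNN) and a bounding box. The hypothetical `visual_difference` averages unit-normalized differences between the target and other same-category objects, and the hypothetical `relative_location` encodes where another object sits relative to the target. The function names, the choice of context objects, and the exact normalization are assumptions for illustration, not the authors' released code.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def l2_normalize(v: Sequence[float], eps: float = 1e-8) -> Vector:
    """Scale v to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def visual_difference(target: Sequence[float],
                      others: Sequence[Sequence[float]]) -> Vector:
    """Average of unit-normalized differences (target - other).

    Assumption: `others` holds appearance features of the other
    same-category objects in the image. Returns a zero vector when
    there are no context objects.
    """
    dim = len(target)
    if not others:
        return [0.0] * dim
    total = [0.0] * dim
    for other in others:
        diff = l2_normalize([t - o for t, o in zip(target, other)])
        total = [a + b for a, b in zip(total, diff)]
    return [a / len(others) for a in total]


def relative_location(target_box: Box, other_box: Box) -> Vector:
    """5-d offset of other_box relative to target_box (assumed encoding).

    Corner offsets are measured from the target's center and scaled by
    the target's width/height; the last entry is the area ratio.
    Assumes the target box has non-zero width and height.
    """
    tx0, ty0, tx1, ty1 = target_box
    ox0, oy0, ox1, oy1 = other_box
    tw, th = tx1 - tx0, ty1 - ty0
    cx, cy = (tx0 + tx1) / 2, (ty0 + ty1) / 2
    area_ratio = ((ox1 - ox0) * (oy1 - oy0)) / (tw * th)
    return [(ox0 - cx) / tw, (oy0 - cy) / th,
            (ox1 - cx) / tw, (oy1 - cy) / th, area_ratio]


if __name__ == "__main__":
    target_feat, target_box = [1.0, 0.0], (0, 0, 2, 2)
    context = [([0.0, 0.0], (4, 0, 6, 2)), ([1.0, 2.0], (0, 3, 2, 5))]
    print(visual_difference(target_feat, [f for f, _ in context]))
    print([relative_location(target_box, b) for _, b in context])
```

In a model of this kind, such comparison features would be concatenated with the target's own appearance and location features before being fed to the language model; normalizing each difference keeps a single very dissimilar object from dominating the average.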

Code & Models

Datasets

rhymes-ai/RefCOCO
dataset · 27 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Natural Language Processing Techniques