Generation and Comprehension of Unambiguous Object Descriptions

Junhua Mao; Jonathan Huang; Alexander Toshev; Oana Camburu; Alan Yuille; Kevin Murphy

arXiv:1511.02283·cs.CV·April 12, 2016·128 cites

TL;DR

This paper introduces a deep learning-based method for generating and understanding unambiguous object descriptions in images, outperforming previous approaches and providing a new large-scale dataset for the task.

Contribution

The paper presents a novel deep learning model for unambiguous referring expression generation and comprehension, along with a new large-scale dataset based on MS-COCO.

Findings

01

The method outperforms previous approaches that describe objects without accounting for other potentially ambiguous objects in the scene.

02

The dataset enables objective evaluation of referring expression tasks.

03

The toolbox facilitates visualization and assessment of model performance.

Abstract

We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described. We show that our method outperforms previous methods that generate descriptions of objects without taking into account other potentially ambiguous objects in the scene. Our model is inspired by recent successes of deep learning methods for image captioning, but while image captioning is difficult to evaluate, our task allows for easy objective evaluation. We also present a new large-scale dataset for referring expressions, based on MS-COCO. We have released the dataset and a toolbox for visualization and evaluation, see https://github.com/mjhucla/Google_Refexp_toolbox
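The abstract notes that the comprehension task allows easy objective evaluation but does not spell out the metric here. As a minimal sketch, assuming that a predicted region counts as correct when its intersection-over-union (IoU) with the ground-truth region reaches a threshold, the scoring could look like the following. The function names, the (x, y, width, height) box layout, and the 0.5 threshold are illustrative assumptions, not taken from the released toolbox.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x, y, width, height), the convention used by MS-COCO annotations.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def comprehension_accuracy(
    predicted: Sequence[Box],
    ground_truth: Sequence[Box],
    threshold: float = 0.5,  # assumed cutoff, chosen for illustration
) -> float:
    """Fraction of expressions whose predicted box matches the referred object."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(predicted, ground_truth))
    return hits / len(predicted)
```

Each entry in `predicted` would be the region a model selects for one referring expression, paired with the annotated region for that expression. Because the score is a simple hit rate over boxes, it avoids the ambiguity of comparing free-form captions.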

Code & Models

Repositories

mjhucla/Google_Refexp_toolbox
Official

Videos

Generation and Comprehension of Unambiguous Object Descriptions · YouTube

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Natural Language Processing Techniques