Natural Language Object Retrieval

Ronghang Hu; Huazhe Xu; Marcus Rohrbach; Jiashi Feng; Kate Saenko; Trevor Darrell

arXiv:1511.04164·cs.CV·April 12, 2016·26 cites

TL;DR

This paper introduces a novel model for natural language object retrieval that combines spatial, local, and global scene information to accurately localize objects based on language queries, outperforming previous methods.

Contribution

The paper proposes the Spatial Context Recurrent ConvNet (SCRC), integrating spatial configurations and global context into object retrieval, and demonstrates effective knowledge transfer from image captioning.

Findings

01

Outperforms previous baseline methods on multiple datasets

02

Effectively utilizes local and global scene information

03

Leverages large-scale vision and language datasets for transfer learning

Abstract

In this paper, we address the task of natural language object retrieval, to localize a target object within a given image based on a natural language query of the object. Natural language object retrieval differs from the text-based image retrieval task in that it involves spatial information about objects within the scene and global scene context. To address this issue, we propose a novel Spatial Context Recurrent ConvNet (SCRC) model as a scoring function on candidate boxes for object retrieval, integrating spatial configurations and global scene-level contextual information into the network. Our model processes query text, local image descriptors, spatial configurations and global context features through a recurrent network, outputs the probability of the query text conditioned on each candidate box as a score for the box, and can transfer visual-linguistic knowledge from image captioning…
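
The scoring scheme in the abstract (rank each candidate box by the probability of the query text conditioned on that box) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration, not the authors' code: `spatial_feature` uses one plausible 8-number box encoding, `toy_model` is a hand-written stand-in for the recurrent network, and global scene context and local image descriptors are left out entirely.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels
WordProb = Callable[[Sequence[str], str], float]


def spatial_feature(box: Box, img_w: float, img_h: float) -> List[float]:
    """Encode a box as 8 numbers relative to the image (assumed encoding)."""
    xmin, ymin, xmax, ymax = box
    # Map pixel coordinates so the image spans [-1, 1] on both axes.
    nx0 = 2.0 * xmin / img_w - 1.0
    ny0 = 2.0 * ymin / img_h - 1.0
    nx1 = 2.0 * xmax / img_w - 1.0
    ny1 = 2.0 * ymax / img_h - 1.0
    return [nx0, ny0, nx1, ny1,
            (nx0 + nx1) / 2.0, (ny0 + ny1) / 2.0,  # box center
            nx1 - nx0, ny1 - ny0]                  # box width and height


def query_log_prob(words: Sequence[str], word_prob: WordProb) -> float:
    """Sum over t of log p(word_t | earlier words, box context)."""
    return sum(math.log(word_prob(words[:t], words[t]))
               for t in range(len(words)))


def retrieve(query: str, boxes: Sequence[Box],
             make_word_prob: Callable[[Box], WordProb]) -> Tuple[int, List[float]]:
    """Score every candidate box and return (best index, all scores)."""
    words = query.lower().split()
    scores = [query_log_prob(words, make_word_prob(b)) for b in boxes]
    best = max(range(len(boxes)), key=scores.__getitem__)
    return best, scores


def toy_model(box: Box, img_w: float = 100.0, img_h: float = 100.0) -> WordProb:
    """Hypothetical stand-in for the recurrent network: only the words
    'left' and 'right' depend on the box, via its horizontal center."""
    x_center = spatial_feature(box, img_w, img_h)[4]

    def word_prob(prev: Sequence[str], word: str) -> float:
        if word == "left":
            return 0.9 if x_center < 0 else 0.1
        if word == "right":
            return 0.9 if x_center > 0 else 0.1
        return 0.5  # box-independent words do not change the ranking

    return word_prob


boxes = [(0.0, 0.0, 40.0, 100.0), (60.0, 0.0, 100.0, 100.0)]
best_left, _ = retrieve("man on the left", boxes, toy_model)
best_right, _ = retrieve("man on the right", boxes, toy_model)
```

In this toy setup the first box wins for "man on the left" and the second for "man on the right", because only the spatial words change the conditional probability. A trained model would replace `toy_model` with a network that conditions on visual features as well.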

Code & Models

Repositories

ronghanghu/natural-language-object-retrieval
caffe2

Videos

Natural Language Object Retrieval · YouTube

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning