Referring Expression Comprehension: A Survey of Methods and Datasets

Yanyuan Qiao; Chaorui Deng; Qi Wu

arXiv:2007.09554 · cs.CV · December 8, 2020 · 5 cites

TL;DR

This survey reviews recent methods and datasets for referring expression comprehension, highlighting the challenges, architectures, and future directions in localizing objects in images based on natural language descriptions.

Contribution

It provides a comprehensive classification of REC methods, compares state-of-the-art approaches, and discusses future research directions including compositional reasoning.

Findings

01 Joint embedding of images and expressions is common in REC methods.

02 Graph-based models effectively utilize structured representations.

03 Datasets vary in size and complexity, impacting model evaluation.

Abstract

Referring expression comprehension (REC) aims to localize a target object in an image described by a referring expression phrased in natural language. Unlike object detection, where the queried object labels are pre-defined, REC observes its queries only at test time. It is thus more challenging than a conventional computer vision problem. This task has attracted a lot of attention from both the computer vision and natural language processing communities, and several lines of work have been proposed, from CNN-RNN models and modular networks to complex graph-based models. In this survey, we first examine the state of the art by comparing modern approaches to the problem. We classify methods by their mechanism for encoding the visual and textual modalities. In particular, we examine the common approach of jointly embedding images and expressions into a common feature space.…
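
As a rough illustration of the joint-embedding idea mentioned in the abstract, the sketch below projects candidate region features and an expression feature into one shared space and picks the region with the highest cosine similarity. It is a toy example, not a method from the survey: the function names (`project`, `l2_normalize`, `ground_expression`), the fixed projection matrices, and the hand-written feature vectors are all hypothetical stand-ins for what a trained visual encoder, language encoder, and learned projections would provide.

```python
import math


def project(matrix, vector):
    """Apply a linear map (given as a list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def l2_normalize(vector, eps=1e-8):
    """Scale a vector to unit length, guarding against zero norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def ground_expression(region_feats, expr_feat, w_visual, w_text):
    """Return (index of best-matching region, cosine scores for all regions).

    Both modalities are mapped into the same space, so a dot product of
    the normalized embeddings acts as a matching score.
    """
    t = l2_normalize(project(w_text, expr_feat))
    scores = []
    for feat in region_feats:
        v = l2_normalize(project(w_visual, feat))
        scores.append(sum(a * b for a, b in zip(v, t)))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical inputs: three candidate regions with 3-d visual features
# and one expression with a 2-d text feature (assumed, not from the paper).
region_feats = [[1.0, 0.0, 5.0], [0.0, 1.0, 5.0], [1.0, 1.0, 0.0]]
expr_feat = [0.2, 1.0]

# Assumed projections into a shared 2-d space; in practice these are learned.
w_visual = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_text = [[1.0, 0.0], [0.0, 1.0]]

best, scores = ground_expression(region_feats, expr_feat, w_visual, w_text)
print("best region:", best)
print("scores:", [round(s, 3) for s in scores])
```

In a real REC system the candidate regions would typically come from a detector or proposal stage, and the projections would be trained so that matching region-expression pairs score higher than mismatched ones.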

Code & Models

Models

linhuixiao/Awesome-Visual-Grounding
model · ♡ 1

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling