Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension

Zhenfang Chen; Peng Wang; Lin Ma; Kwan-Yee K. Wong; Qi Wu

arXiv:2003.00403·cs.CV·March 3, 2020·6 cites

TL;DR

This paper introduces Cops-Ref, a challenging new dataset and task for referring expression comprehension that emphasizes complex reasoning and distractor handling, revealing limitations of current models and encouraging deeper visual reasoning research.

Contribution

It presents a novel dataset with compositional expressions and a challenging test setting, advancing the evaluation of reasoning capabilities in referring expression comprehension models.

Findings

1. Existing models perform poorly on the new dataset.
2. A modular hard mining strategy improves model performance.
3. The dataset reveals significant room for improvement in visual reasoning.

Abstract

Referring expression comprehension (REF) aims at identifying a particular object in a scene by a natural language expression. It requires joint reasoning over the textual and visual domains to solve the problem. Some popular referring expression datasets, however, fail to provide an ideal test bed for evaluating the reasoning ability of the models, mainly because 1) their expressions typically describe only some simple distinctive properties of the object and 2) their images contain limited distracting information. To bridge the gap, we propose a new dataset for visual reasoning in the context of referring expression comprehension with two main features. First, we design a novel expression engine rendering various reasoning logics that can be flexibly combined with rich visual properties to generate expressions with varying compositionality. Second, to better exploit the full reasoning…

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Natural Language Processing Techniques