CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions

Runtao Liu; Chenxi Liu; Yutong Bai; Alan Yuille

arXiv:1901.00850 · cs.CV · April 9, 2019 · 21 cites

TL;DR

This paper introduces CLEVR-Ref+, a synthetic dataset for diagnosing visual reasoning in referring expression tasks, and proposes IEP-Ref, a modular network that reveals reasoning steps and handles false premises effectively.

Contribution

The paper presents a new diagnostic dataset CLEVR-Ref+ and a modular network IEP-Ref that improves interpretability and robustness in referring expression comprehension.

Findings

01. IEP-Ref outperforms other models on CLEVR-Ref+

02. IEP-Ref can reveal its reasoning process step by step

03. IEP-Ref correctly predicts no-foreground for false-premise expressions

Abstract

Referring object detection and referring image segmentation are important tasks that require joint understanding of visual information and natural language. Yet there has been evidence that current benchmark datasets suffer from bias, and current state-of-the-art models cannot be easily evaluated on their intermediate reasoning process. To address these issues and complement similar efforts in visual question answering, we build CLEVR-Ref+, a synthetic diagnostic dataset for referring expression comprehension. The precise locations and attributes of the objects are readily available, and the referring expressions are automatically associated with functional programs. The synthetic nature allows control over dataset bias (through sampling strategy), and the modular programs enable intermediate reasoning ground truth without human annotators. In addition to evaluating several…
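
To make the idea of a functional program with intermediate reasoning ground truth concrete, here is a minimal sketch. It is not the CLEVR-Ref+ program format or the IEP-Ref model: the scene layout, the `filter_<attribute>` operation names, and the `run_program` helper are all hypothetical and chosen only for illustration. The sketch shows how a referring expression expressed as a chain of steps yields a checkable result after every step, and how a false-premise expression ends with nothing selected.

```python
from typing import Dict, List, Tuple

# Hypothetical toy types: a scene is a list of attribute dicts, and a program
# is an ordered list of (operation, argument) steps. Neither is the real
# CLEVR-Ref+ format.
Scene = List[Dict[str, str]]
Program = List[Tuple[str, str]]

SCENE: Scene = [
    {"color": "red", "shape": "cube", "size": "large"},
    {"color": "blue", "shape": "sphere", "size": "small"},
    {"color": "red", "shape": "sphere", "size": "small"},
]


def run_program(scene: Scene, program: Program) -> List[List[int]]:
    """Run filter steps in order; return the selected object indices after each step."""
    selected = list(range(len(scene)))  # start with every object in the scene
    trace: List[List[int]] = []
    for op, arg in program:
        if not op.startswith("filter_"):
            raise ValueError(f"unsupported operation: {op}")
        attribute = op[len("filter_"):]
        selected = [i for i in selected if scene[i][attribute] == arg]
        trace.append(list(selected))  # intermediate ground truth for this step
    return trace


# "the small red sphere"
trace = run_program(
    SCENE,
    [("filter_color", "red"), ("filter_shape", "sphere"), ("filter_size", "small")],
)

# False premise: "the green cube" refers to nothing in this scene.
empty = run_program(SCENE, [("filter_color", "green"), ("filter_shape", "cube")])

print(trace)
print(empty)
```

Because the scene attributes are known exactly, each entry of the returned trace can serve as a reference answer for the corresponding reasoning step, and an empty final entry corresponds to the no-foreground case.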

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition