Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge

Peng Wang; Dongyang Liu; Hui Li; Qi Wu

arXiv:2006.01629·cs.CV·August 18, 2020·1 cites

TL;DR

This paper introduces KB-Ref, a new dataset for referring expression comprehension that requires commonsense knowledge, and proposes ECIFA, a model that improves understanding by integrating image regions and knowledge facts.

Contribution

The paper creates KB-Ref, a dataset emphasizing commonsense reasoning in REF, and develops ECIFA, a model that leverages both visual and knowledge-based information.

Findings

01 State-of-the-art REF models perform poorly on KB-Ref.

02 ECIFA significantly improves performance over existing models.

03 A gap remains between model and human understanding.

Abstract

Conventional referring expression comprehension (REF) assumes that people query something in an image by describing its visual appearance and spatial location, but in practice we often ask for an object by describing its affordance or other non-visual attributes, especially when we do not have a precise target. For example, sometimes we say 'Give me something to eat'. In this case, we need to use commonsense knowledge to identify the objects in the image. Unfortunately, there is no existing referring expression dataset reflecting this requirement, not to mention a model to tackle this challenge. In this paper, we collect a new referring expression dataset, called KB-Ref, containing 43k expressions on 16k images. In KB-Ref, to answer each expression (detect the target object referred to by the expression), at least one piece of commonsense knowledge is required. We then test…

Code & Models

Repositories

zhanyang-nwpu/rsvg-pytorch
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques