Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding

Qingxing Cao; Bailin Li; Xiaodan Liang; Keze Wang; Liang Lin

arXiv:2012.07192 · cs.CV · December 15, 2020 · 5 cites

TL;DR

This paper introduces a new dataset for knowledge-based visual question reasoning that aims to evaluate and improve deep models' ability to perform multi-step reasoning and avoid superficial correlations, addressing biases in existing datasets.

Contribution

The paper presents a novel dataset with controlled question-answer pairs generated from scene graphs and external knowledge, designed to challenge models' reasoning and perception capabilities.
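
As a rough illustration of how a question-answer pair might be generated from a scene graph and external knowledge, the sketch below chains one scene-graph fact with one knowledge fact into a two-step question. It is a minimal sketch under stated assumptions, not the authors' generation pipeline: the `Triplet` type, the `route_question` function, the question template, and the sample facts are all hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Triplet(NamedTuple):
    """A (subject, relation, object) fact; hypothetical format for this sketch."""
    subject: str
    relation: str
    obj: str


def route_question(
    scene: List[Triplet], knowledge: List[Triplet]
) -> Optional[Tuple[str, str]]:
    """Chain one scene-graph fact with one knowledge fact into a QA pair."""
    for s in scene:
        # Knowledge facts whose subject is the object of the scene fact.
        matches = [k for k in knowledge if k.subject == s.obj]
        # Keep only unambiguous chains so the answer is unique.
        if len(matches) == 1:
            k = matches[0]
            question = (
                f"What is the {k.relation} of the object "
                f"that the {s.subject} is {s.relation}?"
            )
            return question, k.obj
    return None


# Illustrative facts only; not taken from the dataset.
scene_graph = [Triplet("man", "holding", "umbrella")]
knowledge_base = [Triplet("umbrella", "purpose", "keeping dry")]
qa = route_question(scene_graph, knowledge_base)
```

Answering such a question requires both perceiving the image relation (man, holding, umbrella) and recalling the external fact about umbrellas, so neither the question text nor the image alone determines the answer.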

Findings

01 The dataset effectively reduces shortcut learning in VQA models.

02 Models trained on the dataset demonstrate improved reasoning abilities.

03 The dataset highlights the importance of multi-step knowledge reasoning in VQA.

Abstract

Though beneficial for encouraging Visual Question Answering (VQA) models to discover the underlying knowledge by exploiting the input-output correlation beyond image and text contexts, the existing knowledge VQA datasets are mostly annotated in a crowdsourced manner, e.g., by collecting questions and external reasons from different users via the internet. In addition to the challenge of knowledge reasoning, how to deal with annotator bias also remains unsolved, and this bias often leads to superficial, over-fitted correlations between questions and answers. To address this issue, we propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation. Considering that a desirable VQA model should correctly perceive the image context, understand the question, and incorporate its learned knowledge, our proposed dataset aims to cut off the shortcut learning exploited by…

Code & Models

Repositories

andersonstra/mukea (PyTorch)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques