Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao, Bailin Li, Xiaodan Liang, Keze Wang, Liang Lin

TL;DR
This paper introduces a new dataset for knowledge-based visual question reasoning that aims to evaluate and improve deep models' ability to perform multi-step reasoning and avoid superficial correlations, addressing biases in existing datasets.
Contribution
The paper presents a novel dataset with controlled question-answer pairs generated from scene graphs and external knowledge, designed to challenge models' reasoning and perception capabilities.
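To make the idea of generating question-answer pairs from scene graphs and external knowledge concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' generation procedure: the triplet format, the toy data, the question template, and the function name `generate_two_hop_qa` are all hypothetical.

```python
from typing import List, Tuple

# A triplet is (subject, relation, object). This format is an assumption.
Triplet = Tuple[str, str, str]

# Hypothetical toy scene graph for one image (not taken from the dataset).
SCENE_GRAPH: List[Triplet] = [
    ("man", "holds", "umbrella"),
    ("dog", "sits on", "bench"),
]

# Hypothetical external knowledge triplets (not taken from the dataset).
KNOWLEDGE_BASE: List[Triplet] = [
    ("umbrella", "is used for", "keeping dry"),
    ("bench", "is used for", "sitting"),
]


def generate_two_hop_qa(
    scene: List[Triplet], knowledge: List[Triplet]
) -> List[Tuple[str, str]]:
    """Chain a scene triplet with a knowledge triplet that shares an entity.

    Answering requires first perceiving the object in the image
    (scene hop), then looking up a fact about it (knowledge hop).
    """
    pairs = []
    for subj, rel, obj in scene:
        for k_subj, _k_rel, k_obj in knowledge:
            if obj == k_subj:
                # Illustrative template; the question never names the object,
                # so it cannot be answered from the question text alone.
                question = f"What is the object that the {subj} {rel} used for?"
                pairs.append((question, k_obj))
    return pairs


if __name__ == "__main__":
    for question, answer in generate_two_hop_qa(SCENE_GRAPH, KNOWLEDGE_BASE):
        print(question, "->", answer)
```

Because the intermediate object is left out of the question, a model has to resolve it from the image before the knowledge lookup can succeed, which is the kind of multi-step dependency the dataset is described as targeting.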
Findings
The dataset effectively reduces shortcut learning in VQA models.
Models trained on the dataset demonstrate improved reasoning abilities.
The dataset highlights the importance of multi-step knowledge reasoning in VQA.
Abstract
Though beneficial for encouraging Visual Question Answering (VQA) models to discover the underlying knowledge by exploiting the input-output correlation beyond image and text contexts, the existing knowledge VQA datasets are mostly annotated through crowdsourcing, e.g., collecting questions and external reasons from different users via the internet. In addition to the challenge of knowledge reasoning, how to deal with annotator bias also remains unsolved, and this bias often leads to superficial, over-fitted correlations between questions and answers. To address this issue, we propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation. Considering that a desirable VQA model should correctly perceive the image context, understand the question, and incorporate its learned knowledge, our proposed dataset aims to cut off the shortcut learning exploited by…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
