Zero-shot Visual Question Answering using Knowledge Graph

Zhuo Chen; Jiaoyan Chen; Yuxia Geng; Jeff Z. Pan; Zonggang Yuan; Huajun Chen

arXiv:2107.05348 · cs.AI · October 19, 2021 · 5 cites

TL;DR

This paper introduces a zero-shot visual question answering method leveraging knowledge graphs and a mask-based learning mechanism, effectively handling unseen answers and improving performance over existing models.

Contribution

It proposes a novel zero-shot VQA algorithm that integrates knowledge graphs with a mask-based learning approach, addressing the issues of answer bias and unseen answers.

Findings

01. Achieves state-of-the-art zero-shot VQA performance on unseen answers.

02. Significantly improves existing end-to-end models on the F-VQA dataset.

03. Introduces new answer-based zero-shot splits for the F-VQA dataset.

Abstract

Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with different components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when some component does not perform well, which leads to error propagation and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue -- many answers may have never appeared during training (i.e., unseen answers) in real-world applications. To bridge these gaps, in this paper, we propose a Zero-shot VQA algorithm using knowledge graphs and a mask-based learning mechanism for better incorporating external knowledge, and present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method can achieve state-of-the-art performance in…
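
The abstract mentions answer-based zero-shot splits, in which answers seen at test time never appear during training. The paper's own split procedure is not given in this summary, so the following is only a minimal sketch of the general idea. The sample format (dicts with an "answer" key), the function name, and the `unseen_ratio` default are all assumptions made for illustration, not details from the paper or the F-VQA release.

```python
import random
from collections import defaultdict


def answer_based_zero_shot_split(samples, unseen_ratio=0.5, seed=0):
    """Split samples so that no test answer ever appears in training.

    `samples` is a list of dicts with an "answer" key (a hypothetical
    format, not the F-VQA file layout). `unseen_ratio` is the fraction
    of distinct answers held out as unseen; the default is illustrative.
    """
    # Group samples by their answer so whole answer classes move together.
    by_answer = defaultdict(list)
    for sample in samples:
        by_answer[sample["answer"]].append(sample)

    # Sort before shuffling so the split is reproducible for a given seed.
    answers = sorted(by_answer)
    random.Random(seed).shuffle(answers)

    # Hold out at least one answer class as unseen.
    n_unseen = max(1, int(len(answers) * unseen_ratio))
    unseen = set(answers[:n_unseen])

    train = [s for a in answers if a not in unseen for s in by_answer[a]]
    test = [s for a in answers if a in unseen for s in by_answer[a]]
    return train, test, unseen


# Toy data, invented purely for this example.
samples = [
    {"question": "q0", "answer": "dog"},
    {"question": "q1", "answer": "cat"},
    {"question": "q2", "answer": "dog"},
    {"question": "q3", "answer": "bird"},
    {"question": "q4", "answer": "fish"},
]
train, test, unseen = answer_based_zero_shot_split(samples)
train_answers = {s["answer"] for s in train}
test_answers = {s["answer"] for s in test}
print(train_answers.isdisjoint(test_answers))  # True: the splits share no answers
```

Splitting by answer class rather than by individual sample is what makes the setting zero-shot: a model trained on `train` has never been supervised on any answer it must produce for `test`.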

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning