Zero-shot Visual Question Answering using Knowledge Graph
Zhuo Chen, Jiaoyan Chen, Yuxia Geng, Jeff Z. Pan, Zonggang Yuan, and Huajun Chen

TL;DR
This paper introduces a zero-shot visual question answering method leveraging knowledge graphs and a mask-based learning mechanism, effectively handling unseen answers and improving performance over existing models.
Contribution
It proposes a novel zero-shot VQA algorithm that integrates knowledge graphs with a mask-based learning approach, addressing both answer bias and unseen answers.
Findings
Achieves state-of-the-art zero-shot VQA performance on unseen answers.
Significantly improves existing end-to-end models on the F-VQA dataset.
Introduces new answer-based zero-shot splits for the F-VQA dataset.
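An answer-based zero-shot split can be illustrated with a short sketch. The idea is that the answers are partitioned first, so that no answer in the test set ever appears in training. The function name, the `answer` field, and the `test_answer_fraction` parameter below are hypothetical and chosen for illustration only; this is not the procedure the authors used to build their F-VQA splits.

```python
import random


def answer_based_zero_shot_split(samples, test_answer_fraction=0.5, seed=0):
    """Split samples so that train and test share no answers.

    `samples` is a list of dicts with an "answer" key (an assumed,
    illustrative layout, not the actual F-VQA file format).
    """
    # Partition the answer vocabulary, not the samples themselves.
    answers = sorted({s["answer"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(answers)

    # Hold out a fraction of the answers as "unseen" (at least one).
    n_unseen = max(1, int(len(answers) * test_answer_fraction))
    unseen = set(answers[:n_unseen])

    # A sample goes to test exactly when its answer is unseen.
    train = [s for s in samples if s["answer"] not in unseen]
    test = [s for s in samples if s["answer"] in unseen]
    return train, test, unseen
```

Calling `answer_based_zero_shot_split(samples)` returns the two sample lists plus the set of held-out answers. Because the partition is over answers, a model trained on `train` has never been supervised with any answer it must produce on `test`.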
Abstract
Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with different components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when some component does not perform well, which leads to error propagation and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue: in real-world applications, many answers may never have appeared during training (i.e., unseen answers). To bridge these gaps, in this paper, we propose a Zero-shot VQA algorithm using knowledge graphs and a mask-based learning mechanism for better incorporating external knowledge, and present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method can achieve state-of-the-art performance in…
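The abstract does not spell out how the mask-based mechanism works, so the following is only a rough sketch of one way a knowledge-graph-derived mask could steer answer prediction. It assumes that the model already yields a score per candidate answer and that the knowledge graph supplies a set of plausible answers. The function names, the `mask_score` bonus, and the additive rule are all assumptions for illustration, not the paper's formulation.

```python
def apply_answer_mask(scores, kg_candidates, mask_score=10.0):
    """Add a fixed bonus to answers supported by the knowledge graph.

    `scores` maps answer -> model score; `kg_candidates` is a set of
    answers retrieved from the knowledge graph (both hypothetical inputs).
    Answers outside the candidate set keep their original score.
    """
    return {
        answer: score + (mask_score if answer in kg_candidates else 0.0)
        for answer, score in scores.items()
    }


def predict_answer(scores, kg_candidates, mask_score=10.0):
    """Return the highest-scoring answer after applying the mask."""
    masked = apply_answer_mask(scores, kg_candidates, mask_score)
    return max(masked, key=masked.get)
```

With `mask_score=0.0` the mask has no effect and the prediction falls back to the raw scores. A larger bonus lets a knowledge-graph candidate win even if the model scored it low, which is the kind of behavior one would want for answers never seen in training.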
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
