Open-Set Knowledge-Based Visual Question Answering with Inference Paths
Jingru Gan, Xinzhe Han, Shuhui Wang, Qingming Huang

TL;DR
This paper introduces GATHER, a novel retriever-ranker framework for open-set knowledge-based visual question answering that provides explicit reasoning paths and handles questions beyond preset answer corpora.
Contribution
It proposes a graph-based retrieval and ranking method that produces explainable reasoning paths for open-set KB-VQA, removing the restriction to a preset answer corpus that classification-based models impose.
Findings
Achieves open-set question answering with explicit reasoning paths.
Outperforms existing models on real-world KB-VQA tasks.
Provides a new dataset, ConceptVQA, with annotated reasoning paths.
Abstract
Given an image and an associated textual question, the purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases. Prior KB-VQA models are usually formulated as a retriever-classifier framework, where a pre-trained retriever extracts textual or visual information from knowledge graphs and a classifier then makes a prediction among the candidates. Despite promising progress, existing models have two drawbacks. First, modeling question answering as multi-class classification limits the answer space to a preset corpus and precludes flexible reasoning. Second, the classifier considers only "what is the answer" and not "how to get the answer", so it cannot ground the answer in explicit reasoning paths. In this paper, we confront the challenge of explainable open-set KB-VQA, where…
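To make the contrast with a fixed-vocabulary classifier concrete, here is a minimal sketch of retrieving candidate paths from a knowledge graph and ranking them, so that the answer is a graph node returned together with the path that leads to it. This is not the GATHER implementation: the toy graph, the word-overlap scorer, and the names `TOY_GRAPH`, `retrieve_paths`, `score_path`, and `answer_with_path` are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy knowledge graph: head -> list of (relation, tail) edges.
TOY_GRAPH = {
    "dog": [("is a", "pet"), ("capable of", "bark")],
    "pet": [("located at", "home")],
    "bark": [("is a", "sound")],
}


def retrieve_paths(graph, start, max_hops=2):
    """Enumerate every path of 1..max_hops edges that starts at `start` (breadth-first)."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if path:
            paths.append(path)
        if len(path) < max_hops:
            for relation, tail in graph.get(node, []):
                queue.append((tail, path + [(node, relation, tail)]))
    return paths


def score_path(question, path):
    """Toy relevance score (an assumption, not a learned ranker):
    the fraction of relation words on the path that also occur in the question."""
    q_tokens = set(question.lower().replace("?", "").split())
    rel_tokens = [word for _, relation, _ in path for word in relation.split()]
    return sum(word in q_tokens for word in rel_tokens) / len(rel_tokens)


def answer_with_path(graph, question, visual_entity, max_hops=2):
    """Return (answer, path): the end node of the best-scoring path and the path itself.
    The answer can be any node reachable in the graph, not a member of a preset label set."""
    paths = retrieve_paths(graph, visual_entity, max_hops)
    if not paths:
        return None, []
    best = max(paths, key=lambda p: score_path(question, p))
    return best[-1][2], best


answer, path = answer_with_path(TOY_GRAPH, "What is the animal capable of?", "dog")
print(answer, path)
```

In this sketch `visual_entity` stands in for whatever an image model would detect. A real system would replace the word-overlap score with a learned ranker over question, image, and path features.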
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Balanced Selection
