Open-Set Knowledge-Based Visual Question Answering with Inference Paths

Jingru Gan; Xinzhe Han; Shuhui Wang; Qingming Huang

arXiv:2310.08148 · cs.LG · October 13, 2023 · 1 citation


TL;DR

This paper introduces GATHER, a novel retriever-ranker framework for open-set knowledge-based visual question answering that provides explicit reasoning paths and handles questions beyond preset answer corpora.

Contribution

It proposes a new graph-based retrieval and ranking method that enables explainable reasoning paths in open-set KB-VQA, overcoming the preset answer space of classification-based models.
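To make "retrieve paths, then rank them" concrete, here is a toy sketch in which the answer is the end node of a ranked knowledge-graph path, so any reachable entity can be returned and the path itself serves as the explanation. This is not GATHER's architecture: the triples, the seed entities (assumed to come from some upstream detector), `retrieve_paths`, and the word-overlap `score_path` are all hypothetical stand-ins, and the scorer replaces what would be a learned ranker.

```python
from collections import deque

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("dog", "is_a", "animal"),
    ("dog", "capable_of", "fetch"),
    ("frisbee", "used_for", "fetch"),
    ("frisbee", "made_of", "plastic"),
    ("animal", "has", "fur"),
]

def build_graph(triples):
    """Adjacency list: head entity -> list of outgoing (relation, tail) edges."""
    graph = {}
    for head, rel, tail in triples:
        graph.setdefault(head, []).append((rel, tail))
    return graph

def retrieve_paths(graph, seeds, max_hops=2):
    """Breadth-first enumeration of all relation paths with 1..max_hops edges.

    A path is stored as [entity, relation, entity, relation, entity, ...]."""
    paths = []
    queue = deque((seed, [seed]) for seed in seeds)
    while queue:
        node, path = queue.popleft()
        if (len(path) - 1) // 2 >= max_hops:
            continue  # path already has max_hops edges; do not extend it
        for rel, tail in graph.get(node, []):
            new_path = path + [rel, tail]
            paths.append(new_path)
            queue.append((tail, new_path))
    return paths

def score_path(path, question_tokens):
    """Hypothetical ranker: word overlap with the question minus a small length penalty."""
    words = {w for item in path for w in item.split("_")}
    return len(words & question_tokens) - 0.1 * len(path)

def answer(graph, seeds, question, max_hops=2):
    """Return (answer entity, supporting path); the answer can be any reachable node."""
    tokens = set(question.lower().replace("?", "").split())
    paths = retrieve_paths(graph, seeds, max_hops)
    # Heuristic: an entity already named in the question is unlikely to be the answer.
    candidates = [p for p in paths if p[-1] not in tokens]
    if not candidates:
        return None, []
    best = max(candidates, key=lambda p: score_path(p, tokens))
    return best[-1], best
```

For example, `answer(build_graph(TRIPLES), ["frisbee"], "What material is the frisbee made of?")` returns `("plastic", ["frisbee", "made_of", "plastic"])`. Nothing restricts "plastic" to a fixed label list; it is simply the end of the best-scoring path.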

Findings

1. Achieves open-set question answering with explicit reasoning paths.

2. Outperforms existing models on real-world KB-VQA tasks.

3. Provides a new dataset, ConceptVQA, with annotated reasoning paths.

Abstract

Given an image and an associated textual question, the goal of Knowledge-Based Visual Question Answering (KB-VQA) is to answer the question correctly with the aid of external knowledge bases. Prior KB-VQA models are usually formulated as a retriever-classifier framework, in which a pre-trained retriever extracts textual or visual information from knowledge graphs and a classifier then makes a prediction among the candidates. Despite promising progress, existing models have two drawbacks. First, modeling question answering as multi-class classification limits the answer space to a preset corpus and precludes flexible reasoning. Second, the classifier considers only "what is the answer" and not "how to get the answer", so it cannot ground the answer in explicit reasoning paths. In this paper, we confront the challenge of explainable open-set KB-VQA, where…
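The first drawback can be seen in miniature below: a classifier head can only ever emit an entry of its fixed answer vocabulary, whereas a ranker picks the best item from whatever candidate set was retrieved for the current question. This is a deliberately minimal illustration, not code from the paper; `ANSWER_VOCAB`, `classify`, `rank`, and the scores are hypothetical.

```python
# Hypothetical fixed answer vocabulary for a closed-set classifier head.
ANSWER_VOCAB = ["red", "blue", "dog", "cat"]

def classify(logits):
    """Closed-set prediction: argmax over a fixed vocabulary; no other answer is reachable."""
    if len(logits) != len(ANSWER_VOCAB):
        raise ValueError("expected one logit per vocabulary entry")
    best_index = max(range(len(ANSWER_VOCAB)), key=lambda i: logits[i])
    return ANSWER_VOCAB[best_index]

def rank(candidate_scores):
    """Open-set prediction: best-scoring item from the candidates retrieved for this question."""
    return max(candidate_scores, key=candidate_scores.get)

print(classify([0.1, 0.2, 0.9, 0.3]))         # dog
print(rank({"plastic": 0.8, "rubber": 0.3}))  # plastic, which is not in ANSWER_VOCAB
```

The design point is where the answer set comes from: in `classify` it is frozen at training time, while in `rank` it is produced per question by retrieval, which is what lets an open-set model return answers outside a preset corpus.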


Code & Models

Repositories

JingruG/GATHER
PyTorch · Official


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Balanced Selection