Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA

Elham J. Barezi; Parisa Kordjamshidi

arXiv:2406.18839·cs.AI·June 28, 2024

Open Access

TL;DR

This paper improves knowledge-based visual question answering by decomposing complex questions into simpler ones, which improves information extraction and reasoning and yields accuracy gains of up to 2% on multiple datasets.

Contribution

It introduces a question decomposition approach that separates visual and non-visual reasoning, improving multi-hop question answering in KB-VQA tasks.

Findings

1. Decomposing questions improves accuracy on VQA datasets.
2. Using simpler questions enhances visual and knowledge reasoning.
3. Up to 2% accuracy improvement demonstrated.

Abstract

We study the knowledge-based visual question answering (KB-VQA) problem, in which a model must ground a given question in the visual modality to find the answer. Although many recent works use question-dependent captioners to verbalize the given image and Large Language Models to solve the VQA problem, research results show that they do not perform well on multi-hop questions. Our study shows that replacing a complex question with several simpler questions helps extract more relevant information from the image and provides a stronger comprehension of it. Moreover, we analyze the decomposed questions to determine the modality of the information required to answer them, and use a captioner for the visual questions and LLMs as a general knowledge source for the non-visual, KB-based questions. Our results demonstrate the positive impact of using simple questions…
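
The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `answer_by_decomposition`, the `SubAnswer` record, and all four callables (`decompose`, `is_visual`, `captioner`, `llm`) are hypothetical names introduced here, and the toy stubs stand in for real models. The sketch covers only the step of sending each sub-question to a captioner or an LLM; how the sub-answers are combined into a final answer is not described in the text above and is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubAnswer:
    question: str
    modality: str  # "visual" or "knowledge"
    answer: str


def answer_by_decomposition(
    question: str,
    image: object,
    decompose: Callable[[str], List[str]],    # hypothetical: complex question -> simpler questions
    is_visual: Callable[[str], bool],         # hypothetical: does this sub-question need the image?
    captioner: Callable[[object, str], str],  # hypothetical: question-dependent captioner
    llm: Callable[[str], str],                # hypothetical: LLM used as a general knowledge source
) -> List[SubAnswer]:
    """Route each sub-question to the captioner (visual) or the LLM (non-visual)."""
    results = []
    for sub_q in decompose(question):
        if is_visual(sub_q):
            results.append(SubAnswer(sub_q, "visual", captioner(image, sub_q)))
        else:
            results.append(SubAnswer(sub_q, "knowledge", llm(sub_q)))
    return results


# Toy stubs so the sketch runs end to end; real components would replace these.
def toy_decompose(question: str) -> List[str]:
    return ["What object is shown in the image?", "What is this object used for?"]


def toy_is_visual(sub_question: str) -> bool:
    return "image" in sub_question.lower()


def toy_captioner(image: object, sub_question: str) -> str:
    return "<caption-based answer>"


def toy_llm(sub_question: str) -> str:
    return "<knowledge-based answer>"


if __name__ == "__main__":
    for item in answer_by_decomposition(
        "What is the object in the image used for?",
        None,  # placeholder for an image
        toy_decompose,
        toy_is_visual,
        toy_captioner,
        toy_llm,
    ):
        print(item.modality, "|", item.question, "->", item.answer)
```

Passing the components in as callables keeps the routing logic independent of any particular captioner or LLM, so each can be swapped out without touching the loop.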

Taxonomy

Topics: AI-based Problem Solving and Planning · Rough Sets and Fuzzy Logic · Bayesian Modeling and Causal Inference