Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models

Wenbin An; Feng Tian; Jiahao Nie; Wenkai Shi; Haonan Lin; Yan Chen; QianYing Wang; Yaqiang Wu; Guang Dai; Ping Chen

arXiv:2407.15346·cs.CV·July 23, 2024·1 cites

TL;DR

This paper introduces DKA, a training-free framework that improves knowledge-based visual question answering by disentangling knowledge acquisition and leveraging LLM feedback to generate more precise answers.

Contribution

DKA is a novel, training-free approach that decomposes complex questions into simpler sub-questions for better knowledge retrieval and answer accuracy in KVQA.

Findings

01. DKA significantly outperforms state-of-the-art models on benchmark datasets.

02. Disentangling knowledge acquisition reduces confusion and improves answer precision.

03. Using LLM feedback to guide knowledge retrieval enhances overall performance.

Abstract

Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and an external knowledge base using the original complex question, then generate answers with Large Language Models (LLMs). However, because the original question contains complex elements that require knowledge from different sources, acquiring different kinds of knowledge in a coupled manner may confuse models and hinder them from retrieving precise knowledge. Furthermore, the "forward-only" answering process fails to explicitly capture the knowledge needs of LLMs, which can further hurt answer quality. To address these limitations, we propose DKA: Disentangled Knowledge Acquisition from LLM feedback, a training-free framework that disentangles knowledge acquisition to avoid confusion and uses LLM's feedback to…
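The disentangling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `SubQuestions`, `answer_kvqa`, and every `toy_*` function are hypothetical, and the stand-ins return fixed strings where a real system would call an LLM, a vision model, and a knowledge base. The sketch only shows the control flow, in which a complex question is split into an image-directed and a knowledge-directed sub-question and each is sent to its own source.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubQuestions:
    """Hypothetical container for the two disentangled sub-questions."""
    image: str       # sub-question to be answered from the image
    knowledge: str   # sub-question to be answered from external knowledge


def answer_kvqa(
    question: str,
    decompose: Callable[[str], SubQuestions],
    ask_image: Callable[[str], str],
    ask_knowledge_base: Callable[[str], str],
    generate: Callable[[str, str, str], str],
) -> str:
    """Route each sub-question to its own source, then generate an answer."""
    subs = decompose(question)                       # assumed to be an LLM call
    visual_fact = ask_image(subs.image)              # image-only retrieval
    world_fact = ask_knowledge_base(subs.knowledge)  # knowledge-only retrieval
    return generate(question, visual_fact, world_fact)


# Toy stand-ins (assumptions for illustration only, no real models involved).
def toy_decompose(question: str) -> SubQuestions:
    return SubQuestions(
        image="What animal is shown?",
        knowledge="What does this animal eat?",
    )


def toy_ask_image(sub_question: str) -> str:
    return "zebra"


def toy_ask_knowledge_base(sub_question: str) -> str:
    return "Zebras mainly eat grass."


def toy_generate(question: str, visual_fact: str, world_fact: str) -> str:
    return f"{visual_fact}: {world_fact}"


result = answer_kvqa(
    "What does the animal in the picture eat?",
    toy_decompose,
    toy_ask_image,
    toy_ask_knowledge_base,
    toy_generate,
)
print(result)  # zebra: Zebras mainly eat grass.
```

Passing the four components as callables keeps the two acquisition paths separate, so either one could be swapped out without touching the other.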

Code & Models

Repositories

lackel/dka (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Focus · ALIGN · Balanced Selection