Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models
Wenbin An, Feng Tian, Jiahao Nie, Wenkai Shi, Haonan Lin, Yan Chen, QianYing Wang, Yaqiang Wu, Guang Dai, Ping Chen

TL;DR
This paper introduces DKA, a training-free framework that improves knowledge-based visual question answering by disentangling knowledge acquisition and leveraging LLM feedback to generate more precise answers.
Contribution
DKA is a novel, training-free approach that decomposes complex questions into simpler sub-questions for better knowledge retrieval and answer accuracy in KVQA.
Findings
DKA significantly outperforms state-of-the-art models on benchmark datasets.
Disentangling knowledge acquisition reduces confusion and improves answer precision.
Using LLM feedback to guide knowledge retrieval enhances overall performance.
Abstract
Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and external knowledge base with the original complex question, then generate answers with Large Language Models (LLMs). However, since the original question contains complex elements that require knowledge from different sources, acquiring different kinds of knowledge in a coupled manner may confuse models and hinder them from retrieving precise knowledge. Furthermore, the "forward-only" answering process fails to explicitly capture the knowledge needs of LLMs, which can further hurt answering quality. To cope with the above limitations, we propose DKA: Disentangled Knowledge Acquisition from LLM feedback, a training-free framework that disentangles knowledge acquisition to avoid confusion and uses LLM's feedback to…
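The disentangling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the split into one image sub-question and one knowledge-base sub-question, the `SubQuestions` container, the `answer_disentangled` function, and all toy stand-ins are assumptions made purely for illustration. In a real system each callable would presumably wrap an LLM, a vision model, or a retriever.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubQuestions:
    """Hypothetical container: one sub-question per knowledge source."""
    image_question: str      # meant to be answered from the image
    knowledge_question: str  # meant to be answered from an external knowledge base


def answer_disentangled(
    question: str,
    decompose: Callable[[str], SubQuestions],
    ask_image: Callable[[str], str],
    ask_knowledge_base: Callable[[str], str],
    generate_answer: Callable[[str, List[str]], str],
) -> str:
    """Query each knowledge source with its own sub-question, then answer."""
    sub = decompose(question)
    # Each source sees only its own sub-question, never the full coupled question.
    image_fact = ask_image(sub.image_question)
    external_fact = ask_knowledge_base(sub.knowledge_question)
    return generate_answer(question, [image_fact, external_fact])


# Toy stand-ins (assumptions) that record which query each source received.
image_queries: List[str] = []
kb_queries: List[str] = []


def toy_decompose(question: str) -> SubQuestions:
    return SubQuestions(
        image_question="What animal is in the image?",
        knowledge_question="What do pandas eat?",
    )


def toy_ask_image(query: str) -> str:
    image_queries.append(query)
    return "The image shows a panda."


def toy_ask_knowledge_base(query: str) -> str:
    kb_queries.append(query)
    return "Pandas mainly eat bamboo."


def toy_generate_answer(question: str, facts: List[str]) -> str:
    # A real system would prompt an LLM here; the toy version just joins the facts.
    return " ".join(facts)


result = answer_disentangled(
    "What does the animal in the image eat?",
    toy_decompose,
    toy_ask_image,
    toy_ask_knowledge_base,
    toy_generate_answer,
)
print(result)
```

Passing the components in as callables keeps the sketch independent of any particular model or retriever. The feedback step mentioned in the abstract is left out because the abstract is truncated before it is described.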
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Focus · ALIGN · Balanced Selection
