Rationale-guided Prompting for Knowledge-based Visual Question Answering
Zhongjian Hu, Peng Yang, Bing Li, Fengyuan Liu

TL;DR
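This paper introduces PLRH (Prompting LLMs with Rationale Heuristics), a prompting framework that first uses Chain of Thought (CoT) prompting to make Large Language Models generate intermediate rationales and then uses those rationales to guide answer prediction. It improves accuracy on knowledge-based Visual Question Answering (VQA) by more than 2.2 points on OK-VQA and 2.1 points on A-OKVQA over existing baselines.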
Contribution
The paper proposes PLRH, a rationale-guided prompting method that strengthens the reasoning of LLMs on knowledge-based VQA by having them produce intermediate thought processes before predicting an answer.
Findings
PLRH outperforms existing baselines by more than 2.2 points on OK-VQA.
PLRH outperforms existing baselines by more than 2.1 points on A-OKVQA.
Intermediate rationales improve answer accuracy in knowledge-based VQA.
Abstract
Recently, Large Language Models (LLMs) have been used for knowledge-based Visual Question Answering (VQA). Despite the encouraging results of previous studies, prior methods prompt LLMs to predict answers directly, neglecting intermediate thought processes. We argue that prior methods do not sufficiently activate the capacities of LLMs. We propose a framework called PLRH that Prompts LLMs with Rationale Heuristics for knowledge-based VQA. PLRH prompts LLMs with Chain of Thought (CoT) to generate rationale heuristics, i.e., intermediate thought processes, and then leverages the rationale heuristics to inspire LLMs to predict answers. Experiments show that our approach outperforms the existing baselines by more than 2.2 and 2.1 on OK-VQA and A-OKVQA, respectively.
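The two-step idea in the abstract (first elicit a rationale with CoT, then feed that rationale back to predict the answer) can be sketched as plain prompt construction around an arbitrary text-in/text-out model. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `Example` record, the use of an image caption as the textual "context" for the image, and the function names `rationale_prompt`, `answer_prompt` and `plrh_style_answer` are all hypothetical, and `fake_llm` is a stub standing in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    # Hypothetical record: the image is represented by a text context (e.g. a caption).
    context: str
    question: str
    rationale: str = ""
    answer: str = ""


def rationale_prompt(examples: List[Example], query: Example) -> str:
    """Step 1 (assumed wording): few-shot CoT prompt asking for an intermediate rationale."""
    parts = ["Reason step by step about the context to answer the question.\n"]
    for ex in examples:
        parts.append(f"Context: {ex.context}\nQuestion: {ex.question}\nRationale: {ex.rationale}\n")
    parts.append(f"Context: {query.context}\nQuestion: {query.question}\nRationale:")
    return "\n".join(parts)


def answer_prompt(examples: List[Example], query: Example, rationale: str) -> str:
    """Step 2 (assumed wording): the generated rationale is inserted as a heuristic for answering."""
    parts = ["Answer the question using the context and the rationale.\n"]
    for ex in examples:
        parts.append(
            f"Context: {ex.context}\nQuestion: {ex.question}\n"
            f"Rationale: {ex.rationale}\nAnswer: {ex.answer}\n"
        )
    parts.append(
        f"Context: {query.context}\nQuestion: {query.question}\n"
        f"Rationale: {rationale}\nAnswer:"
    )
    return "\n".join(parts)


def plrh_style_answer(llm: Callable[[str], str], examples: List[Example], query: Example) -> str:
    """Run both steps: generate a rationale, then condition the answer on it."""
    rationale = llm(rationale_prompt(examples, query)).strip()
    return llm(answer_prompt(examples, query, rationale)).strip()


def fake_llm(prompt: str) -> str:
    # Stub model: replies with a rationale or an answer depending on which step's prompt it sees.
    if prompt.endswith("Rationale:"):
        return " The man holds a racket on a clay court, so he is playing tennis."
    return " tennis"


demo = [Example("A dog catching a frisbee in a park.", "What game is the dog playing?",
                "The dog is catching a flying disc, which is the game frisbee.", "frisbee")]
query = Example("A man holding a racket on a clay court.", "What sport is he playing?")
print(plrh_style_answer(fake_llm, demo, query))
```

Keeping the model behind a plain `Callable[[str], str]` means any LLM client can be plugged in; the point of the sketch is only that the answer prompt contains the model's own rationale rather than asking for the answer directly.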
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling
