Rationale-guided Prompting for Knowledge-based Visual Question Answering

Zhongjian Hu; Peng Yang; Bing Li; Fengyuan Liu

arXiv:2412.16936 · cs.CL · August 8, 2025

TL;DR

This paper introduces PLRH, a prompting framework that first has a Large Language Model generate intermediate rationales via Chain of Thought prompting and then uses those rationales to guide answer prediction. It outperforms existing baselines by more than 2.2 on OK-VQA and 2.1 on A-OKVQA.

Contribution

The paper proposes PLRH, a rationale-guided prompting method that strengthens LLM reasoning for knowledge-based VQA by generating intermediate thought processes before predicting an answer, rather than prompting for the answer directly.

Findings

01. PLRH outperforms baselines by more than 2.2 on OK-VQA.

02. PLRH outperforms baselines by more than 2.1 on A-OKVQA.

03. Intermediate rationales improve answer accuracy in knowledge-based VQA.

Abstract

Recently, Large Language Models (LLMs) have been used for knowledge-based Visual Question Answering (VQA). Despite the encouraging results of previous studies, prior methods prompt LLMs to predict answers directly, neglecting intermediate thought processes. We argue that prior methods do not sufficiently activate the capacities of LLMs. We propose a framework called PLRH that Prompts LLMs with Rationale Heuristics for knowledge-based VQA. PLRH prompts LLMs with Chain of Thought (CoT) to generate rationale heuristics, i.e., intermediate thought processes, and then leverages the rationale heuristics to inspire LLMs to predict answers. Experiments show that our approach outperforms the existing baselines by more than 2.2 and 2.1 on OK-VQA and A-OKVQA, respectively.
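
The two-stage idea in the abstract (generate a rationale first, then predict the answer conditioned on it) can be illustrated with a minimal sketch. This is not the authors' implementation. The prompt wording, the function names, and the `LLM` callable type are all hypothetical. Passing the image as a text description is also an assumption, since the abstract does not say how visual content reaches the LLM.

```python
from typing import Callable, Tuple

# Hypothetical type: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]


def build_rationale_prompt(caption: str, question: str) -> str:
    """Stage 1 prompt (assumed wording): ask for reasoning, not an answer."""
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        "Let's think step by step.\n"
        "Rationale:"
    )


def build_answer_prompt(caption: str, question: str, rationale: str) -> str:
    """Stage 2 prompt (assumed wording): include the rationale as a hint."""
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        f"Rationale: {rationale}\n"
        "Answer:"
    )


def answer_with_rationale(llm: LLM, caption: str, question: str) -> Tuple[str, str]:
    """Run both stages and return (rationale, answer)."""
    rationale = llm(build_rationale_prompt(caption, question)).strip()
    answer = llm(build_answer_prompt(caption, question, rationale)).strip()
    return rationale, answer


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs without any API."""
    if prompt.endswith("Rationale:"):
        return " Black and white stripes are typical of a zebra."
    return " zebra"


if __name__ == "__main__":
    rationale, answer = answer_with_rationale(
        stub_llm, "a striped animal grazing", "What animal is this?"
    )
    print(rationale)
    print(answer)
```

Replacing `stub_llm` with a call to a real model is the only change needed to try the pattern. A direct-answer baseline would skip the first call and omit the `Rationale:` line from the second prompt.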

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling