Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering

Zhou Yu; Xuecheng Ouyang; Zhenwei Shao; Meng Wang; Jun Yu

arXiv:2303.01903 · cs.CV · April 30, 2025 · 6 cites

TL;DR

Prophet improves knowledge-based visual question answering by prompting large language models with answer heuristics derived from a trained VQA model, outperforming prior state-of-the-art methods on four datasets.

Contribution

Introducing Prophet, a flexible framework that combines answer heuristics from a VQA model with LLM prompting, advancing knowledge-based VQA performance.

Findings

01 Outperforms state-of-the-art on four datasets

02 Effective with various VQA models and LLMs

03 Can be integrated with multimodal models for further gains

Abstract

Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question. Early studies retrieve the required knowledge from explicit knowledge bases (KBs), which often introduces information irrelevant to the question and thus restricts the performance of their models. Recent works have resorted to using a powerful large language model (LLM) as an implicit knowledge engine to acquire the necessary knowledge for answering. Despite the encouraging results achieved by these methods, we argue that they have not fully activated the capacity of the "blind" LLM, as the provided textual input is insufficient to depict the visual information required to answer the question. In this paper, we present Prophet, a conceptually simple, flexible, and general framework designed to prompt LLMs with answer heuristics for knowledge-based VQA. Specifically, we…

Code & Models

Repositories

milvlg/prophet (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Attention Dropout · Cosine Annealing · Linear Warmup With Cosine Annealing · Layer Normalization · Residual Connection