ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering

Alberto Compagnoni; Marco Morini; Sara Sarto; Federico Cocchi; Davide Caffagni; Marcella Cornia; Lorenzo Baraldi; Rita Cucchiara

arXiv:2511.22715 · cs.CV · April 1, 2026

TL;DR

ReAG is a multimodal retrieval-augmented model for knowledge-based visual question answering that combines coarse- and fine-grained retrieval, a critic model that filters out irrelevant passages, and reinforcement learning-based training to improve reasoning and answer accuracy.

Contribution

The paper introduces ReAG, a Reasoning-Augmented Multimodal RAG approach that improves knowledge-based VQA through multi-stage retrieval, critic-based filtering of retrieved passages, and a multi-stage training strategy that leverages reinforcement learning.

Findings

1. ReAG outperforms prior methods on the Encyclopedic-VQA and InfoSeek datasets.
2. ReAG improves answer accuracy and interpretability by grounding its responses in retrieved evidence.
3. Multi-stage retrieval and critic filtering significantly reduce irrelevant information in the generated answers.

Abstract

Multimodal Large Language Models (MLLMs) have shown impressive capabilities in jointly understanding text, images, and videos, often evaluated via Visual Question Answering (VQA). However, even state-of-the-art MLLMs struggle with domain-specific or knowledge-intensive queries, where relevant information is underrepresented in pre-training data. Knowledge-based VQA (KB-VQA) addresses this by retrieving external documents to condition answer generation, but current retrieval-augmented approaches suffer from low precision, noisy passages, and limited reasoning. To address this, we propose ReAG, a novel Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages, ensuring high-quality additional context. The model follows a multi-stage training strategy leveraging reinforcement learning to enhance…
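The abstract describes a pipeline that narrows retrieval from coarse to fine granularity and then lets a critic discard irrelevant passages before generation. The following is a minimal sketch of that general retrieve-then-filter pattern, not ReAG's implementation. It assumes toy 2-D embeddings, uses cosine similarity as a stand-in for whatever retrievers ReAG actually uses, and replaces the paper's learned critic model with `keyword_critic`, a hypothetical word-overlap heuristic. All names (`coarse_retrieve`, `fine_retrieve`, `critic_filter`, `build_context`, the thresholds and `k` values) are illustrative and do not come from the aimagelab/ReAG repository.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass
class Passage:
    doc_id: str
    text: str
    embedding: Vector  # toy passage-level embedding (assumption)


@dataclass
class Document:
    doc_id: str
    embedding: Vector        # toy document-level embedding used in the coarse stage
    passages: List[Passage]  # candidates for the fine-grained stage


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coarse_retrieve(query: Vector, docs: List[Document], k: int) -> List[Document]:
    """Coarse stage: rank whole documents by similarity to the query, keep the top k."""
    return sorted(docs, key=lambda d: cosine(query, d.embedding), reverse=True)[:k]


def fine_retrieve(query: Vector, docs: List[Document], k: int) -> List[Passage]:
    """Fine stage: rank only the passages of the documents kept by the coarse stage."""
    candidates = [p for d in docs for p in d.passages]
    return sorted(candidates, key=lambda p: cosine(query, p.embedding), reverse=True)[:k]


def critic_filter(question: str, passages: List[Passage],
                  critic: Callable[[str, str], float], threshold: float) -> List[Passage]:
    """Filtering stage: keep passages whose critic score reaches the threshold."""
    return [p for p in passages if critic(question, p.text) >= threshold]


def keyword_critic(question: str, passage: str) -> float:
    """Hypothetical stand-in critic: fraction of question words found in the passage."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def build_context(question: str, query_emb: Vector, docs: List[Document],
                  critic: Callable[[str, str], float],
                  k_docs: int = 2, k_passages: int = 3, threshold: float = 0.5) -> str:
    """Chain coarse retrieval, fine retrieval, and critic filtering into one context string."""
    top_docs = coarse_retrieve(query_emb, docs, k_docs)
    passages = fine_retrieve(query_emb, top_docs, k_passages)
    kept = critic_filter(question, passages, critic, threshold)
    return "\n".join(p.text for p in kept)


# Toy corpus with made-up 2-D embeddings, for illustration only.
DOCS = [
    Document("eiffel", [1.0, 0.0], [
        Passage("eiffel", "the tower was built in 1889", [0.9, 0.1]),
        Passage("eiffel", "paris is in france", [0.7, 0.3]),
    ]),
    Document("louvre", [0.0, 1.0], [
        Passage("louvre", "the museum opened in 1793", [0.1, 0.9]),
    ]),
    Document("pisa", [0.6, 0.8], [
        Passage("pisa", "the tower was built over two centuries", [0.5, 0.5]),
    ]),
]

context = build_context("when was the tower built", [1.0, 0.0], DOCS, keyword_critic)
print(context)
```

In this toy run the coarse stage keeps the `eiffel` and `pisa` documents, the fine stage ranks their three passages, and the stand-in critic drops "paris is in france" because it shares no words with the question. A real system would use a learned critic and pass the filtered context to the MLLM for answer generation; the reinforcement-learning training stage mentioned in the abstract is not modeled here.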


Code & Models

Repositories

aimagelab/ReAG (GitHub)
