R3G: A Reasoning–Retrieval–Reranking Framework for Vision-Centric Answer Generation

Zhuohong Chen; Zhengxian Wu; Zirui Liao; Shenao Jiang; Hangrui Xu; Yang Chen; Chaokui Su; Xiaoyu Liu; Haoqian Wang

arXiv:2602.00104 · cs.CV · April 8, 2026

TL;DR

R3G is a modular framework for vision-centric answer generation in VQA. It plans which visual cues are needed, retrieves candidate images, and reranks them so the model selects and uses the right visual evidence, achieving state-of-the-art results on MRAG-Bench.

Contribution

The paper introduces R3G, a reasoning-retrieval-reranking framework that improves how vision-based question answering models select retrieved images and integrate them into their reasoning.

Findings

01

R3G improves accuracy across six MLLM backbones and nine MRAG-Bench sub-scenarios.

02

Sufficiency-aware reranking and the reasoning step are complementary: together they help the model both choose the right images and use them well.

03

R3G achieves state-of-the-art overall performance on MRAG-Bench.

Abstract

Vision-centric retrieval for VQA requires retrieving images to supply missing visual cues and integrating them into the reasoning process. However, selecting the right images and integrating them effectively into the model's reasoning remains challenging. To address this challenge, we propose R3G, a modular Reasoning-Retrieval-Reranking framework. It first produces a brief reasoning plan that specifies the required visual cues, then adopts a two-stage strategy, with coarse retrieval followed by fine-grained reranking, to select evidence images. On MRAG-Bench, R3G improves accuracy across six MLLM backbones and nine sub-scenarios, achieving state-of-the-art overall performance. Ablations show that sufficiency-aware reranking and reasoning steps are complementary, helping the model both choose the right images and use them well. We release code and data at https://github.com/czh24/R3G.
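
The plan, coarse retrieval, and rerank flow described in the abstract can be illustrated with a minimal Python sketch under heavily simplified assumptions. The reasoning plan is reduced to a set of required cue strings. Each candidate image is assumed to carry a precomputed embedding and precomputed cue tags. Coarse retrieval is plain cosine-similarity top-k, and "sufficiency" is approximated by greedily covering the required cues. All names (`Candidate`, `coarse_retrieve`, `sufficiency_rerank`, the toy cues) are hypothetical. This page does not describe how the paper's planner and reranker actually work, so this is not the authors' implementation; see the released code for that.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A candidate evidence image (hypothetical structure for illustration)."""
    image_id: str
    embedding: list[float]                       # precomputed image embedding (assumed)
    cues: set[str] = field(default_factory=set)  # visual cues the image is assumed to show


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coarse_retrieve(query_emb: list[float], pool: list[Candidate], k: int) -> list[Candidate]:
    """Stage 1 (coarse): keep the k candidates most similar to the query embedding."""
    return sorted(pool, key=lambda c: cosine(query_emb, c.embedding), reverse=True)[:k]


def sufficiency_rerank(required: set[str], shortlist: list[Candidate], max_images: int):
    """Stage 2 (fine): greedily pick images that cover still-missing required cues.

    Stops once every required cue is covered (the set is "sufficient"), when no
    remaining image adds a new required cue, or when max_images is reached.
    Returns (selected candidates, whether all required cues were covered).
    """
    selected: list[Candidate] = []
    covered: set[str] = set()
    remaining = list(shortlist)
    while remaining and len(selected) < max_images and not required <= covered:
        # max() keeps the first best in shortlist order, so ties fall back to stage-1 rank
        best = max(remaining, key=lambda c: len((c.cues & required) - covered))
        gain = (best.cues & required) - covered
        if not gain:
            break  # no remaining image contributes a missing required cue
        selected.append(best)
        covered |= gain
        remaining.remove(best)
    return selected, required <= covered


# Toy example: the "reasoning plan" is reduced to the cues it says are needed.
plan_cues = {"beak shape", "wing pattern"}
query = [1.0, 0.0]
pool = [
    Candidate("a", [1.0, 0.0], {"beak shape"}),
    Candidate("b", [0.9, 0.1], {"beak shape", "wing pattern"}),
    Candidate("c", [0.0, 1.0], {"habitat"}),
    Candidate("d", [0.7, 0.3], {"wing pattern"}),
]
shortlist = coarse_retrieve(query, pool, k=3)
chosen, sufficient = sufficiency_rerank(plan_cues, shortlist, max_images=2)
print([c.image_id for c in shortlist])          # ['a', 'b', 'd']
print([c.image_id for c in chosen], sufficient)  # ['b'] True
```

In this sketch, image "a" ranks first by similarity, but the reranker selects "b" alone because it already supplies both required cues. This is meant to convey the intuition behind sufficiency-aware selection (prefer a small set of images that together cover the missing cues) rather than simply keeping the top-k by similarity.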

Code & Models

Repositories

czh24/R3G (GitHub)