MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval

Mingjun Xu; Jinhan Dong; Jue Hou; Zehui Wang; Sihang Li; Zhifeng Gao; Renxin Zhong; Hengxing Cai

arXiv:2506.12364·cs.AI·June 24, 2025

TL;DR

MM-R5 is a multimodal reranking model that uses reinforcement learning and reasoning chains to improve document retrieval accuracy across multiple domains, achieving state-of-the-art results.

Contribution

The paper introduces MM-R5, a multimodal reranker trained in two stages, reasoning-focused supervised fine-tuning followed by reinforcement learning, to improve retrieval precision.

Findings

01 Achieves state-of-the-art performance on the MMDocIR benchmark.

02 Improves recall@1 by over 4% compared to previous methods.

03 Uses reasoning chains and reinforcement learning effectively for multimodal reranking.
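To make the recall@1 finding concrete, here is a minimal sketch of how a reranker can change recall@k by reordering first-stage candidates. It is illustrative only: the function names, page IDs, and scores are hypothetical, and the paper's exact metric definition and scoring model may differ.

```python
from typing import Callable, Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k of a ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def rerank(candidates: List[str], score: Callable[[str], float]) -> List[str]:
    """Reorder retrieved candidates by a reranker score, highest first."""
    return sorted(candidates, key=score, reverse=True)


# Toy example: page IDs returned by a first-stage retriever (hypothetical).
retrieved = ["page_3", "page_7", "page_1"]
relevant = {"page_7"}

# Hypothetical reranker scores; a real reranker would compute these
# from the query and the page content.
toy_scores: Dict[str, float] = {"page_3": 0.2, "page_7": 0.9, "page_1": 0.1}
reranked = rerank(retrieved, lambda doc_id: toy_scores[doc_id])

before = recall_at_k(retrieved, relevant, k=1)
after = recall_at_k(reranked, relevant, k=1)
print(f"recall@1 before reranking: {before}, after: {after}")
```

In this toy case the only relevant page moves from rank 2 to rank 1, so recall@1 rises from 0.0 to 1.0. Benchmark figures such as the one above are averages of this kind of per-query score over many queries.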

Abstract

Multimodal document retrieval systems enable information access across text, images, and layouts, benefiting various domains like document-based question answering, report analysis, and interactive content summarization. Rerankers improve retrieval precision by reordering retrieved candidates. However, current multimodal reranking methods remain underexplored, with significant room for improvement in both training strategies and overall effectiveness. Moreover, the lack of explicit reasoning makes it difficult to analyze and optimize these methods further. In this paper, we propose MM-R5, a MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval, aiming to provide a more effective and reliable solution for multimodal reranking tasks. MM-R5 is trained in two stages: supervised fine-tuning (SFT) and reinforcement learning (RL). In the SFT stage, we focus…

Code & Models

Models

i2vec/MM-R5 (Hugging Face model; 31 downloads, 6 likes)
