Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation

Chunyi Peng; Zhipeng Xu; Zhenghao Liu; Yishan Li; Yukun Yan; Shuo Wang; Yu Gu; Minghe Yu; Ge Yu; Maosong Sun

arXiv:2505.22095 · cs.CL · April 7, 2026

TL;DR

The paper introduces MoRE, a framework that lets multimodal large language models interact dynamically with multiple retrieval experts to improve knowledge exploitation and reasoning, with reported gains of over 7% on QA benchmarks.

Contribution

MoRE allows MLLMs to learn to selectively engage with diverse retrieval experts based on reasoning needs, enhancing knowledge utilization and reasoning accuracy.

Findings

01 Achieves over 7% performance improvement on QA benchmarks.

02 Demonstrates effective dynamic coordination of heterogeneous retrieval experts.

03 Validates robustness and adaptability in reasoning-driven knowledge retrieval.

Abstract

Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge. However, existing methods typically adhere to rigid retrieval paradigms by mimicking fixed retrieval trajectories and thus fail to fully exploit the knowledge of different retrieval experts through dynamic interaction based on the model's knowledge needs or evolving reasoning states. To overcome this limitation, we introduce Mixture-of-Retrieval Experts (MoRE), a novel framework that enables MLLMs to collaboratively interact with diverse retrieval experts for more effective knowledge exploitation. Specifically, MoRE learns to dynamically determine which expert to engage with, conditioned on the evolving reasoning state. To effectively train this capability, we propose Stepwise Group Relative Policy Optimization…
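The idea of choosing a retrieval expert at each step, conditioned on the current reasoning state, can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the paper's implementation: the expert names (`text`, `image`), the `ReasoningState` fields, and the hand-written `choose_action` rule are all hypothetical stand-ins. In MoRE the choice is made by a trained MLLM, which is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical expert signature: a query string in, retrieved snippets out.
Expert = Callable[[str], List[str]]


@dataclass
class ReasoningState:
    question: str
    evidence: List[str] = field(default_factory=list)  # snippets gathered so far
    trace: List[str] = field(default_factory=list)     # experts called, in order


def text_expert(query: str) -> List[str]:
    # Placeholder for a text retriever (assumption, returns a dummy passage).
    return [f"[text passage about: {query}]"]


def image_expert(query: str) -> List[str]:
    # Placeholder for an image retriever (assumption, returns a dummy caption).
    return [f"[image caption about: {query}]"]


EXPERTS: Dict[str, Expert] = {"text": text_expert, "image": image_expert}


def choose_action(state: ReasoningState) -> str:
    """Stand-in for the learned policy: pick an expert name or 'answer'.

    A real system would let the model decide from its reasoning state;
    this hand-written rule only makes the control flow concrete.
    """
    called = set(state.trace)
    if "look like" in state.question and "image" not in called:
        return "image"
    if "text" not in called:
        return "text"
    return "answer"


def run(question: str, max_steps: int = 4) -> ReasoningState:
    """Alternate between choosing an expert and collecting its evidence."""
    state = ReasoningState(question)
    for _ in range(max_steps):
        action = choose_action(state)
        if action == "answer":
            break
        state.evidence.extend(EXPERTS[action](question))
        state.trace.append(action)
    return state
```

Calling `run("What does the Eiffel Tower look like?")` consults the image expert and then the text expert before stopping, while a question without the visual cue only consults the text expert. The point of the sketch is that the sequence of experts is decided step by step rather than fixed in advance.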

Code & Models

Datasets

hmhm1229/enwiki-20241020
dataset · 115 downloads
