M$^3$Searcher: Modular Multimodal Information Seeking Agency with Retrieval-Oriented Reasoning

Xiaohan Yu; Chao Feng; Lang Mei; Chong Chen

arXiv:2601.09278·cs.AI·January 15, 2026

TL;DR

M$^3$Searcher is a modular multimodal agent that improves information retrieval and reasoning across complex tasks by decoupling acquisition from answer derivation and using a retrieval-focused reward system.

Contribution

It introduces a novel modular architecture for multimodal search, along with a new dataset and training method to enhance reasoning and retrieval fidelity.

Findings

01. Outperforms existing multimodal search approaches

02. Shows strong transferability to new tasks

03. Demonstrates effective reasoning in complex multimodal scenarios

Abstract

Recent advances in DeepResearch-style agents have demonstrated strong capabilities in autonomous information acquisition and synthesis from real-world web environments. However, existing approaches remain fundamentally limited to the text modality. Extending autonomous information-seeking agents to multimodal settings introduces two critical challenges: the specialization-generalization trade-off that emerges when training models for multimodal tool use at scale, and the severe scarcity of training data capturing complex, multi-step multimodal search trajectories. To address these challenges, we propose M$^3$Searcher, a modular multimodal information-seeking agent that explicitly decouples information acquisition from answer derivation. M$^3$Searcher is optimized with a retrieval-oriented multi-objective reward that jointly encourages factual accuracy, reasoning soundness, and retrieval…
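
The decoupling of information acquisition from answer derivation described in the abstract can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names `Evidence`, `Searcher`, `derive_answer`, the toy tools, and the fixed search plan are all hypothetical, and a real agent would replace the fixed plan with a trained policy and the toy answerer with a multimodal model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Evidence:
    """One retrieved item: which tool produced it and what it returned."""
    tool: str
    query: str
    content: str


@dataclass
class Searcher:
    """Acquisition module (hypothetical): calls tools and collects evidence.

    `tools` maps an illustrative tool name to a callable. The plan is fixed
    here for simplicity; it stands in for a learned search policy.
    """
    tools: Dict[str, Callable[[str], str]]
    max_steps: int = 4

    def acquire(self, plan: List[Tuple[str, str]]) -> List[Evidence]:
        evidence = []
        # Execute at most `max_steps` tool calls from the plan, in order.
        for tool_name, query in plan[: self.max_steps]:
            result = self.tools[tool_name](query)
            evidence.append(Evidence(tool_name, query, result))
        return evidence


def derive_answer(
    question: str,
    evidence: List[Evidence],
    answerer: Callable[[str, str], str],
) -> str:
    """Answer-derivation module: sees only the question and gathered evidence."""
    context = "\n".join(f"[{e.tool}] {e.content}" for e in evidence)
    return answerer(question, context)


# Toy stand-ins for real tools and a real answer model (assumptions).
def toy_text_search(query: str) -> str:
    return f"text result for '{query}'"


def toy_image_search(query: str) -> str:
    return f"image match for '{query}'"


def toy_answerer(question: str, context: str) -> str:
    # Counts evidence lines instead of reasoning over them.
    return f"Answer based on {len(context.splitlines())} evidence items"


searcher = Searcher(tools={"text": toy_text_search, "image": toy_image_search})
evidence = searcher.acquire([("image", "landmark in photo"), ("text", "landmark history")])
answer = derive_answer("When was this landmark built?", evidence, toy_answerer)
print(answer)
```

Because `derive_answer` receives only the collected evidence, the answering component could in principle be swapped without retraining the acquisition component, which is the kind of modularity the abstract points to.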

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Information Retrieval and Search Behavior