MMSearch-R1: Incentivizing LMMs to Search

Jinming Wu; Zihao Deng; Wei Li; Yiding Liu; Bo You; Bo Li; Zejun Ma; Ziwei Liu

arXiv:2506.20670·cs.CV·June 26, 2025

TL;DR

This paper introduces MMSearch-R1, an end-to-end reinforcement learning framework that enables large multimodal models to perform efficient, on-demand multi-turn searches in real-world internet environments, improving search efficiency and performance.

Contribution

The paper presents the first end-to-end reinforcement learning framework for on-demand multimodal search, integrating both image and text search tools, and introduces a multimodal search VQA dataset for training and evaluation.

Findings

01

Outperforms RAG-based baselines of the same size.

02

Matches the performance of larger RAG-based models while making 30% fewer search calls.

03

Provides insights into efficient multimodal search behavior.

Abstract

Robust deployment of large multimodal models (LMMs) in real-world scenarios requires access to external knowledge sources, given the complexity and dynamic nature of real-world information. Existing approaches such as retrieval-augmented generation (RAG) and prompt-engineered search agents rely on rigid pipelines, often leading to inefficient or excessive search behaviors. We present MMSearch-R1, the first end-to-end reinforcement learning framework that enables LMMs to perform on-demand, multi-turn search in real-world Internet environments. Our framework integrates both image and text search tools, allowing the model to reason about when and how to invoke them, guided by an outcome-based reward with a search penalty. To support training, we collect a multimodal search VQA dataset through a semi-automated pipeline that covers diverse visual and textual knowledge needs and curate a…
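
The abstract describes an outcome-based reward with a search penalty but does not spell out its form here. The following is a minimal illustrative sketch of how such a reward could be shaped, not the authors' formula: the function name `outcome_reward`, the flat penalty of 0.1, and the binary correctness signal are all assumptions made for illustration.

```python
def outcome_reward(is_correct: bool, num_search_calls: int,
                   search_penalty: float = 0.1) -> float:
    """Hypothetical outcome-based reward with a search penalty.

    Assumptions (not taken from the paper):
      - correctness is a binary signal from some answer checker;
      - a correct answer earns 1.0 if no search tool was used;
      - a correct answer reached with at least one search call is
        discounted by a flat `search_penalty`;
      - an incorrect answer earns 0.0 however many searches were made.
    """
    if not is_correct:
        return 0.0
    if num_search_calls > 0:
        return 1.0 - search_penalty
    return 1.0


if __name__ == "__main__":
    # A correct answer without search is preferred over one that searched.
    print(outcome_reward(True, 0))
    print(outcome_reward(True, 2))
    print(outcome_reward(False, 3))
```

Under these assumptions, the policy is rewarded most for answering correctly from its own knowledge and slightly less for answering correctly after searching, which is one way to discourage unnecessary search calls without punishing searches that are needed.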

Code & Models

Repositories

evolvinglmms-lab/multimodal-search-r1
PyTorch · Official

Models

Datasets

lmms-lab/FVQA
dataset · 207 downloads

Taxonomy

Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Semigroups and Automata Theory