OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
Shuang Chen, Kaituo Feng, Hangting Chen, Wenxuan Huang, Dasen Dai, Quanxin Shou, Yunlong Lin, Xiangyu Yue, Shenghua Gao, Tianyu Pang

TL;DR
OpenSearch-VL provides an open-source framework for training advanced multimodal search agents using curated data, diverse tools, and a novel training algorithm, improving average performance by over 10 points across seven benchmarks.
Contribution
It introduces a comprehensive open-source recipe including data pipelines, tool environments, and a new training algorithm for reproducible multimodal search agents.
Findings
Achieved over 10-point average improvements across seven benchmarks.
Developed high-quality datasets for supervised and reinforcement learning.
Matched proprietary models' performance on several tasks.
Abstract
Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open high-quality training data, transparent trajectory synthesis pipelines, and detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we design a dedicated pipeline to construct high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding, which jointly reduce shortcuts and one-step retrieval collapse. Based on this pipeline, we curate two training datasets, SearchVL-SFT-36k for SFT and…
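The abstract names Wikipedia path sampling and fuzzy entity rewriting without describing how they work. The following is a minimal sketch of one plausible reading, not the authors' pipeline. It assumes that path sampling is a non-repeating random walk over a page-link graph and that fuzzy rewriting replaces intermediate entity names with indirect descriptions, so the question cannot be answered with a single lookup. The toy graph, the descriptions, and the functions `sample_path`, `fuzzy_rewrite`, and `make_question` are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical toy link graph: page -> pages it links to (not data from the paper).
LINKS: Dict[str, List[str]] = {
    "Eiffel Tower": ["Gustave Eiffel", "Paris"],
    "Gustave Eiffel": ["Dijon", "Statue of Liberty"],
    "Paris": ["Seine"],
    "Dijon": ["Burgundy"],
    "Statue of Liberty": ["New York City"],
}

# Hypothetical indirect descriptions used to hide explicit entity names.
FUZZY: Dict[str, str] = {
    "Gustave Eiffel": "an engineer known for iron structures",
    "Dijon": "a city famous for a mustard",
}


def sample_path(graph: Dict[str, List[str]], start: str, hops: int,
                rng: random.Random) -> Optional[List[str]]:
    """Random walk of `hops` steps from `start` that never revisits a page.

    Returns None if the walk reaches a dead end before the requested length.
    """
    path = [start]
    for _ in range(hops):
        candidates = [n for n in graph.get(path[-1], []) if n not in path]
        if not candidates:
            return None
        path.append(rng.choice(candidates))
    return path


def fuzzy_rewrite(entity: str, descriptions: Dict[str, str]) -> str:
    """Replace an entity name with its vague description, if one is known."""
    return descriptions.get(entity, entity)


def make_question(path: List[str], descriptions: Dict[str, str]) -> Tuple[str, str]:
    """Fuzz every intermediate hop; the last page on the path is the answer."""
    clues = [path[0]] + [fuzzy_rewrite(e, descriptions) for e in path[1:-1]]
    return "Follow the chain: " + " -> ".join(clues) + " -> ?", path[-1]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs sample the same path
    sampled = sample_path(LINKS, "Eiffel Tower", hops=2, rng=rng)
    if sampled is not None:
        print(make_question(sampled, FUZZY))
```

Keeping the starting page explicit while fuzzing the intermediate hops is one way to force multi-step search. A real pipeline would also need to verify that the fuzzed description still points to a unique entity.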
