OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

Shuang Chen; Kaituo Feng; Hangting Chen; Wenxuan Huang; Dasen Dai; Quanxin Shou; Yunlong Lin; Xiangyu Yue; Shenghua Gao; Tianyu Pang

arXiv:2605.05185 · cs.CV · May 7, 2026



TL;DR

OpenSearch-VL provides an open-source framework for training advanced multimodal search agents using curated data, diverse tools, and a novel training algorithm, improving average performance by over 10 points across seven benchmarks.

Contribution

It introduces a comprehensive open-source recipe including data pipelines, tool environments, and a new training algorithm for reproducible multimodal search agents.

Findings

01. Achieved an average improvement of over 10 points across seven benchmarks.
02. Developed high-quality datasets for supervised fine-tuning (SFT) and reinforcement learning (RL).
03. Matched the performance of proprietary models on several tasks.

Abstract

Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open, high-quality training data, transparent trajectory synthesis pipelines, and detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we design a dedicated pipeline that constructs high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding, which jointly reduce shortcuts and one-step retrieval collapse. Based on this pipeline, we curate two training datasets, SearchVL-SFT-36k for SFT and…
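The abstract names Wikipedia path sampling and fuzzy entity rewriting but does not say how either is implemented. As a purely illustrative sketch, and not the authors' pipeline, the idea can be pictured as follows: a multi-hop chain is sampled as a non-revisiting random walk over a page link graph, and the entities along the chain (except the answer) are then replaced with vague descriptions so the answer cannot be reached with a single lookup. The toy graph `LINK_GRAPH`, the description map `FUZZY_DESCRIPTIONS`, and the helpers `sample_path` and `fuzzy_rewrite` are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical toy link graph: page title -> titles of pages it links to.
LINK_GRAPH: Dict[str, List[str]] = {
    "Eiffel Tower": ["Gustave Eiffel", "Paris"],
    "Gustave Eiffel": ["Dijon", "Statue of Liberty"],
    "Statue of Liberty": ["New York City"],
    "Paris": ["France"],
    "Dijon": [],
    "New York City": [],
    "France": [],
}

# Hypothetical vague descriptions used to "fuzz" explicit entity names.
FUZZY_DESCRIPTIONS: Dict[str, str] = {
    "Eiffel Tower": "a wrought-iron lattice tower completed in 1889",
    "Gustave Eiffel": "the engineer whose company built that tower",
}


def sample_path(
    graph: Dict[str, List[str]], start: str, hops: int, rng: random.Random
) -> Optional[List[str]]:
    """Random walk of exactly `hops` link traversals that never revisits a page.

    Returns None if the walk reaches a page with no unvisited outgoing links
    before completing `hops` steps.
    """
    path = [start]
    for _ in range(hops):
        candidates = [p for p in graph.get(path[-1], []) if p not in path]
        if not candidates:
            return None  # dead end: the caller can resample from another start
        path.append(rng.choice(candidates))
    return path


def fuzzy_rewrite(
    path: List[str], descriptions: Dict[str, str]
) -> Tuple[List[str], str]:
    """Split a chain into (clues, answer), replacing each clue entity's name
    with a vague description when one is available."""
    clues = [descriptions.get(entity, entity) for entity in path[:-1]]
    return clues, path[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    chain = sample_path(LINK_GRAPH, "Eiffel Tower", 2, rng)
    if chain is not None:
        print(chain, fuzzy_rewrite(chain, FUZZY_DESCRIPTIONS))
```

Requiring the full hop count and discarding dead-end walks is one simple way to keep only genuinely multi-hop chains; a real pipeline would also need answer verification and, per the abstract, visual grounding, neither of which is modeled here.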


Code & Models

Repositories

shawn0728/OpenSearch-VL (GitHub)
