SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis
Shuang Sun, Huatong Song, Yuhao Wang, Ruiyang Ren, Jinhao Jiang, Junjie Zhang, Fei Bai, Jia Deng, Wayne Xin Zhao, Zheng Liu, Lei Fang, Zhongyuan Wang, Ji-Rong Wen

TL;DR
SimpleDeepSearcher is a lightweight framework that improves deep search capabilities of large language models by synthesizing high-quality training data through realistic web search simulations, avoiding complex training and high computational costs.
Contribution
It introduces a data-centric approach using realistic web search simulations and multi-criteria data curation to enhance deep search performance without complex training paradigms.
Findings
Significant improvements over RL-based baselines on five benchmarks.
High-quality training data synthesized from only 871 samples.
Effective deep search achieved through strategic data engineering.
Abstract
Retrieval-augmented generation (RAG) systems have advanced large language models (LLMs) in complex deep search scenarios requiring multi-step reasoning and iterative information retrieval. However, existing approaches face critical limitations that lack high-quality training trajectories or suffer from the distributional mismatches in simulated environments and prohibitive computational costs for real-world deployment. This paper introduces SimpleDeepSearcher, a lightweight yet effective framework that bridges this gap through strategic data engineering rather than complex training paradigms. Our approach synthesizes high-quality training data by simulating realistic user interactions in live web search environments, coupled with a multi-criteria curation strategy that optimizes the diversity and quality of input and output side. Experiments on five benchmarks across diverse domains…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsInformation Retrieval and Search Behavior · Topic Modeling · Multimodal Machine Learning Applications
MethodsShrink and Fine-Tune
