SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation
Xichen Zhang, Ziyi He, Yinghao Zhu, Sitong Wu, Shaozuo Yu, Meng Chu, Wenhu Zhang, Haoru Tan, Jiaya Jia

TL;DR
SearchGym introduces a high-fidelity simulation environment for training robust search agents, overcoming real-world data noise and cost issues, and demonstrating superior performance across multiple benchmarks.
Contribution
We develop SearchGym, a verifiable knowledge graph-based simulation environment, and propose SearchGym-RL, a curriculum learning approach for effective search agent training.
Findings
Qwen2.5-7B-Base trained in SearchGym outperforms the baseline by 10.6% on average.
SearchGym enables scalable, cost-effective training with strong Sim-to-Real transfer.
High-fidelity simulation improves search agent robustness and performance.
Abstract
Search agents have emerged as a pivotal paradigm for solving open-ended, knowledge-intensive reasoning tasks. However, training these agents via Reinforcement Learning (RL) faces a critical dilemma: interacting with live commercial Web APIs is prohibitively expensive, while relying on static data snapshots often introduces noise due to data misalignment. This misalignment generates corrupted reward signals that destabilize training by penalizing correct reasoning or rewarding hallucination. To address this, we propose SearchGym, a simulation environment designed to bootstrap robust search agents. SearchGym employs a rigorous generative pipeline to construct a verifiable knowledge graph and an aligned document corpus, ensuring that every reasoning task is factually grounded and strictly solvable. Building on this controllable environment, we introduce SearchGym-RL, a curriculum learning…
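The idea of a corpus that is generated from a knowledge graph, so that every task is solvable and every reward is checkable, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the entities, the relation names, and the helpers `render`, `build_corpus`, `search`, `make_task`, `is_solvable`, and `reward` are hypothetical and are not taken from SearchGym's actual pipeline.

```python
# Toy illustration only: hypothetical data and helpers, not SearchGym's code.

# A tiny knowledge graph: (subject, relation) -> object. Fictional entities.
TRIPLES = {
    ("Avelon Institute", "founded_by"): "Mira Tolen",
    ("Mira Tolen", "born_in"): "Port Selda",
}


def render(subject, relation, obj):
    """Turn one triple into a one-sentence document."""
    return f"{subject} {relation.replace('_', ' ')} {obj}."


def build_corpus(triples):
    """Generate the corpus from the graph, so documents and facts stay aligned."""
    return [render(s, r, o) for (s, r), o in triples.items()]


def search(corpus, query):
    """Naive keyword retrieval: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().rstrip(".").split())), d) for d in corpus]
    return [d for score, d in sorted(scored, key=lambda x: -x[0]) if score > 0]


def make_task(triples, start, relations):
    """Build a multi-hop task by walking the graph; the answer is the final entity."""
    entity = start
    for rel in relations:
        entity = triples[(entity, rel)]
    return {"start": start, "relations": list(relations), "answer": entity}


def is_solvable(task, triples, corpus):
    """A task is solvable only if every hop has a supporting document in the corpus."""
    entity = task["start"]
    for rel in task["relations"]:
        obj = triples.get((entity, rel))
        if obj is None or render(entity, rel, obj) not in corpus:
            return False
        entity = obj
    return entity == task["answer"]


def reward(prediction, task):
    """Exact-match reward against the graph-derived answer."""
    return 1.0 if prediction.strip().lower() == task["answer"].lower() else 0.0


corpus = build_corpus(TRIPLES)
task = make_task(TRIPLES, "Avelon Institute", ["founded_by", "born_in"])

# A misaligned snapshot (one supporting document missing) fails the check,
# which is the situation that would otherwise yield a corrupted reward.
stale_corpus = corpus[:1]
print(is_solvable(task, TRIPLES, corpus), is_solvable(task, TRIPLES, stale_corpus))
```

In this sketch the solvability check is what separates an aligned environment from a static snapshot: a task whose evidence is missing is rejected up front, so a correct reasoning chain is never penalized for facts the agent could not have retrieved.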
Taxonomy
Topics: Multimodal Machine Learning Applications · Information Retrieval and Search Behavior · Topic Modeling
