CoSearch: Joint Training of Reasoning and Document Ranking via Reinforcement Learning for Agentic Search
Hansi Zeng, Liam Collins, Bhuvesh Kumar, Neil Shah, Hamed Zamani

TL;DR
CoSearch jointly trains a reasoning agent and a document ranker with reinforcement learning, optimizing retrieval and reasoning together to improve complex question answering across seven QA benchmarks.
Contribution
The paper presents a novel method for jointly training reasoning agents and document retrieval models via reinforcement learning, addressing the bottleneck of fixed retrieval systems.
Findings
Consistent improvements across seven QA benchmarks.
Semantic grouping strategy enables effective training without extra rollouts.
Joint training outperforms fixed retrieval baselines.
Abstract
Agentic search -- the task of training agents that iteratively reason, issue queries, and synthesize retrieved information to answer complex questions -- has achieved remarkable progress through reinforcement learning (RL). However, existing approaches, such as Search-R1, treat the retrieval system as a fixed tool, optimizing only the reasoning agent while the retrieval component remains unchanged. A preliminary experiment reveals that an oracle retrieval system improves F1 over a fixed retrieval system by up to 26.8% (relative) across seven QA benchmarks, suggesting that the retrieval system is a key bottleneck in scaling agentic search performance. Motivated by this finding, we propose CoSearch, a framework that jointly trains a multi-step reasoning agent and a generative document ranking model via Group Relative Policy Optimization (GRPO). To enable effective GRPO training for…
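For readers unfamiliar with GRPO, the following is a minimal sketch of its group-relative advantage step only: each rollout's reward is normalized against the mean and standard deviation of the other rollouts sampled for the same input. It is not CoSearch's training code. The function name, the reward values, the use of population standard deviation, and the `eps` constant are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group (GRPO-style advantage).

    `rewards` holds one scalar per rollout sampled for the same input.
    `eps` (an assumed constant) avoids division by zero when every
    rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical F1 rewards for four rollouts answering the same question.
rewards = [0.0, 0.5, 0.5, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above the group mean receive a positive advantage and those below it a negative one, so no separate value model is needed to provide a baseline.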
