s3: You Don't Need That Much Data to Train a Search Agent via RL

Pengcheng Jiang; Xueqiang Xu; Jiacheng Lin; Jinfeng Xiao; Zifeng Wang; Jimeng Sun; Jiawei Han

arXiv:2505.14146·cs.AI·November 6, 2025

TL;DR

s3 is a lightweight, model-agnostic RL framework that trains a searcher to improve generation accuracy with minimal data, outperforming larger-data baselines across diverse QA tasks.

Contribution

The paper introduces s3, a decoupled searcher-training method that uses a Gain Beyond RAG reward and requires significantly less training data than existing approaches.

Findings

01. s3 outperforms baselines trained on 70x more data

02. Achieves better downstream QA performance across multiple benchmarks

03. Requires only 2.4k training samples

Abstract

Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., NDCG) that ignore downstream utility, or fine-tune the entire LLM to jointly reason and retrieve, which entangles retrieval with generation and limits both real search utility and compatibility with frozen or proprietary models. In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naive RAG. s3 requires only 2.4k training samples to…
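To make the Gain Beyond RAG idea concrete, here is a minimal sketch of how such a reward could be computed for a single question. It is an illustration under stated assumptions, not the authors' implementation: the `Generator` type, the `contains_answer` metric, the `toy_generator`, and the function name `gain_beyond_rag` are all hypothetical. The abstract only says that the reward is the improvement in generation accuracy over naive RAG, so the accuracy metric used here is a simple stand-in.

```python
from typing import Callable, Sequence

# Hypothetical type: a frozen generator mapping (question, documents) to an answer.
Generator = Callable[[str, Sequence[str]], str]


def contains_answer(prediction: str, gold: str) -> float:
    """Stand-in accuracy metric (assumption): 1.0 if the gold answer
    appears in the prediction, case-insensitively, else 0.0."""
    return 1.0 if gold.strip().lower() in prediction.strip().lower() else 0.0


def gain_beyond_rag(
    question: str,
    gold: str,
    searcher_docs: Sequence[str],
    naive_docs: Sequence[str],
    generate: Generator,
    accuracy: Callable[[str, str], float] = contains_answer,
) -> float:
    """Reward = accuracy with the searcher's documents
    minus accuracy with naive RAG documents, using the same frozen generator."""
    acc_searcher = accuracy(generate(question, searcher_docs), gold)
    acc_naive = accuracy(generate(question, naive_docs), gold)
    return acc_searcher - acc_naive


def toy_generator(question: str, docs: Sequence[str]) -> str:
    """Toy generator (assumption): echoes the retrieved text as its answer."""
    return " ".join(docs) if docs else "unknown"


reward = gain_beyond_rag(
    question="Who wrote Hamlet?",
    gold="Shakespeare",
    searcher_docs=["Hamlet is a tragedy by William Shakespeare."],
    naive_docs=["Hamlet is set in Denmark."],
    generate=toy_generator,
)
print(reward)
```

Because the generator is only called, never updated, the same reward function works with a frozen or proprietary model behind the `generate` callable. The reward is positive only when the searcher's documents help the generator more than the naive retrieval does.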

Code & Models

Repositories

pat-jj/s3 (PyTorch, official)

Videos

s3: You Don't Need That Much Data to Train a Search Agent via RL (Underline)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Artificial Intelligence in Healthcare and Education

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Softmax · Attention Dropout · WordPiece · Linear Layer · Residual Connection · Byte Pair Encoding · Weight Decay