Search-P1: Path-Centric Reward Shaping for Stable and Efficient Agentic RAG Training

Tianle Xia; Ming Xu; Lingxiang Hu; Yiding Sun; Wenwei Li; Linfang Shang; Liqun Liu; Peng Shu; Huan Yu; Jie Jiang

arXiv:2602.22576 · cs.CL · February 27, 2026

Open Access

TL;DR

Search-P1 is a path-centric reward shaping framework for agentic RAG training. It uses structured reward signals and reference paths to improve learning efficiency and reasoning quality in large language models.

Contribution

The paper proposes a reward shaping method that uses path structure and reference planning to improve the efficiency of agentic RAG training and the accuracy of the resulting reasoning.

Findings

01. Achieves a 7.7-point average accuracy improvement on QA benchmarks.

02. Extracts learning signals from failed samples.

03. Improves training stability and reasoning quality.

Abstract

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, yet traditional single-round retrieval struggles with complex multi-step reasoning. Agentic RAG addresses this by enabling LLMs to dynamically decide when and what to retrieve, but current RL-based training methods suffer from sparse outcome rewards that discard intermediate signals and low sample efficiency where failed samples contribute nothing. We propose Search-P1, a framework that introduces path-centric reward shaping for agentic RAG training, comprising two key components: (1) Path-Centric Reward, which evaluates the structural quality of reasoning trajectories through order-agnostic step coverage and soft scoring that extracts learning signals even from failed samples, and (2) Dual-Track Path Scoring with offline-generated reference planners that assesses paths from…
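
The abstract describes order-agnostic step coverage and soft scoring only at a high level, so the following is a minimal Python sketch of the general idea rather than the paper's formulation. Everything in it is an assumption made for illustration: the token-overlap matcher `step_matches`, its `threshold`, the function names, and the linear blend controlled by `path_weight` do not come from the source.

```python
def _tokens(text: str) -> set[str]:
    """Lowercased word set used for a crude overlap match (illustrative only)."""
    return set(text.lower().split())


def step_matches(reference_step: str, trajectory_step: str, threshold: float = 0.5) -> bool:
    """Hypothetical matcher: a reference step counts as covered when at least
    `threshold` of its tokens also appear in the trajectory step."""
    ref = _tokens(reference_step)
    if not ref:
        return False
    return len(ref & _tokens(trajectory_step)) / len(ref) >= threshold


def step_coverage(reference_steps: list[str], trajectory_steps: list[str],
                  threshold: float = 0.5) -> float:
    """Order-agnostic coverage: the fraction of reference steps matched by
    any trajectory step, regardless of where that step occurs."""
    if not reference_steps:
        return 0.0
    covered = sum(
        any(step_matches(ref, step, threshold) for step in trajectory_steps)
        for ref in reference_steps
    )
    return covered / len(reference_steps)


def shaped_reward(answer_correct: bool, coverage: float, path_weight: float = 0.5) -> float:
    """Assumed soft score: a linear blend of outcome and path coverage, so a
    failed sample with a partly correct path still earns a nonzero reward."""
    outcome = 1.0 if answer_correct else 0.0
    return (1.0 - path_weight) * outcome + path_weight * coverage


# Hypothetical reference plan and two trajectories for a two-hop question.
reference = ["find director of Inception", "find birthplace of Christopher Nolan"]
reordered = ["search birthplace of Christopher Nolan", "search director of Inception"]
partial = ["search director of Inception"]

full_coverage = step_coverage(reference, reordered)    # order does not matter
partial_coverage = step_coverage(reference, partial)   # one of two steps covered
failed_reward = shaped_reward(False, partial_coverage)
```

Under these assumptions, the reordered trajectory covers both reference steps, while the partial trajectory covers one of two and still receives a nonzero reward despite a wrong final answer. An outcome-only reward would assign that failed sample zero.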

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques