Search-P1: Path-Centric Reward Shaping for Stable and Efficient Agentic RAG Training