PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning

Dongyi Liu; Yifan Niu; Qinwen Wang; Han Xiao; Jia Li

arXiv:2605.09287 · cs.AI · May 13, 2026

TL;DR

PiCA introduces a pivot-based step reward mechanism for LLM-based search agents trained with reinforcement learning. It targets long-horizon credit assignment and improves performance on knowledge-intensive tasks.

Contribution

The paper proposes PiCA, a pivot-based credit assignment method that enriches the reward signal using success probabilities and historical context, and that outperforms strong baselines across seven QA benchmarks.

Findings

01. PiCA achieves improvements of 15.2% and 2.2% on the 3B and 7B models, respectively.

02. PiCA outperforms strong baselines across seven QA benchmarks.

03. PiCA maintains distributional consistency while providing dense, pivot-aware guidance.

Abstract

Large Language Model (LLM)-based search agents trained with reinforcement learning (RL) have significantly improved performance on knowledge-intensive tasks. However, existing methods encounter critical challenges in long-horizon credit assignment: (i) Reward Sparsity, where models receive only outcome feedback without step-level guidance to differentiate action quality; (ii) Isolated Credit, where credit is assigned to steps independently, failing to capture sequential dependencies; and (iii) Distributional Shift, where rewards are estimated on templates that deviate from the model's natural generative distribution. To address these issues, we propose Pivot-Based Credit Assignment (PiCA), a novel step reward mechanism that reformulates the search trajectory as a sequential process of cumulative search progress. Unlike prior isolated step rewards, PiCA defines process rewards as…

Code & Models

Repositories

novdream/PiCA (GitHub)
