Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning

Chris Samarinas; Haw-Shiuan Chang; Hamed Zamani

arXiv:2602.23440 · cs.CL · April 2, 2026

TL;DR

SLATE combines truncated step-level sampling with dense process rewards to improve retrieval-augmented reasoning in large language models. Sampling from a shared prefix reduces the variance of advantage estimates, and step-level rewards provide richer supervision than a single outcome reward.

Contribution

The paper proposes truncated step-level sampling, which reduces variance in advantage estimation, together with dense, decomposed process rewards. Both advance step-level reinforcement learning for retrieval-augmented reasoning.
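The variance-reduction argument can be made concrete with a toy Monte Carlo comparison in Python. The sketch assumes an additive model that does not come from the paper: every step adds independent unit-variance noise to the reward. Under that model it compares two ways of estimating how much better one action at a given step is than another. The first uses outcome rewards on two independently sampled full trajectories. The second uses two one-step continuations of a shared prefix, each scored by a step reward with a small amount of noise. All function names and constants (`independent_estimate`, `truncated_estimate`, `REWARD_NOISE`, and so on) are hypothetical. The code illustrates the statistical point only and does not reproduce SLATE's estimator.

```python
import random
import statistics

# Toy additive model (an assumption, not from the paper): each step adds
# independent N(0, 1) noise to the return. The two candidate actions at the
# step being compared have true values Q_GOOD and Q_BAD.

def independent_estimate(rng, num_steps, q_good, q_bad):
    """Difference of outcome rewards of two independently sampled trajectories.

    Every other step is resampled in each trajectory, so its noise leaks
    into the estimated gap between the two actions.
    """
    other_good = sum(rng.gauss(0.0, 1.0) for _ in range(num_steps - 1))
    other_bad = sum(rng.gauss(0.0, 1.0) for _ in range(num_steps - 1))
    return (q_good + other_good) - (q_bad + other_bad)

def truncated_estimate(rng, prefix_len, q_good, q_bad, reward_noise):
    """Difference of rewards of two one-step continuations of a shared prefix.

    Both branches include the same prefix contribution, which cancels in the
    difference. No later steps are sampled, so only step-reward noise remains.
    """
    prefix = sum(rng.gauss(0.0, 1.0) for _ in range(prefix_len))
    r_good = prefix + q_good + rng.gauss(0.0, reward_noise)
    r_bad = prefix + q_bad + rng.gauss(0.0, reward_noise)
    return r_good - r_bad

rng = random.Random(0)
NUM_STEPS, PREFIX_LEN, TRIALS = 5, 2, 20_000
Q_GOOD, Q_BAD, REWARD_NOISE = 1.0, 0.0, 0.1

indep = [independent_estimate(rng, NUM_STEPS, Q_GOOD, Q_BAD) for _ in range(TRIALS)]
trunc = [truncated_estimate(rng, PREFIX_LEN, Q_GOOD, Q_BAD, REWARD_NOISE)
         for _ in range(TRIALS)]

indep_mean, indep_var = statistics.fmean(indep), statistics.variance(indep)
trunc_mean, trunc_var = statistics.fmean(trunc), statistics.variance(trunc)
print(f"independent full trajectories: mean={indep_mean:.3f}  var={indep_var:.3f}")
print(f"truncated, shared prefix:      mean={trunc_mean:.3f}  var={trunc_var:.3f}")
```

Both estimators are centered on the true gap of 1.0, but they differ sharply in spread. Under this toy model the independent estimator has variance 2(T − 1) = 8 for T = 5 steps. The truncated estimator has variance of only 2 · 0.1² = 0.02, because the shared prefix cancels and no later steps contribute noise.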

Findings

1. SLATE outperforms the baselines on seven QA benchmarks.
2. With a 7B model, it achieves a 7.0% improvement over Search-R1.
3. Gains are largest on multi-hop tasks.

Abstract

Reinforcement learning has emerged as an effective paradigm for training large language models to interleave reasoning with search engine calls. However, existing approaches face a fundamental credit assignment problem: methods like Search-R1 assign a single outcome reward to the entire multi-step trajectory, providing no signal about which reasoning or retrieval decisions were responsible for success or failure. Process-reward methods such as StepSearch introduce step-level supervision but still sample complete trajectories independently, so advantage estimates at any given step are contaminated by the randomness of all other steps. We propose SLATE (Step-Level Advantage estimation for Truncated Exploration), which addresses both problems through two complementary ideas. First, truncated step-level sampling generates k continuations from a shared prefix, isolating all variation to a…
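The first idea, k continuations from a shared prefix, can be sketched as a group-relative advantage computed over sibling steps. This Python sketch assumes a GRPO-style baseline: subtract the group mean and optionally divide by the group standard deviation. The abstract is cut off before it specifies SLATE's actual estimator, so that choice is an assumption. The `<search>`/`<answer>` step format, `toy_policy`, `toy_reward`, and every other helper name are also hypothetical stand-ins, not the repository's API.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Step = str  # hypothetical: one reasoning or search step rendered as text

def step_advantages(rewards: Sequence[float], normalize: bool = True,
                    eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for k sibling continuations of one prefix."""
    mean = statistics.fmean(rewards)
    centered = [r - mean for r in rewards]
    if not normalize or len(rewards) < 2:
        return centered
    std = statistics.pstdev(rewards)  # population std over the k siblings
    return [c / (std + eps) for c in centered]

def sample_step_group(
    prefix: List[Step],
    k: int,
    sample_step: Callable[[List[Step]], Step],
    step_reward: Callable[[List[Step], Step], float],
) -> List[Tuple[Step, float]]:
    """Draw k one-step continuations of a shared prefix and score each one.

    Every continuation shares the same prefix and stops after one step, so
    differences in reward come only from the sampled step itself.
    """
    steps = [sample_step(prefix) for _ in range(k)]
    rewards = [step_reward(prefix, s) for s in steps]
    return list(zip(steps, step_advantages(rewards)))

# Toy stand-ins (hypothetical): a uniform random policy and a hand-written step reward.
rng = random.Random(0)
ACTIONS = [
    "<search>capital of France</search>",
    "<search>France</search>",
    "<answer>Paris</answer>",
]

def toy_policy(prefix: List[Step]) -> Step:
    return rng.choice(ACTIONS)

def toy_reward(prefix: List[Step], step: Step) -> float:
    scores = {"<answer>Paris</answer>": 1.0, "<search>capital of France</search>": 0.5}
    return scores.get(step, 0.0)

prefix = ["<think>The question asks for the capital of France.</think>"]
for step, adv in sample_step_group(prefix, k=4, sample_step=toy_policy, step_reward=toy_reward):
    print(f"{adv:+.2f}  {step}")
```

Because the baseline is the mean over siblings that share the same prefix, the advantages within a group sum to zero. A step scores positively only when it beats its siblings at that exact point in the trajectory. When all k siblings receive the same reward, every advantage is zero and the group contributes no gradient signal.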

Code & Models

Repository: algoprog/SLATE (GitHub)
