Incentivizing In-depth Reasoning over Long Contexts with Process Advantage Shaping

Miao Peng; Weizhou Shen; Nuo Chen; Chenliang Li; Ming Yan; Jia Li

arXiv:2601.12465·cs.CL·January 21, 2026

TL;DR

This paper introduces DeepReasonQA, a framework for synthesizing challenging long-context QA data, and LongPAS, a method for fine-grained credit assignment. Together they improve long-context reasoning in LLMs and significantly outperform existing RLVR approaches.

Contribution

The paper presents a new framework for generating difficult multi-hop QA data and a process advantage shaping method that improves long-context reasoning in LLMs, addressing the 'almost-there' phenomenon.

Findings

01 Outperforms RLVR baselines on long-context reasoning benchmarks.

02 Matches frontier LLMs while using fewer parameters.

03 Strengthens reasoning capabilities while maintaining stable training.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective in enhancing LLMs' short-context reasoning, but its performance degrades in long-context scenarios that require both precise grounding and robust long-range reasoning. We identify the "almost-there" phenomenon in long-context reasoning, where trajectories are largely correct but fail at the final step, and attribute this failure to two factors: (1) the lack of high reasoning density in long-context QA data that push LLMs beyond mere grounding toward sophisticated multi-hop reasoning; and (2) the loss of valuable learning signals during long-context RL training due to the indiscriminate penalization of partially correct trajectories with incorrect outcomes. To overcome this bottleneck, we propose DeepReasonQA, a KG-driven synthesis framework that controllably constructs high-difficulty, multi-hop long-context QA…
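The second factor in the abstract, indiscriminate penalization of partially correct trajectories, can be illustrated with a toy calculation. The sketch below is not the paper's method: it assumes a binary outcome reward and a group-mean baseline (neither is specified in the text above), and the function name, rollouts, and step labels are all hypothetical. It only shows that when the advantage depends on the final answer alone, an "almost-there" rollout is treated exactly like one that is wrong throughout.

```python
from statistics import mean


def outcome_only_advantages(step_correct, final_correct):
    """Hypothetical illustration: one scalar advantage per rollout,
    broadcast unchanged to every reasoning step of that rollout."""
    # Assumed binary verifiable reward: 1.0 if the final answer is right, else 0.0.
    rewards = [1.0 if ok else 0.0 for ok in final_correct]
    # Assumed baseline: mean reward over the group of rollouts.
    baseline = mean(rewards)
    return [[r - baseline] * len(steps) for r, steps in zip(rewards, step_correct)]


# Three made-up rollouts for one question; True marks a correct reasoning step.
step_correct = [
    [True, True, True, True],      # correct throughout
    [True, True, True, False],     # "almost there": fails only at the last step
    [False, False, False, False],  # wrong throughout
]
final_correct = [True, False, False]

advantages = outcome_only_advantages(step_correct, final_correct)

# The step-level correctness never enters the computation, so the second and
# third rollouts receive identical negative advantages on every step.
print(advantages[1] == advantages[2])
```

Under these assumptions the three correct steps of the second rollout are penalized just as hard as the incorrect steps of the third, which is the lost learning signal the abstract describes.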

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Machine Learning in Healthcare