Discovering Process-Outcome Credit in Multi-Step LLM Reasoning

Xiangwei Wang, Wei Wang, Ken Chen, Nanduni Nimalsiri, and Saman Halgamuge

arXiv:2602.01034·cs.AI·February 3, 2026

TL;DR

This paper introduces a novel reinforcement learning framework for large language models that provides continuous, process-oriented rewards to improve reasoning accuracy, efficiency, and robustness across various benchmarks.

Contribution

It proposes a Step-wise Marginal Information Gain mechanism, a Decoupled Masking Strategy, and a Dual-Gated SFT objective to enhance reasoning in LLMs with continuous rewards and disentangled credit assignment.

Findings

01. Outperforms baselines like GRPO in sample efficiency and accuracy

02. Demonstrates superior out-of-distribution robustness

03. Shows promising zero-shot transfer to unseen tasks

Abstract

Reinforcement Learning (RL) serves as a potent paradigm for enhancing reasoning capabilities in Large Language Models (LLMs), yet standard outcome-based approaches often suffer from reward sparsity and inefficient credit assignment. In this paper, we propose a novel framework designed to provide continuous reward signals, which introduces a Step-wise Marginal Information Gain (MIG) mechanism that quantifies the intrinsic value of reasoning steps against a Monotonic Historical Watermark, effectively filtering out training noise. To ensure disentangled credit distribution, we implement a Decoupled Masking Strategy, applying process-oriented rewards specifically to the chain-of-thought (CoT) and outcome-oriented rewards to the full completion. Additionally, we incorporate a Dual-Gated SFT objective to stabilize training with high-quality structural and factual signals. Extensive…
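The abstract names two mechanisms without giving formulas in this excerpt, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each reasoning step has a scalar score (for example, the model's confidence in the reference answer after that step), that the "Monotonic Historical Watermark" behaves like a running maximum of those scores, and that a step earns process credit only for the amount by which it exceeds that maximum. The function names `marginal_information_gain` and `decoupled_token_rewards`, the score definition, and the additive combination of rewards are all hypothetical.

```python
from typing import List, Sequence


def marginal_information_gain(
    step_scores: Sequence[float], initial_watermark: float = 0.0
) -> List[float]:
    """Per-step gain over a running-max watermark (assumed interpretation).

    A step that does not beat the best score seen so far receives zero gain,
    so fluctuations below the watermark are not rewarded.
    """
    watermark = initial_watermark
    gains: List[float] = []
    for score in step_scores:
        gains.append(max(0.0, score - watermark))
        # The watermark never decreases, which is what makes it monotonic.
        watermark = max(watermark, score)
    return gains


def decoupled_token_rewards(
    cot_mask: Sequence[bool], process_reward: float, outcome_reward: float
) -> List[float]:
    """Assign token-level rewards with separate masks (assumed interpretation).

    The outcome reward covers every token of the completion, while the
    process reward is added only on tokens flagged as chain-of-thought.
    """
    return [
        outcome_reward + (process_reward if in_cot else 0.0)
        for in_cot in cot_mask
    ]


if __name__ == "__main__":
    # Hypothetical per-step scores for a four-step reasoning chain.
    gains = marginal_information_gain([0.25, 0.125, 0.5, 0.375])
    print(gains)
    # Three tokens: two inside the chain-of-thought, one in the final answer.
    print(decoupled_token_rewards([True, True, False], sum(gains), 1.0))
```

Under these assumptions, the second and fourth steps receive no process credit because they fall below the watermark set by earlier steps, and the final-answer token is rewarded only by the outcome signal.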

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications