Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents

Wenjie Tang; Minne Li; Sijie Huang; Liquan Xiao; Yuan Zhou

arXiv:2605.20061 · cs.CL · May 20, 2026

TL;DR

ReBel is a reinforcement learning algorithm that models structured belief states to improve long-horizon decision-making in partially observable environments, enhancing success rates and sample efficiency.

Contribution

ReBel introduces belief-consistency supervision and belief-aware grouping, enabling better credit assignment without external annotations in long-horizon RL tasks.
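This summary does not say how belief-aware grouping is computed. As a minimal sketch of one plausible reading, the Python below groups steps that share the same belief state and scores each step against the mean return of its group, so that a step is compared only with steps taken under similar beliefs. The function name `group_relative_advantages`, the string belief keys, and the mean-baseline rule are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, List, Tuple


def group_relative_advantages(steps: List[Tuple[Hashable, float]]) -> List[float]:
    """Hypothetical helper: advantage = return minus mean return of the same-belief group.

    Each step is a (belief_key, return) pair. The grouping key and the
    mean baseline are assumptions made for illustration only.
    """
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for belief_key, ret in steps:
        groups[belief_key].append(ret)

    # One baseline per belief group.
    baselines = {key: mean(returns) for key, returns in groups.items()}

    # Keep the original step order in the output.
    return [ret - baselines[key] for key, ret in steps]


# Toy data: two steps under the belief "has_key", two under "no_key".
steps = [("has_key", 1.0), ("has_key", 0.0), ("no_key", 0.0), ("no_key", 0.0)]
advantages = group_relative_advantages(steps)
```

In this toy input, the two `has_key` steps receive opposite-signed advantages because their returns differ, while the `no_key` steps receive zero because nothing distinguishes them within their group.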

Findings

01 ReBel improves task success by up to 20.4 percentage points.

02 ReBel increases sample efficiency by 2.1 times.

03 ReBel outperforms episode-level baselines on challenging benchmarks.

Abstract

Reinforcement learning from verifiable rewards (RLVR) is a promising paradigm for improving large language model (LLM) agents on long-horizon interactive tasks. However, in partially observable environments, incomplete observations cause agent beliefs to drift over time, while delayed rewards obscure the causal impact of intermediate decisions, exacerbating temporal credit assignment challenges. To address this, we propose ReBel (Reward Belief), a process-level reinforcement learning algorithm that explicitly models structured belief states to summarize interaction history and guide subsequent policy learning. ReBel introduces belief-consistency supervision, converting discrepancies between predicted beliefs and observed feedback into dense self-supervised signals without requiring external step-wise annotations or verifiers. It also employs belief-aware grouping to compare trajectories…
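The idea of turning belief/observation discrepancies into a dense signal can be illustrated with a small sketch. The Python below is not the paper's formulation: it assumes, purely for illustration, that a belief is a flat mapping from slot names to values, that environment feedback exposes some of those slots, and that the reward is the agreement rate rescaled to [-1, 1]. The function name `belief_consistency_reward` and the slot names are hypothetical.

```python
from typing import Dict


def belief_consistency_reward(predicted: Dict[str, str], observed: Dict[str, str]) -> float:
    """Hypothetical per-step reward in [-1, 1] from belief slots the feedback can check.

    Assumption: beliefs and feedback are flat slot -> value mappings.
    Slots absent from the feedback are ignored rather than penalized.
    """
    checkable = [slot for slot in predicted if slot in observed]
    if not checkable:
        return 0.0  # nothing verifiable at this step, so no signal

    agree = sum(predicted[slot] == observed[slot] for slot in checkable)
    # Map agreement rate from [0, 1] to [-1, 1] so that 0 stays neutral.
    return 2.0 * agree / len(checkable) - 1.0


# Toy step: the agent's belief is right about the key and wrong about the door;
# the lamp is not mentioned in the feedback and is therefore skipped.
predicted = {"key_location": "drawer", "door_state": "locked", "lamp_state": "on"}
observed = {"key_location": "drawer", "door_state": "open"}
reward = belief_consistency_reward(predicted, observed)
```

Because the signal is computed from the environment's own feedback at every step, a scheme of this kind needs no external step-wise annotations, which is the property the abstract emphasizes.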

Code & Models

Repositories

Fateyetian/Rebel.git (GitHub)
