Exploring Reasoning Reward Model for Agents

Kaixuan Fan; Kaituo Feng; Manyuan Zhang; Tianshuo Peng; Zhixun Li; Yilei Jiang; Shuang Chen; Peng Pei; Xunliang Cai; Xiangyu Yue

arXiv:2601.22154 · cs.AI · April 29, 2026

TL;DR

This paper introduces Agent-RRM, a multi-faceted reasoning reward model that gives agentic reinforcement learning structured feedback (a reasoning trace, a critique, and an overall score) instead of a sparse outcome reward alone. The unified variant, Reagent-U, reaches 43.7% on GAIA and 46.2% on WebWalkerQA.

Contribution

The paper proposes a reasoning reward model that outputs an explicit reasoning trace, a focused critique, and an overall score, and evaluates three strategies for integrating that feedback into training (Reagent-C, Reagent-R, and Reagent-U) across multiple benchmarks.

Findings

01. Reagent-U achieves 43.7% on GAIA and 46.2% on WebWalkerQA.

02. Structured feedback improves agent reasoning and performance.

03. Extensive evaluations validate the effectiveness of the proposed reward model.

Abstract

Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use. However, most methods still rely on sparse outcome-based rewards for training. Such feedback fails to differentiate intermediate reasoning quality, leading to suboptimal training results. In this paper, we introduce the Agent Reasoning Reward Model (Agent-RRM), a multi-faceted reward model that produces structured feedback for agentic trajectories, including (1) an explicit reasoning trace, (2) a focused critique that provides refinement guidance by highlighting reasoning flaws, and (3) an overall score that evaluates process performance. Leveraging these signals, we systematically investigate three integration strategies: Reagent-C (text-augmented refinement), Reagent-R (reward-augmented guidance), and Reagent-U (unified feedback integration). Extensive…
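The three feedback components named in the abstract can be pictured as a small record, with the critique feeding a text refinement step and the score feeding the reward. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `<think>`/`<critique>`/`<score>` tag format, the function names, the additive blend, and the `weight` parameter are all hypothetical and do not come from the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class RRMFeedback:
    """The three feedback parts listed in the abstract."""
    reasoning_trace: str
    critique: str
    score: float


def parse_feedback(raw: str) -> RRMFeedback:
    """Parse reward-model output into its three parts.

    ASSUMPTION: the tag names below are an illustrative format only.
    """
    def grab(tag: str) -> str:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", raw, re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{tag}> section")
        return match.group(1).strip()

    return RRMFeedback(grab("think"), grab("critique"), float(grab("score")))


def refinement_prompt(question: str, feedback: RRMFeedback) -> str:
    """Text-augmented use of the critique (hypothetical prompt layout)."""
    return f"{question}\n\nCritique of the previous attempt:\n{feedback.critique}"


def blended_reward(outcome_reward: float, feedback: RRMFeedback,
                   weight: float = 0.5) -> float:
    """Reward-augmented use of the score.

    ASSUMPTION: a simple additive blend; the paper may combine them differently.
    """
    return outcome_reward + weight * feedback.score
```

A unified variant in this sketch would simply use both helpers on the same feedback object: the critique to build a refined attempt, and the score to shape the training reward.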
