AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress

Zhiheng Xi; Chenyang Liao; Guanyu Li; Yajie Yang; Wenxiang Chen; Zhihao Zhang; Binghai Wang; Senjie Jin; Yuhao Zhou; Jian Guan; Wei Wu; Tao Ji; Tao Gui; Qi Zhang; Xuanjing Huang

arXiv:2511.08325 · cs.CL · November 12, 2025

TL;DR

This paper introduces AgentPRM, a process reward model for LLM agents that scores each decision by its progress toward the goal and uses those scores to guide the agent through multi-turn tasks, improving efficiency and robustness without extensive prompt engineering.

Contribution

We propose a novel process reward model tailored for agent tasks, capturing decision interdependence and progress, with a scalable TD-based training method that enhances efficiency.
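To make the "TD-based" idea concrete, here is a minimal sketch of how per-step training targets could be derived from a single trajectory, assuming "TD" refers to temporal-difference learning. It uses the generic TD(λ) recursion from the reinforcement learning literature, not necessarily the paper's exact estimator. The function name `td_lambda_targets`, its parameters, and the example numbers are all hypothetical.

```python
from typing import List


def td_lambda_targets(
    values: List[float],
    rewards: List[float],
    gamma: float = 1.0,
    lam: float = 0.95,
) -> List[float]:
    """Compute generic TD(lambda) targets for each step of one trajectory.

    values[t]  : current model estimate for step t (hypothetical PRM output)
    rewards[t] : reward observed after step t (often 0 until the final step)
    The value after the final step is treated as 0 (episode ends).
    """
    if len(values) != len(rewards):
        raise ValueError("values and rewards must have the same length")

    targets = [0.0] * len(values)
    advantage = 0.0   # running sum of discounted TD errors
    next_value = 0.0  # estimate for the step after t

    # Walk backwards so each step can reuse the accumulated later TD errors.
    for t in reversed(range(len(values))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        advantage = delta + gamma * lam * advantage
        targets[t] = values[t] + advantage
        next_value = values[t]
    return targets


if __name__ == "__main__":
    # Illustrative numbers only: three steps, success reward at the end.
    step_values = [0.2, 0.5, 0.7]
    step_rewards = [0.0, 0.0, 1.0]
    print(td_lambda_targets(step_values, step_rewards, gamma=1.0, lam=0.0))
    print(td_lambda_targets(step_values, step_rewards, gamma=1.0, lam=1.0))
```

With `lam=0.0` each target bootstraps from the next step's estimate only, while `lam=1.0` recovers the full Monte Carlo return. Intermediate values trade the bias of bootstrapping against the variance of full rollouts.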

Findings

01. AgentPRM is over 8 times more compute-efficient than baselines.

02. AgentPRM shows robust performance improvements when scaling test-time compute.

03. Applying AgentPRM to reinforcement learning offers additional insights.

Abstract

Despite rapid development, large language models (LLMs) still encounter challenges in multi-turn decision-making tasks (i.e., agent tasks) like web shopping and browser navigation, which require making a sequence of intelligent decisions based on environmental feedback. Previous work for LLM agents typically relies on elaborate prompt engineering or fine-tuning with expert trajectories to improve performance. In this work, we take a different perspective: we explore constructing process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process. Unlike LLM reasoning, where each step is scored based on correctness, actions in agent tasks do not have a clear-cut correctness. Instead, they should be evaluated based on their proximity to the goal and the progress they have made. Building on this insight, we propose a re-defined PRM for agent tasks, named…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)