AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress
Zhiheng Xi, Chenyang Liao, Guanyu Li, Yajie Yang, Wenxiang Chen, Zhihao Zhang, Binghai Wang, Senjie Jin, Yuhao Zhou, Jian Guan, Wei Wu, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

TL;DR
This paper introduces AgentPRM, a process reward model for LLM agents that evaluates decision progress and guides multi-turn tasks, improving efficiency and robustness without extensive prompt engineering.
Contribution
We propose a novel process reward model tailored for agent tasks, capturing decision interdependence and progress, with a scalable TD-based training method that enhances efficiency.
Findings
AgentPRM is over 8 times more compute-efficient than baselines.
AgentPRM shows robust performance improvements when scaling test-time compute.
Applying AgentPRM to reinforcement learning offers additional insights.
Abstract
Despite rapid development, large language models (LLMs) still encounter challenges in multi-turn decision-making tasks (i.e., agent tasks) like web shopping and browser navigation, which require making a sequence of intelligent decisions based on environmental feedback. Previous work for LLM agents typically relies on elaborate prompt engineering or fine-tuning with expert trajectories to improve performance. In this work, we take a different perspective: we explore constructing process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process. Unlike LLM reasoning, where each step is scored based on correctness, actions in agent tasks do not have a clear-cut correctness. Instead, they should be evaluated based on their proximity to the goal and the progress they have made. Building on this insight, we propose a re-defined PRM for agent tasks, named…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
