AgentRM: Enhancing Agent Generalization with Reward Modeling

Yu Xia; Jingru Fan; Weize Chen; Siyu Yan; Xin Cong; Zhong Zhang; Yaxi Lu; Yankai Lin; Zhiyuan Liu; Maosong Sun

arXiv:2502.18407·cs.CL·February 26, 2025

TL;DR

AgentRM introduces a reward modeling approach to improve the generalization of LLM-based agents to unseen tasks, outperforming existing methods through test-time guidance.

Contribution

The paper proposes AgentRM, a generalizable reward model that guides the policy model at test time, and reports better generalization to unseen tasks than directly fine-tuning the policy model.

Findings

01. AgentRM improves the base policy model by 8.8 points on average across nine agent tasks.

02. It surpasses the top general agent by 4.0 points.

03. AgentRM also generalizes to a stronger policy model, with a 12.6-point gain on LLaMA-3-70B.

Abstract

Existing LLM-based agents have achieved strong performance on held-in tasks, but their generalizability to unseen tasks remains poor. Hence, some recent work focuses on fine-tuning the policy model on more diverse tasks to improve generalizability. In this work, we find that fine-tuning a reward model to guide the policy model is more robust than directly fine-tuning the policy model. Based on this finding, we propose AgentRM, a generalizable reward model, to guide the policy model for effective test-time search. We comprehensively investigate three approaches to constructing the reward model: explicit reward modeling, implicit reward modeling, and LLM-as-a-judge. We then use AgentRM to guide answer generation with Best-of-N sampling and step-level beam search. On four types of nine agent tasks, AgentRM enhances the base policy model by 8.8 points on average, surpassing…
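The two test-time search strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the callables `sample`, `propose`, `reward`, and `is_done`, as well as the toy task at the bottom, are hypothetical stand-ins for a policy model, a reward model, and an environment. The sketch only assumes that the reward model returns a scalar score for a (partial or complete) trajectory.

```python
from typing import Callable, List, Sequence

# A trajectory is modeled here as a list of action strings (an assumption).
Trajectory = List[str]


def best_of_n(sample: Callable[[], Trajectory],
              reward: Callable[[Trajectory], float],
              n: int) -> Trajectory:
    """Draw n complete trajectories from the policy and keep the best-scored one."""
    candidates = [sample() for _ in range(n)]
    return max(candidates, key=reward)


def step_beam_search(propose: Callable[[Trajectory], Sequence[str]],
                     reward: Callable[[Trajectory], float],
                     is_done: Callable[[Trajectory], bool],
                     beam_width: int,
                     max_steps: int) -> Trajectory:
    """Expand every beam by one step, score the partial trajectories with the
    reward model, and keep the top `beam_width` before the next step."""
    beams: List[Trajectory] = [[]]
    for _ in range(max_steps):
        expanded: List[Trajectory] = []
        for traj in beams:
            steps = [] if is_done(traj) else list(propose(traj))
            if not steps:
                # Finished (or stuck) trajectories are carried over unchanged.
                expanded.append(traj)
                continue
            expanded.extend(traj + [step] for step in steps)
        beams = sorted(expanded, key=reward, reverse=True)[:beam_width]
        if all(is_done(traj) for traj in beams):
            break
    return max(beams, key=reward)


# Toy stand-ins (hypothetical): the "reward model" counts how many positions
# of the trajectory agree with a fixed target action sequence.
TARGET = ["a", "b", "c"]


def toy_reward(traj: Trajectory) -> float:
    return float(sum(1 for x, y in zip(traj, TARGET) if x == y))


def toy_propose(traj: Trajectory) -> Sequence[str]:
    return ["a", "b", "c"]


def toy_is_done(traj: Trajectory) -> bool:
    return len(traj) >= len(TARGET)


found = step_beam_search(toy_propose, toy_reward, toy_is_done,
                         beam_width=2, max_steps=5)
print(found)  # ['a', 'b', 'c']
```

Best-of-N scores only complete trajectories, so it needs one reward call per candidate; step-level beam search scores partial trajectories after every step, which lets the reward model prune weak branches early at the cost of more reward calls.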

Taxonomy

Topics: Fuzzy Logic and Control Systems · Data Stream Mining Techniques · Reinforcement Learning in Robotics

Methods: Focus · Balanced Selection