Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling

Jiaxuan Wang; Yulan Hu; Wenjin Yang; Zheng Pan; Xin Li; Lan-Zhe Guo

arXiv:2604.08178 · cs.AI · May 12, 2026

TL;DR

This paper introduces Plan-RewardBench, a benchmark for evaluating reward models in complex, tool-using agent scenarios, highlighting current challenges and failure modes in trajectory-level reward assessment.

Contribution

It provides a new benchmark for trajectory-level reward modeling in agentic systems, including diverse task families and diagnostic analyses of model performance.

Findings

01 All evaluated reward models struggle with long-horizon trajectories.
02 Performance drops significantly on complex planning and error recovery tasks.
03 Current models face substantial challenges in trajectory-level reward evaluation.

Abstract

In classical Reinforcement Learning from Human Feedback (RLHF), Reward Models (RMs) serve as the fundamental signal provider for model alignment. As Large Language Models evolve into agentic systems capable of autonomous tool invocation and complex reasoning, the paradigm of reward modeling faces unprecedented challenges -- most notably, the lack of benchmarks specifically designed to assess RM capabilities within tool-integrated environments. To address this gap, we present Plan-RewardBench, a trajectory-level preference benchmark designed to evaluate how well judges distinguish preferred versus distractor agent trajectories in complex tool-using scenarios. Plan-RewardBench covers four representative task families -- (i) Safety Refusal, (ii) Tool-Irrelevance / Unavailability, (iii) Complex Planning, and (iv) Robust Error Recovery -- comprising validated positive trajectories and…
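To make the evaluation setup concrete, here is a minimal sketch of how a judge could be scored on preferred-versus-distractor pairs, broken down by task family. The excerpt above does not state the benchmark's metric or data format, so everything here is an assumption for illustration: the `PreferencePair` record, its field names, the `pairwise_accuracy` helper, the choice of pairwise accuracy as the metric, and the toy length-based judge are all hypothetical and not taken from the paper or its dataset.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PreferencePair:
    """Hypothetical record: one preferred and one distractor trajectory."""
    task_family: str   # e.g. "Safety Refusal" or "Complex Planning"
    preferred: str     # serialized preferred trajectory (assumed format)
    distractor: str    # serialized distractor trajectory (assumed format)


def pairwise_accuracy(
    pairs: List[PreferencePair],
    score: Callable[[str], float],
) -> Dict[str, float]:
    """Fraction of pairs, per task family, where the judge scores the
    preferred trajectory strictly higher than the distractor.
    Ties are counted as errors (an assumption of this sketch)."""
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for pair in pairs:
        total[pair.task_family] = total.get(pair.task_family, 0) + 1
        if score(pair.preferred) > score(pair.distractor):
            correct[pair.task_family] = correct.get(pair.task_family, 0) + 1
    return {family: correct.get(family, 0) / n for family, n in total.items()}


# Toy data and a toy judge that simply prefers longer trajectories.
# Both are placeholders; a real judge would be a reward model or an LLM.
pairs = [
    PreferencePair("Safety Refusal",
                   "refuse: request is unsafe",
                   "call_tool(delete_all)"),
    PreferencePair("Complex Planning",
                   "search, then book, then confirm",
                   "book"),
    PreferencePair("Complex Planning",
                   "plan",
                   "plan, redundant call, redundant call"),
]
result = pairwise_accuracy(pairs, score=len)
print(result)
```

With this toy data the length-based judge gets the Safety Refusal pair right but is fooled by the padded distractor in the second Complex Planning pair, which illustrates why a per-family breakdown is more diagnostic than a single aggregate number.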

Code & Models

Datasets

wyy1112/Plan-RewardBench (dataset, 534 downloads)
