WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents

Yao Zhang; Shijie Tang; Zeyu Li; Zhen Han; Volker Tresp

arXiv:2601.21872 · cs.AI · April 10, 2026

1 Repo 4 Models 3 Datasets

TL;DR

WebArbiter introduces a principle-guided, reasoning-based reward model for web agents that generates structured justifications, improving task success and robustness over existing methods.

Contribution

It presents WebArbiter, a WebPRM that treats reward modeling as text generation, producing written reasoning followed by a verdict. It also contributes a two-stage training pipeline and a new evaluation benchmark.
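The summary says WebArbiter emits its reasoning and verdict as generated text rather than as a bare scalar. To use such output as a reward, the verdict has to be recovered from the text. The sketch below assumes a hypothetical layout in which the generation ends with a line of the form `Verdict: <label>`; that format, the `parse_judgment` helper, and the `LABEL_TO_REWARD` map are illustrative assumptions, not WebArbiter's actual output schema.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical layout: free-form justification, then a final "Verdict: <label>" line.
VERDICT_RE = re.compile(r"^Verdict:\s*(\w+)\s*$", re.IGNORECASE | re.MULTILINE)

# Illustrative label-to-reward map (an assumption, not WebArbiter's label set).
LABEL_TO_REWARD: Dict[str, float] = {"good": 1.0, "bad": 0.0}


@dataclass
class Judgment:
    justification: str       # text the model wrote before the verdict line
    verdict: Optional[str]   # lower-cased label, or None if no verdict line was found
    reward: Optional[float]  # mapped reward, or None for a missing/unknown label


def parse_judgment(text: str) -> Judgment:
    """Split a generated judgment into its justification and a scalar reward."""
    matches = list(VERDICT_RE.finditer(text))
    if not matches:
        return Judgment(justification=text.strip(), verdict=None, reward=None)
    last = matches[-1]  # if several verdict lines appear, trust the final one
    label = last.group(1).lower()
    return Judgment(
        justification=text[: last.start()].strip(),
        verdict=label,
        reward=LABEL_TO_REWARD.get(label),
    )


sample = (
    "Principle: an action should move the page toward the stated goal.\n"
    "Reasoning: clicking 'Add to cart' on the requested item advances the task.\n"
    "Verdict: GOOD"
)
result = parse_judgment(sample)
print(result.verdict, result.reward)  # good 1.0
```

Keeping the justification next to the parsed label is what makes the reward auditable: a reader can see why a step was judged, not only the score. Returning None for a missing or unrecognized verdict, instead of defaulting to 0, keeps malformed generations from silently counting as negative rewards.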

Findings

1. WebArbiter-7B outperforms GPT-5 by 9.1 points on WebPRMBench.
2. It surpasses prior WebPRMs by up to 6.4 points in trajectory search.
3. The model demonstrates improved robustness and interpretability in complex web tasks.
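The second finding refers to using the reward model inside trajectory search. This summary does not describe the exact search procedure; a minimal sketch, assuming a simple greedy best-of-N scheme, looks like this: at each step a policy proposes candidate actions, a process reward model scores every candidate before anything is executed, and only the top-scoring action is committed. `propose`, `score`, `step`, and `is_done` are hypothetical callables standing in for the agent, the reward model, and the browser environment.

```python
from typing import Callable, List, Sequence, Tuple

State = str
Action = str


def prm_guided_search(
    initial_state: State,
    propose: Callable[[State], Sequence[Action]],  # hypothetical policy sampler
    score: Callable[[State, Action], float],       # hypothetical PRM: higher = better step
    step: Callable[[State, Action], State],        # hypothetical environment transition
    is_done: Callable[[State], bool],
    max_steps: int = 10,
) -> Tuple[State, List[Action]]:
    """Greedy best-of-N: score every candidate first, then execute only the best one."""
    state: State = initial_state
    trajectory: List[Action] = []
    for _ in range(max_steps):
        if is_done(state):
            break
        candidates = list(propose(state))
        if not candidates:
            break
        best = max(candidates, key=lambda a: score(state, a))  # ties: first candidate wins
        trajectory.append(best)
        state = step(state, best)
    return state, trajectory


# Toy stand-ins: the "task" is to type the string "abc" one character at a time.
GOAL = "abc"
final_state, actions = prm_guided_search(
    initial_state="",
    propose=lambda s: ["x", "a", "b", "c"],
    score=lambda s, a: 1.0 if GOAL.startswith(s + a) else 0.0,
    step=lambda s, a: s + a,
    is_done=lambda s: s == GOAL,
)
print(final_state, actions)  # abc ['a', 'b', 'c']
```

Scoring before execution matters on the web because actions such as submitting a form cannot be undone. Beam search or lookahead variants would keep several partial trajectories alive instead of committing to one.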

Abstract

Web agents hold great potential for automating complex computer tasks, yet their interactions involve long-horizon, sequential decision-making with irreversible actions. In such settings, outcome-based supervision is sparse and delayed, often rewarding incorrect trajectories and failing to support inference-time scaling. This motivates the use of Process Reward Models for web navigation (WebPRMs), but existing approaches remain limited: scalar WebPRMs collapse progress into coarse, weakly grounded signals, while checklist-based WebPRMs rely on brittle template matching that fails under layout or semantic changes and often mislabels superficially correct actions as successful, providing little insight or interpretability. To address these challenges, we introduce WebArbiter, a reasoning-first, principle-inducing WebPRM that formulates reward modeling as text generation, producing…
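The abstract's point that outcome supervision is sparse and delayed, and can reward incorrect trajectories, can be made concrete with a toy four-step trajectory. The sketch assumes binary per-step judgments from some hypothetical step-level judge (here just a list of booleans); it does not model how WebArbiter itself produces them, and the helper names are illustrative.

```python
from typing import List, Optional


def outcome_rewards(num_steps: int, task_succeeded: bool) -> List[float]:
    """Outcome supervision: a single signal at the final step, zeros elsewhere."""
    if num_steps == 0:
        return []
    rewards = [0.0] * num_steps
    rewards[-1] = 1.0 if task_succeeded else 0.0
    return rewards


def process_rewards(step_ok: List[bool]) -> List[float]:
    """Process supervision: one signal per step from a hypothetical step-level judge."""
    return [1.0 if ok else 0.0 for ok in step_ok]


def first_failed_step(rewards: List[float]) -> Optional[int]:
    """Index of the first step judged incorrect in a per-step reward list, else None."""
    for i, r in enumerate(rewards):
        if r == 0.0:
            return i
    return None


# Toy trajectory: step 1 (0-based) is a wrong, irreversible action, yet the
# final-state check still reports success.
step_judgments = [True, False, True, True]
print(outcome_rewards(len(step_judgments), task_succeeded=True))  # [0.0, 0.0, 0.0, 1.0]
print(process_rewards(step_judgments))                             # [1.0, 0.0, 1.0, 1.0]
print(first_failed_step(process_rewards(step_judgments)))          # 1
```

With an outcome check that inspects only the final page, the flawed trajectory still earns a full terminal reward, while the per-step signal localizes the faulty step, which is what lets a search procedure prune it early.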

Code & Models

Repositories

yaoz720/GroundedPRM (GitHub)
