The Art of Building Verifiers for Computer Use Agents

Corby Rosset; Pratyusha Sharma; Andrew Zhao; Miguel Gonzalez-Fernandez; Ahmed Awadallah

arXiv:2604.06240 · cs.CR · April 9, 2026

TL;DR

This paper introduces the Universal Verifier, a system for verifying web task trajectories. It agrees with human judgment about as often as humans agree with each other and cuts false positives to near zero relative to previous baselines.

Contribution

The paper presents a set of design principles for building effective verifiers and introduces the Universal Verifier system, validated on a new benchmark with open-source code.

Findings

01. The Universal Verifier agrees with humans as often as humans agree with each other.

02. False positive rates are reduced to near zero, compared with baselines.

03. An auto-research agent achieves 70% of expert quality in 5% of the time.

Abstract

Verifying the success of computer use agent (CUA) trajectories is a critical challenge: without reliable verification, neither evaluation nor training signal can be trusted. In this paper, we present lessons learned from building a best-in-class verifier for web tasks we call the Universal Verifier. We design the Universal Verifier around four key principles: 1) constructing rubrics with meaningful, non-overlapping criteria to reduce noise; 2) separating process and outcome rewards that yield complementary signals, capturing cases where an agent follows the right steps but gets blocked or succeeds through an unexpected path; 3) distinguishing between controllable and uncontrollable failures scored via a cascading-error-free strategy for finer-grained failure understanding; and 4) a divide-and-conquer context management scheme that attends to all screenshots in a trajectory, improving…
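As a rough illustration of principles 2 and 3 above, the following Python sketch keeps the process signal and the outcome signal separate and labels each failed criterion as controllable or uncontrollable. Every name here (`FailureKind`, `Criterion`, `Verdict`, `verify`) and both scoring rules are assumptions made for illustration, not the paper's implementation; the cascading-error-free scoring strategy and the context management scheme are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FailureKind(Enum):
    NONE = "none"
    CONTROLLABLE = "controllable"      # the agent's own mistake
    UNCONTROLLABLE = "uncontrollable"  # e.g. the site blocked the agent


@dataclass
class Criterion:
    # One rubric item (hypothetical structure, not from the paper).
    name: str
    satisfied: bool
    failure: FailureKind = FailureKind.NONE


@dataclass
class Verdict:
    process_score: float
    outcome_success: bool
    controllable_failures: List[str]
    uncontrollable_failures: List[str]


def verify(process: List[Criterion], outcome: List[Criterion]) -> Verdict:
    # Process signal: fraction of process criteria met (0.0 if none are given).
    # This averaging rule is an assumption chosen for simplicity.
    process_score = sum(c.satisfied for c in process) / len(process) if process else 0.0
    # Outcome signal: kept separate; here it requires every outcome criterion.
    outcome_success = bool(outcome) and all(c.satisfied for c in outcome)
    failed = [c for c in process + outcome if not c.satisfied]
    return Verdict(
        process_score,
        outcome_success,
        [c.name for c in failed if c.failure is FailureKind.CONTROLLABLE],
        [c.name for c in failed if c.failure is FailureKind.UNCONTROLLABLE],
    )


# An agent that follows the right steps but gets blocked at the end.
process = [
    Criterion("opened booking site", True),
    Criterion("entered travel dates", True),
]
outcome = [Criterion("booking confirmed", False, FailureKind.UNCONTROLLABLE)]
verdict = verify(process, outcome)
print(verdict)
```

Keeping the two signals apart lets a single verdict express "right steps, blocked outcome" without folding both into one number, which is the case the abstract singles out.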

Code & Models

Repositories

microsoft/fara (GitHub)
