Training Computer Use Agents to Assess the Usability of Graphical User Interfaces

Alice Gao; Weixi Tong; Rishab Vempati; Katharina Reinecke; R. Benjamin Shapiro; Tianyi Zhang; Jason Wu

arXiv:2604.26020·cs.CL·April 30, 2026

TL;DR

This paper introduces uxCUA, a computer use agent trained to assess GUI usability by simulating human-like interactions and predicting a numerical usability score. The authors report that it assesses usability more accurately than larger models.

Contribution

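The work presents a novel machine learning method for training computer use agents (CUAs) to evaluate GUI usability. The agent is trained on a large-scale dataset of interactive UIs paired with usability labels, learns to prioritize important interaction flows, and produces realistic critiques.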

Findings

01. uxCUA outperforms larger models in usability assessment accuracy.

02. uxCUA produces realistic critiques of synthetic and real UIs.

03. The method offers a data-driven foundation for automated usability evaluation.

Abstract

Usability testing with experts and potential users can assess the effectiveness, efficiency, and user satisfaction of graphical user interfaces (GUIs), but doing so remains a costly and time-intensive process. Prior work has used computer use agents (CUAs) and other generative agents that can simulate user interactions and preferences, but we show that these agents still struggle to provide accurate usability assessments. In this work, we present a novel machine learning method that operationalizes a computational definition of usability to train CUAs to assess GUI usability by i) prioritizing important interaction flows, ii) executing them through human-like interactions, and iii) predicting a learned numerical usability score. We train a computer use agent, uxCUA, with our algorithm on a large-scale dataset of fully interactive user interfaces (UIs) paired with usability labels and human…
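
To make the three stages in the abstract concrete, the sketch below walks through a toy version of the loop: rank candidate interaction flows, execute the top ones, and fold the outcomes into a single number. It is not the authors' algorithm. Every name here (`Flow`, `Trace`, `prioritize`, `assess`, `step_budget`) is hypothetical, the executor is a stub rather than a real agent, and the hand-written scoring rule stands in for the learned usability score, whose form the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Flow:
    name: str
    importance: float  # hypothetical priority weight, not from the paper


@dataclass
class Trace:
    flow: Flow
    completed: bool  # did the simulated user finish the flow?
    steps: int       # number of interactions it took


def prioritize(flows: List[Flow], k: int) -> List[Flow]:
    """Stage i: keep the k flows with the highest importance."""
    return sorted(flows, key=lambda f: f.importance, reverse=True)[:k]


def assess(
    flows: List[Flow],
    execute: Callable[[Flow], Trace],
    k: int = 2,
    step_budget: int = 10,
) -> float:
    """Stages ii and iii: run the selected flows, then score them.

    The score is an importance-weighted average of a hand-written
    efficiency term. This is a placeholder for a learned predictor.
    """
    traces = [execute(f) for f in prioritize(flows, k)]
    total = sum(t.flow.importance for t in traces)
    if total == 0:
        return 0.0
    score = 0.0
    for t in traces:
        # Failed flows contribute nothing; completed flows lose credit per step.
        efficiency = max(0.0, 1.0 - t.steps / step_budget) if t.completed else 0.0
        score += t.flow.importance * efficiency
    return score / total


# Stub executor with canned outcomes in place of a real computer use agent.
OUTCOMES = {"checkout": (True, 5), "search": (True, 2), "footer": (False, 9)}


def fake_execute(flow: Flow) -> Trace:
    completed, steps = OUTCOMES[flow.name]
    return Trace(flow, completed, steps)


flows = [Flow("footer", 0.1), Flow("checkout", 0.6), Flow("search", 0.3)]
score = assess(flows, fake_execute, k=2)
print(f"usability score: {score:.2f}")
```

With `k=2`, only the `checkout` and `search` flows are executed, so the failing low-priority `footer` flow does not affect the result. In a trained system the weights and the scoring function would both be learned from usability labels rather than fixed by hand.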
