Comparing Human Oversight Strategies for Computer-Use Agents

Chaoran Chen; Zhiping Zhang; Zeya Chen; Eryue Xu; Yinuo Yang; Ibrahim Khalilov; Simret A Gebreegziabher; Yanfang Ye; Ziang Xiao; Yaxing Yao; Tianshi Li; Toby Jia-Jun Li

arXiv:2604.04918 · cs.HC · April 7, 2026

TL;DR

This study compares four oversight strategies for computer-use agents and finds that the effectiveness of oversight depends on how supervision surfaces critical moments to users, with consequences for user trust and intervention success.

Contribution

The paper frames oversight of computer-use agents as a structural coordination problem, defined by delegation structure and engagement level, and empirically evaluates how four oversight strategies affect user interaction and trust in a live web environment.

Findings

01

Oversight strategy shapes users' exposure to problematic actions more reliably than their ability to correct those actions once visible.

02

Plan-based strategies are associated with lower rates of problematic agent actions, but not with equally strong gains in runtime intervention success.

03

No single oversight strategy is best across all subjective measures, with trust varying by context.

Abstract

LLM-powered computer-use agents (CUAs) are shifting users from direct manipulation to supervisory coordination. Existing oversight mechanisms, however, have largely been studied as isolated interface features, making broader oversight strategies difficult to compare. We conceptualize CUA oversight as a structural coordination problem defined by delegation structure and engagement level, and use this lens to compare four oversight strategies in a mixed-methods study with 48 participants in a live web environment. Our results show that oversight strategy more reliably shaped users' exposure to problematic actions than their ability to correct them once visible. Plan-based strategies were associated with lower rates of agent problematic-action occurrence, but not equally strong gains in runtime intervention success once such actions became visible. On subjective measures, no single…
