A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions
Pascal J. Sager, Benjamin Meyer, Peng Yan, Rebekka von Wartburg-Kottler, Layan Etaiwi, Aref Enayati, Gabriel Nobel, Ahmed Abdulkadir, Benjamin F. Grewe, Thilo Stadelmann

TL;DR
This survey comprehensively reviews agents for computer use (ACUs), analyzing their current state, challenges, and future directions, with a focus on taxonomy, research gaps, and practical deployment considerations.
Contribution
It introduces a unifying taxonomy for ACUs, reviews 87 systems and 33 datasets, and identifies key research gaps and future directions for practical, general-purpose ACUs.
Findings
Identified six major research gaps including generalization and evaluation issues.
Reviewed 87 ACUs and 33 datasets across different approaches.
Proposed strategies to improve ACU robustness and real-world applicability.
Abstract
Agents for computer use (ACUs) are an emerging class of systems capable of executing complex tasks on digital devices -- such as desktops, mobile phones, and web platforms -- given instructions in natural language. These agents can automate tasks by controlling software via low-level actions like mouse clicks and touchscreen gestures. However, despite rapid progress, ACUs are not yet mature for everyday use. In this survey, we investigate the state-of-the-art, trends, and research gaps in the development of practical ACUs. We provide a comprehensive review of the ACU landscape, introducing a unifying taxonomy spanning three dimensions: (I) the domain perspective, characterizing agent operating contexts; (II) the interaction perspective, describing observation modalities (e.g., screenshots, HTML) and action modalities (e.g., mouse, keyboard, code execution); and (III) the agent…
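The interaction perspective described in the abstract can be pictured as a small data structure. The following is a minimal sketch, not the survey's own formalization: it covers only the modalities the abstract names as examples (screenshots, HTML; mouse, keyboard, code execution), and the names `ObservationModality`, `ActionModality`, `InteractionProfile`, and `example-web-agent` are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ObservationModality(Enum):
    # Example observation modalities named in the abstract.
    SCREENSHOT = "screenshot"
    HTML = "html"


class ActionModality(Enum):
    # Example action modalities named in the abstract.
    MOUSE = "mouse"
    KEYBOARD = "keyboard"
    CODE_EXECUTION = "code_execution"


@dataclass(frozen=True)
class InteractionProfile:
    """Hypothetical record of what an agent observes and how it acts."""

    name: str
    observations: frozenset
    actions: frozenset

    def supports(self, observation, action):
        # True only if the agent both perceives via `observation`
        # and can act via `action`.
        return observation in self.observations and action in self.actions


# An illustrative (made-up) agent that reads HTML and screenshots
# and acts through mouse and keyboard events.
web_agent = InteractionProfile(
    name="example-web-agent",
    observations=frozenset(
        {ObservationModality.HTML, ObservationModality.SCREENSHOT}
    ),
    actions=frozenset({ActionModality.MOUSE, ActionModality.KEYBOARD}),
)
```

Describing agents this way would make it straightforward to group or filter surveyed systems by modality, for example by checking `web_agent.supports(ObservationModality.HTML, ActionModality.MOUSE)`.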
