UI-Evol: Automatic Knowledge Evolving for Computer Use Agents
Ziyun Zhang, Xinyi Liu, Xiaoyi Zhang, Jun Wang, Gang Chen, Yan Lu

TL;DR
UI-Evol is a modular system that enhances computer use agents by evolving GUI knowledge through interaction data and external references, significantly improving task success rates and reliability.
Contribution
It introduces a novel plug-and-play knowledge evolution module with retrace and critique stages, addressing the knowledge-execution gap in computer use agents.
Findings
UI-Evol significantly improves task performance on the OSWorld benchmark.
It reduces behavioral standard deviation, increasing agent reliability.
It demonstrates effectiveness when applied to the state-of-the-art Agent S2.
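The summary does not define how behavioral standard deviation is measured. Assuming it means the spread of success rates across repeated evaluation runs on the same task set, the reliability finding can be illustrated with the minimal sketch below. `run_success_rates` and `reliability` are hypothetical helpers, the outcomes are synthetic rather than results from the paper, and the population standard deviation (`statistics.pstdev`) is one choice among several; the sample version, `statistics.stdev`, would also be reasonable.

```python
from statistics import mean, pstdev


def run_success_rates(runs: list[list[int]]) -> list[float]:
    """Success rate of each independent evaluation run (1 = task solved, 0 = failed)."""
    return [sum(outcomes) / len(outcomes) for outcomes in runs]


def reliability(runs: list[list[int]]) -> tuple[float, float]:
    """Mean success rate and population standard deviation across runs.

    Assumption: a lower standard deviation is read as more consistent
    behavior when the same tasks are attempted repeatedly.
    """
    rates = run_success_rates(runs)
    return mean(rates), pstdev(rates)


if __name__ == "__main__":
    # Synthetic outcomes for three repeated runs over the same four tasks.
    runs = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
    avg, std = reliability(runs)
    print(f"mean={avg:.3f} std={std:.3f}")  # prints mean=0.500 std=0.204
```

Two agents with the same mean success rate can differ sharply in this spread, which is why a reliability claim is reported separately from a performance claim.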
Abstract
External knowledge has played a crucial role in the recent development of computer use agents. We identify a critical knowledge-execution gap: retrieved knowledge often fails to translate into effective real-world task execution. Our analysis shows that even knowledge that is 90% correct yields only a 41% execution success rate. To bridge this gap, we propose UI-Evol, a plug-and-play module for autonomous GUI knowledge evolution. UI-Evol consists of two stages: a Retrace Stage that extracts faithful objective action sequences from actual agent-environment interactions, and a Critique Stage that refines existing knowledge by comparing these sequences against external references. We conduct comprehensive experiments on the OSWorld benchmark with the state-of-the-art Agent S2. Our results demonstrate that UI-Evol not only significantly boosts task performance but also addresses a previously overlooked…
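The two stages described in the abstract can be pictured as a small pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how either stage works, so `Step`, `retrace`, and `critique` are hypothetical names, and knowledge and references are assumed to be ordered lists of step strings. The Retrace Stage is approximated by discarding the agent's stated reasoning and any action the environment did not execute, and the Critique Stage is approximated by a `difflib.SequenceMatcher` diff against the reference in place of whatever comparison the paper actually uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Step:
    """One hypothetical record from an agent-environment interaction log."""
    thought: str    # the agent's stated reasoning, which may not match what happened
    action: str     # the GUI action sent to the environment
    executed: bool  # assumed flag: did the environment actually carry the action out?


def retrace(trajectory: list[Step]) -> list[str]:
    """Stand-in Retrace Stage: keep only actions that really executed,
    ignoring the agent's narration so the sequence reflects observed behavior."""
    return [step.action for step in trajectory if step.executed]


def critique(knowledge: list[str], retraced: list[str], reference: list[str]) -> list[str]:
    """Stand-in Critique Stage: align the current knowledge with an external
    reference and revise the spans where they disagree."""
    executed = set(retraced)
    refined: list[str] = []
    matcher = SequenceMatcher(a=knowledge, b=reference, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Knowledge and reference agree: keep these steps unchanged.
            refined.extend(knowledge[i1:i2])
        else:
            # "replace", "delete" or "insert": keep only the knowledge steps the
            # retraced sequence confirms were executed, then adopt the reference steps.
            refined.extend(s for s in knowledge[i1:i2] if s in executed)
            refined.extend(reference[j1:j2])
    return refined


if __name__ == "__main__":
    # Illustrative toy data, not taken from the paper or OSWorld.
    trajectory = [
        Step("Open the settings app first", "open Settings", True),
        Step("Dark mode lives under Display", "click Display", False),
        Step("Now enable it", "toggle Dark Mode", True),
    ]
    knowledge = ["open Settings", "click Display", "toggle Dark Mode"]
    reference = ["open Settings", "click Appearance", "toggle Dark Mode"]
    print(critique(knowledge, retrace(trajectory), reference))
    # prints ['open Settings', 'click Appearance', 'toggle Dark Mode']
```

Keeping executed steps even where they diverge from the reference is a design choice of this sketch: it treats what the environment actually did as evidence alongside the reference, rather than overwriting knowledge with the reference wholesale.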
