GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent
Bin Xie, Rui Shao, Gongwei Chen, Kaiwen Zhou, Yinchuan Li, Jie Liu, Min Zhang, Liqiang Nie

TL;DR
GUI-explorer is a training-free GUI agent that autonomously explores and mines transition-aware knowledge from GUIs, significantly improving app interaction success rates without requiring parameter updates.
Contribution
GUI-explorer introduces a training-free approach for GUI agents that combines autonomous exploration with unsupervised knowledge mining, removing the need for costly fine-tuning.
Findings
Achieves a 53.7% task success rate on SPA-Bench
Achieves a 47.4% task success rate on AndroidWorld
Significantly outperforms state-of-the-art agents on these benchmarks
Abstract
GUI automation faces critical challenges in dynamic environments. MLLMs suffer from two key issues: misinterpreting UI components and outdated knowledge. Traditional fine-tuning methods are costly for app-specific knowledge updates. We propose GUI-explorer, a training-free GUI agent that incorporates two fundamental mechanisms: (1) Autonomous Exploration of Function-aware Trajectory. To comprehensively cover all application functionalities, we design a Function-aware Task Goal Generator that automatically constructs exploration goals by analyzing GUI structural information (e.g., screenshots and activity hierarchies). This enables systematic exploration to collect diverse trajectories. (2) Unsupervised Mining of Transition-aware Knowledge. To establish precise screen-operation logic, we develop a Transition-aware Knowledge Extractor that extracts effective screen-operation logic through…
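As a rough illustration of what "transition-aware knowledge" could look like as data, the sketch below records (screen before, action, screen after) triples from an exploration trajectory and indexes them by screen and action. It is not the authors' implementation: the names `Transition`, `KnowledgeStore`, `mine`, and `lookup`, the string screen identifiers, and the rule of skipping transitions that leave the screen unchanged are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    """One observed step of an exploration trajectory (hypothetical schema)."""
    screen_before: str  # assumed identifier of the screen before the action
    action: str         # assumed action label, e.g. "tap:compose"
    screen_after: str   # assumed identifier of the screen after the action


@dataclass
class KnowledgeStore:
    """Maps (screen, action) pairs to the screens they were observed to produce."""
    effects: dict = field(default_factory=lambda: defaultdict(set))

    def mine(self, trajectory):
        """Record every transition that actually changed the screen."""
        for step in trajectory:
            # Assumption for this sketch: an action that leaves the screen
            # unchanged carries no useful operation logic, so it is skipped.
            if step.screen_before != step.screen_after:
                key = (step.screen_before, step.action)
                self.effects[key].add(step.screen_after)

    def lookup(self, screen, action):
        """Return the sorted list of screens observed after `action` on `screen`."""
        return sorted(self.effects.get((screen, action), set()))


# Toy trajectory with made-up screen and action names.
store = KnowledgeStore()
store.mine([
    Transition("inbox", "tap:compose", "editor"),
    Transition("editor", "tap:blank_area", "editor"),  # no screen change
])
```

In this toy form, an agent could call `store.lookup("inbox", "tap:compose")` before acting to see what an operation did during exploration, which is the kind of app-specific knowledge the abstract says is obtained without parameter updates.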
Taxonomy
Topics: Personal Information Management and User Behavior · Spreadsheets and End-User Computing · Context-Aware Activity Recognition Systems
