Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement
Chao Hao, Shuai Wang, Kaiwen Zhou

TL;DR
RecAgent is an uncertainty-aware GUI agent that improves mobile task automation by adaptively focusing on relevant UI components and involving human feedback in ambiguous situations, reducing input redundancy and decision ambiguity.
Contribution
It introduces RecAgent, a novel framework combining component recommendation and human-in-the-loop refinement to handle perceptual and decision uncertainties in GUI navigation.
Findings
RecAgent significantly improves success rates in complex GUI tasks.
The proposed ComplexAction dataset serves as an effective benchmark for evaluating GUI agent performance.
Experiments demonstrate the effectiveness of adaptive perception and human-in-the-loop strategies.
Abstract
Graphical user interface (GUI) agents have shown promise in automating mobile tasks but still struggle with input redundancy and decision ambiguity. In this paper, we present RecAgent, an uncertainty-aware agent that addresses these issues through adaptive perception. We distinguish two types of uncertainty in GUI navigation: (1) perceptual uncertainty, caused by input redundancy and noise from comprehensive screen information, and (2) decision uncertainty, arising from ambiguous tasks and complex reasoning. To reduce perceptual uncertainty, RecAgent employs a component recommendation mechanism that identifies and focuses on the most relevant UI elements. For decision uncertainty, it uses an interactive module to request user feedback in ambiguous situations, enabling intent-aware decisions. These components are integrated into a unified framework that proactively reduces input…
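The abstract does not say how RecAgent scores components or measures decision uncertainty, so the following is only a minimal sketch of the two ideas under stated assumptions, not the paper's method. Component relevance is approximated by token overlap (Jaccard similarity) between the task and each component's text, and decision uncertainty by the normalized entropy of softmaxed action scores, with a hypothetical threshold of 0.9 above which the agent defers to the user. The names `UIComponent`, `recommend_components`, `normalized_entropy`, `decide`, and `ask_user` are all hypothetical.

```python
import math
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class UIComponent:
    # Hypothetical minimal view of a screen element (not the paper's schema).
    component_id: str
    text: str


def _tokens(s: str) -> set:
    # Lowercase word tokens; a crude stand-in for a learned relevance model.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def recommend_components(task: str, components: Sequence[UIComponent],
                         k: int = 3) -> List[UIComponent]:
    """Perceptual side: keep only the k components most relevant to the task.

    Assumption: relevance = Jaccard overlap of task and component tokens.
    """
    task_tokens = _tokens(task)

    def score(c: UIComponent) -> float:
        comp_tokens = _tokens(c.text)
        union = task_tokens | comp_tokens
        return len(task_tokens & comp_tokens) / len(union) if union else 0.0

    # sorted() is stable even with reverse=True, so ties keep on-screen order.
    return sorted(components, key=score, reverse=True)[:k]


def normalized_entropy(scores: Sequence[float]) -> float:
    """Softmax the action scores; return entropy / log(n), a value in [0, 1]."""
    if len(scores) < 2:
        return 0.0
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def decide(candidates: Sequence[str], scores: Sequence[float],
           ask_user: Callable[[Sequence[str]], str],
           threshold: float = 0.9) -> str:
    """Decision side: act alone when confident, otherwise ask the user.

    Assumption: 'ambiguous' means normalized entropy above a fixed threshold.
    """
    if normalized_entropy(scores) > threshold:
        return ask_user(candidates)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return candidates[best]


if __name__ == "__main__":
    screen = [
        UIComponent("btn_1", "Send message"),
        UIComponent("btn_2", "Settings"),
        UIComponent("txt_3", "Search contacts"),
    ]
    focus = recommend_components("send a message to Alice", screen, k=2)
    print([c.component_id for c in focus])

    # A simulated user who always picks the last option.
    simulated_user = lambda options: options[-1]
    print(decide(["tap Alice (work)", "tap Alice (home)"], [1.0, 1.0],
                 simulated_user))
```

With equal scores the two candidates are indistinguishable (normalized entropy 1.0), so the sketch hands the choice to the user; with clearly separated scores it acts on its own. A learned recommender and a calibrated uncertainty estimate would replace both heuristics in practice.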
