GUITester: Enabling GUI Agents for Exploratory Defect Discovery
Yifei Gao, Jiang Wu, Xiaoyi Chen, Yifan Yang, Zhe Cui, Tianyi Ma, Jiaming Zhang, Jitao Sang

TL;DR
GUITester is a multi-agent framework for autonomous exploratory GUI testing. It targets two failure modes of MLLM agents, Goal-Oriented Masking and Execution-Bias Attribution, and reaches an F1-score of 48.90% (Pass@3), outperforming baselines.
Contribution
The paper introduces GUITestBench, the first interactive benchmark for GUI defect discovery, and GUITester, a multi-agent system that decouples navigation from defect verification to support autonomous exploratory testing.
Findings
GUITester achieves an F1-score of 48.90% (Pass@3), outperforming baselines.
GUITestBench includes 143 tasks across 26 defect types.
The framework effectively decouples navigation from defect verification.
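For readers unfamiliar with the metric in the findings above, the following is a minimal sketch of how an F1-score is computed from detection counts. The `Counts` type and the `detected_in_any_run` helper are hypothetical names used only for illustration. The summary does not say how Pass@3 is aggregated, so the "detected if any of the runs reports it" rule is an assumption, not the paper's stated protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical container for defect-detection outcomes."""
    tp: int  # defects that exist and were reported
    fp: int  # reports that do not correspond to a real defect
    fn: int  # defects that exist but were not reported


def f1_score(c: Counts) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    denom = 2 * c.tp + c.fp + c.fn
    # With no true positives, false positives, or false negatives,
    # F1 is undefined; return 0.0 by convention here.
    return 0.0 if denom == 0 else 2 * c.tp / denom


def detected_in_any_run(run_reports: list[bool]) -> bool:
    """ASSUMPTION: treat a defect as detected under Pass@k if at least
    one of the k runs reported it. The source does not define this."""
    return any(run_reports)
```

With 5 true positives, 3 false positives, and 2 false negatives, precision is 5/8 and recall is 5/7, so `f1_score` returns 10/15, about 0.667.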
Abstract
Exploratory GUI testing is essential for software quality but suffers from high manual costs. While Multi-modal Large Language Model (MLLM) agents excel in navigation, they fail to autonomously discover defects due to two core challenges: Goal-Oriented Masking, where agents prioritize task completion over reporting anomalies, and Execution-Bias Attribution, where system defects are misidentified as agent errors. To address these, we first introduce GUITestBench, the first interactive benchmark for this task, featuring 143 tasks across 26 defects. We then propose GUITester, a multi-agent framework that decouples navigation from verification via two modules: (i) a Planning-Execution Module (PEM) that proactively probes for defects via embedded testing intents, and (ii) a Hierarchical Reflection Module (HRM) that resolves attribution…
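The idea of decoupling navigation from verification can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated and does not specify how the PEM or HRM work internally. Every name here (`Step`, `Verdict`, `verify`, `report_defects`) and the attribution rule itself are hypothetical placeholders. The sketch only shows the separation of concerns: a navigation component records each step together with its testing intent, and a separate verifier decides whether an unexpected outcome is an agent error or a system defect.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    AGENT_ERROR = "agent_error"
    SYSTEM_DEFECT = "system_defect"


@dataclass(frozen=True)
class Step:
    """One recorded navigation step (hypothetical schema)."""
    intent: str           # testing intent attached to the step
    planned_action: str   # what the planner asked for
    executed_action: str  # what was actually performed on the GUI
    expected_state: str   # outcome the planner anticipated
    observed_state: str   # outcome seen after execution


def verify(step: Step) -> Verdict:
    """ASSUMED placeholder rule, not the paper's HRM logic:
    an unexpected outcome is blamed on the agent only when the
    executed action deviates from the planned one."""
    if step.observed_state == step.expected_state:
        return Verdict.OK
    if step.executed_action != step.planned_action:
        return Verdict.AGENT_ERROR
    return Verdict.SYSTEM_DEFECT


def report_defects(trace: list[Step]) -> list[str]:
    """Return the intents whose steps were attributed to the system."""
    return [s.intent for s in trace if verify(s) is Verdict.SYSTEM_DEFECT]
```

Keeping `verify` separate from navigation means an unexpected screen is always examined rather than silently retried, which is the behavior the two named challenges describe as missing in task-focused agents.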
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability
