BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism
Qinzhuo Wu, Pengzhi Gao, Wei Liu, Jian Luan

TL;DR
BacktrackAgent adds a backtracking mechanism with error detection and recovery modules to a GUI agent, improving task success rates and robustness. It targets a gap in existing agents, which focus on the accuracy of individual actions and rarely detect or recover from their own errors.
Contribution
The paper presents a novel backtracking framework with verifier, judger, and reflector modules, along with a specialized training dataset for enhanced GUI agent performance.
Findings
Improved task success rate on Mobile3M and Auto-UI benchmarks.
Enhanced step accuracy through error detection and recovery.
Effective backtracking mechanism reduces failure cases.
Abstract
Graphical User Interface (GUI) agents have gained substantial attention due to their impressive capabilities to complete tasks through multiple interactions within GUI environments. However, existing agents primarily focus on enhancing the accuracy of individual actions and often lack effective mechanisms for detecting and recovering from errors. To address these shortcomings, we propose the BacktrackAgent, a robust framework that incorporates a backtracking mechanism to improve task completion efficiency. BacktrackAgent includes verifier, judger, and reflector components as modules for error detection and recovery, while also applying judgment rewards to further enhance the agent's performance. Additionally, we develop a training dataset specifically designed for the backtracking mechanism, which considers the outcome pages after action executions. Experimental results show that…
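The abstract names three modules (verifier, judger, reflector) but does not spell out their interfaces. The following is a minimal Python sketch of how such a detect-and-backtrack loop could be wired together, not the authors' implementation. Every name here (`ToyGUI`, `run_step`, the callables passed in) is hypothetical, and the division of labor is an assumption: the verifier checks that an action had any effect on the page, the judger checks that the outcome page suits the task, and the reflector proposes a replacement action after a rejection. Simple rule-based functions stand in for what would presumably be learned models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Page = str
Action = str


@dataclass
class ToyGUI:
    """Hypothetical GUI: maps (page, action) to the next page."""
    transitions: Dict[Tuple[Page, Action], Page]
    page: Page = "home"

    def execute(self, action: Action) -> Page:
        # Unknown actions are no-ops: the page stays where it is.
        self.page = self.transitions.get((self.page, action), self.page)
        return self.page

    def restore(self, page: Page) -> None:
        # Backtracking is modeled as resetting to a saved page.
        self.page = page


def run_step(
    env: ToyGUI,
    propose: Callable[[Page], Action],
    verifier: Callable[[Page, Page], bool],
    judger: Callable[[Page, Action, Page], bool],
    reflector: Callable[[Page, List[Action]], Action],
    max_retries: int = 2,
) -> Tuple[Optional[Action], List[Action]]:
    """Run one task step with error detection and backtracking.

    Returns the accepted action (or None if every attempt was rejected)
    and the list of rejected actions, in the order they were tried.
    """
    before = env.page
    action = propose(before)
    rejected: List[Action] = []
    for attempt in range(max_retries + 1):
        after = env.execute(action)
        # Assumed roles: verifier = "did anything happen?",
        # judger = "is the outcome page the right one for the task?"
        if verifier(before, after) and judger(before, action, after):
            return action, rejected
        env.restore(before)  # undo the step before trying again
        rejected.append(action)
        if attempt < max_retries:
            action = reflector(before, rejected)
    return None, rejected


# Toy usage: the goal is to reach the "search" page from "home".
env = ToyGUI({("home", "tap_search"): "search",
              ("home", "tap_settings"): "settings"})
options = ["tap_banner", "tap_settings", "tap_search"]

action, rejected = run_step(
    env,
    propose=lambda page: options[0],
    verifier=lambda before, after: before != after,
    judger=lambda before, act, after: after == "search",
    reflector=lambda page, bad: next(a for a in options if a not in bad),
)
print(action, rejected, env.page)
```

In this toy run the first action has no effect and is caught by the verifier, the second changes the page but is rejected by the judger, and the third is accepted. The retry budget (`max_retries`) is an arbitrary choice for the sketch.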
Taxonomy
Topics: Context-Aware Activity Recognition Systems
Methods: Softmax · Attention Is All You Need · Focus
