ScaleTrack: Scaling and back-tracking Automated GUI Agents

Jing Huang; Zhixiong Zeng; Wenkang Han; Yufeng Zhong; Liming Zheng; Shuai Fu; Jingyuan Chen; Lin Ma

arXiv:2505.00416·cs.AI·May 2, 2025

TL;DR

ScaleTrack is a training framework for automated GUI agents that aims to improve grounding by scaling up training data and to improve planning by backtracking over historical actions.

Contribution

It proposes a unified training approach that scales GUI grounding data and integrates backtracking planning to better model GUI environment evolution.

Findings

01. Improved GUI grounding accuracy

02. Enhanced planning with historical backtracking

03. Effective in complex GUI task scenarios

Abstract

Automated GUI agents aim to facilitate user interaction by automatically performing complex tasks in digital environments such as web, mobile, and desktop devices. An agent receives a textual task instruction and a GUI description, and generates executable actions (e.g., click) and operation boxes step by step. Training a GUI agent mainly involves a grounding stage and a planning stage: GUI grounding focuses on finding the execution coordinates for the task, while planning aims to predict the next action based on historical actions. However, previous work suffers from insufficient training data for GUI grounding and neglects backtracking over historical behaviors in GUI planning. To address these challenges, we propose ScaleTrack, a training framework that scales grounding and backtracks planning for automated GUI agents. We carefully…
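The two training views described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Action`, `grounding_target`, and `planning_example`, and the `(left, top, right, bottom)` pixel convention for operation boxes, are assumptions made for illustration. The sketch covers only grounding (where to act) and planning (which action comes next, given history).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed convention: an operation box is (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Action:
    """One executable step: an action type (e.g. "click") plus its operation box."""
    kind: str
    box: Box


def grounding_target(action: Action) -> Tuple[float, float]:
    """Grounding view: reduce the operation box to execution coordinates (its center)."""
    left, top, right, bottom = action.box
    return ((left + right) / 2, (top + bottom) / 2)


def planning_example(trajectory: List[Action], t: int) -> Tuple[List[Action], Action]:
    """Planning view: the actions before step t are the history, step t is the label."""
    return trajectory[:t], trajectory[t]


# Hypothetical two-step trajectory for illustration only.
trajectory = [
    Action("click", (0, 0, 10, 20)),
    Action("type", (10, 10, 30, 30)),
]
history, next_action = planning_example(trajectory, 1)
```

Here `history` holds the first action and `next_action` is the second, which is the pair a planning model would be trained on at that step.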
