SwipeGen: Bridging the Execution Gap in GUI Agents via Human-like Swipe Synthesis
Xuan Wang, Siyuan Su, Quantong Fu, Yongxiang Hu, Yangfan Zhou

TL;DR
SwipeGen is an automated pipeline for synthesizing human-like swipe gestures to improve GUI agent execution. The paper also presents a swipe-execution benchmark and a new agent, GUISwiper, that significantly outperforms existing methods in swipe accuracy.
Contribution
The paper proposes SwipeGen for swipe synthesis, creates the first benchmark for swipe execution, and develops GUISwiper with substantially improved accuracy.
Findings
GUISwiper achieves 69.07% swipe accuracy.
A 214% improvement over existing baselines.
First benchmark for swipe execution capability.
Abstract
With the widespread adoption of Graphical User Interface (GUI) agents for automating GUI interaction tasks, substantial research has focused on improving GUI perception to ground task instructions into concrete action steps. However, the step execution capability of these agents has gradually emerged as a new bottleneck for task completion. In particular, existing GUI agents often adopt overly simplified strategies for handling swipe interactions, preventing them from accurately replicating human-like behavior. To address this limitation, we decompose human swipe gestures into multiple quantifiable dimensions and propose an automated pipeline, SwipeGen, to synthesize human-like swipe interactions through GUI exploration. Based on this pipeline, we construct and release the first benchmark for evaluating the swipe execution capability of GUI agents. Furthermore, leveraging the synthesized…
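To make the idea of decomposing a swipe into quantifiable dimensions concrete, here is a minimal Python sketch. The excerpt above does not list the dimensions SwipeGen uses, so the four fields chosen here (start point, direction, distance, duration), the `Swipe` class, and the `sample_path` helper are all hypothetical illustrations rather than the paper's method. The ease-out timing curve is likewise an assumed stand-in for "human-like" motion.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Swipe:
    """Hypothetical swipe parameterization (not taken from the paper)."""
    start: Tuple[float, float]  # (x, y) in pixels, screen coordinates
    angle_deg: float            # direction; 0 = right, 90 = down, 270 = up
    distance: float             # path length in pixels
    duration_ms: float          # total gesture time in milliseconds

    def end(self) -> Tuple[float, float]:
        """End point implied by start, direction, and distance."""
        rad = math.radians(self.angle_deg)
        return (self.start[0] + self.distance * math.cos(rad),
                self.start[1] + self.distance * math.sin(rad))


def sample_path(swipe: Swipe, steps: int = 10) -> List[Tuple[float, float, float]]:
    """Return (t_ms, x, y) samples along a straight line.

    Progress follows a quadratic ease-out curve (fast start, slow finish),
    an assumed approximation of a finger decelerating before lift-off.
    """
    sx, sy = swipe.start
    ex, ey = swipe.end()
    samples = []
    for i in range(steps + 1):
        u = i / steps                # normalized time in [0, 1]
        p = 1.0 - (1.0 - u) ** 2     # eased progress in [0, 1]
        samples.append((u * swipe.duration_ms,
                        sx + p * (ex - sx),
                        sy + p * (ey - sy)))
    return samples


# Example: a 400-pixel upward swipe lasting 300 ms.
scroll_up = Swipe(start=(100.0, 800.0), angle_deg=270.0,
                  distance=400.0, duration_ms=300.0)
path = sample_path(scroll_up)
```

Each sample could be replayed as a touch-move event at its timestamp. A simplified strategy of the kind the abstract criticizes would correspond to fixing most of these fields to constants, whereas a parameterized form lets each dimension vary independently.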
Taxonomy
Topics: Social Robot Interaction and HRI · Interactive and Immersive Displays · Robot Manipulation and Learning
