SwipeGen: Bridging the Execution Gap in GUI Agents via Human-like Swipe Synthesis

Xuan Wang; Siyuan Su; Quantong Fu; Yongxiang Hu; Yangfan Zhou

arXiv:2601.18305·cs.CV·January 27, 2026

Open Access

TL;DR

SwipeGen is a pipeline for synthesizing human-like swipe gestures to improve GUI agent execution. The paper also presents a benchmark for swipe execution and a new agent, GUISwiper, that significantly outperforms existing methods in swipe accuracy.

Contribution

The paper proposes SwipeGen, a pipeline for swipe synthesis; creates the first benchmark for swipe execution; and develops GUISwiper, a GUI agent with substantially improved swipe accuracy.

Findings

01. GUISwiper achieves 69.07% swipe accuracy.
02. A 214% improvement in swipe accuracy over existing baselines.
03. First benchmark for swipe execution capability.
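
The two accuracy figures above can be related with a little arithmetic. The sketch below assumes that the 214% figure is a relative gain over a single baseline accuracy (so the new accuracy is 3.14 times the baseline); the page does not state the baseline number, so the value computed here is an inference, not a reported result. The function name `implied_baseline` is made up for illustration.

```python
def implied_baseline(accuracy_pct: float, relative_gain_pct: float) -> float:
    """Back out a baseline accuracy from a reported relative improvement.

    Assumes: accuracy = baseline * (1 + relative_gain / 100).
    """
    return accuracy_pct / (1.0 + relative_gain_pct / 100.0)


guiswiper_acc = 69.07  # reported swipe accuracy, in percent
relative_gain = 214.0  # reported improvement, assumed to be relative, in percent

baseline = implied_baseline(guiswiper_acc, relative_gain)
print(f"Implied baseline accuracy: {baseline:.2f}%")
```

Under that assumption the implied baseline is roughly 22%, which would make the gain about 47 percentage points in absolute terms. If the paper instead averages per-baseline gains, this back-of-the-envelope figure would not apply.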

Abstract

With the widespread adoption of Graphical User Interface (GUI) agents for automating GUI interaction tasks, substantial research has focused on improving GUI perception to ground task instructions into concrete action steps. However, the step execution capability of these agents has gradually emerged as a new bottleneck for task completion. In particular, existing GUI agents often adopt overly simplified strategies for handling swipe interactions, preventing them from accurately replicating human-like behavior. To address this limitation, we decompose human swipe gestures into multiple quantifiable dimensions and propose an automated pipeline, SwipeGen, to synthesize human-like swipe interactions through GUI exploration. Based on this pipeline, we construct and release the first benchmark for evaluating the swipe execution capability of GUI agents. Furthermore, leveraging the synthesized…

Taxonomy

Topics: Social Robot Interaction and HRI · Interactive and Immersive Displays · Robot Manipulation and Learning