Beyond Clicking: A Step Towards Generalist GUI Grounding via Text Dragging
Zeyi Liao, Yadong Lu, Boyu Gou, Huan Sun, Ahmed Awadallah

TL;DR
This paper introduces a new dataset and benchmark for GUI text dragging tasks, advancing the development of generalist GUI grounding models that go beyond simple clicking actions.
Contribution
The paper presents GUI-Drag, a dataset of 161K synthesized text dragging examples, and ScreenDrag, a benchmark of 5,333 examples with three dedicated metrics for evaluating text dragging capability.
Findings
Models trained on GUI-Drag improve dragging performance on ScreenDrag.
Continual training enhances dragging ability without losing click-based performance.
Open-sourced resources facilitate further research in GUI grounding.
Abstract
Graphical user interface (GUI) grounding, the process of mapping human instructions to GUI actions, serves as a fundamental basis for autonomous GUI agents. While existing grounding models achieve promising performance in simulating the mouse click action on various click-based benchmarks, another essential mode of mouse interaction, namely dragging, remains largely underexplored. Yet dragging the mouse to select and manipulate textual content is a prevalent and important usage in practical GUI scenarios. To narrow this gap, we first introduce GUI-Drag, a diverse dataset of 161K text dragging examples synthesized through a scalable pipeline. To support systematic and robust evaluation, we further construct ScreenDrag, a benchmark with 5,333 examples spanning three levels of interface context, together with three dedicated metrics designed for assessing text dragging capability.…
Taxonomy
Topics: Interactive and Immersive Displays · Multimodal Machine Learning Applications · Topic Modeling
