ShowUI-$\pi$: Flow-based Generative Models as GUI Dexterous Hands
Siyuan Hu, Kevin Qinghong Lin, Mike Zheng Shou

TL;DR
ShowUI-$\pi$ introduces a flow-based generative model enabling continuous, dexterous GUI manipulation like dragging, combining discrete and continuous actions, and establishing a new benchmark for evaluating GUI drag capabilities.
Contribution
The paper presents the first flow-based generative model for GUI dexterous manipulation, integrating discrete and continuous actions, and provides a new dataset and benchmark for GUI drag tasks.
Findings
ShowUI-$\pi$ outperforms existing proprietary GUI agents on the ScreenDrag benchmark.
The model achieves a score of 26.98 on ScreenDrag with only 450M parameters, demonstrating efficiency.
The approach enables smooth, on-the-fly continuous control in GUI interactions.
Abstract
Building intelligent agents capable of dexterous manipulation is essential for achieving human-like automation in both robotics and digital environments. However, existing GUI agents rely on discrete click predictions (x, y), which prohibits free-form, closed-loop trajectories (e.g. dragging a progress bar) that require continuous, on-the-fly perception and adjustment. In this work, we develop ShowUI-$\pi$, the first flow-based generative model as GUI dexterous hand, featuring the following designs: (i) Unified Discrete-Continuous Actions, integrating discrete clicks and continuous drags within a shared model, enabling flexible adaptation across diverse interaction modes; (ii) Flow-based Action Generation for drag modeling, which predicts incremental cursor adjustments from continuous visual observations via a lightweight action expert, ensuring smooth and stable trajectories; (iii) Drag…
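To make the idea of flow-based action generation concrete, here is a minimal sketch of how a flow-style sampler could turn noise into a short chunk of incremental cursor adjustments. It is not the authors' implementation: `toy_velocity`, `sample_actions`, and `rollout_cursor` are hypothetical names, the velocity field is a closed-form stand-in for a learned action expert, and the fixed `target` replaces the visual observations that the real model would condition on.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned action expert.

    For the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that carries x_t to the target at t = 1 is (target - x_t) / (1 - t).
    A real model would predict this from screenshots instead of knowing the target.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_actions(target, num_steps=10, seed=0):
    """Integrate the velocity field from Gaussian noise (t=0) to actions (t=1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so 1 - t never reaches zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # forward Euler step
    return x


def rollout_cursor(start, deltas):
    """Accumulate flattened (dx, dy) increments into absolute cursor positions."""
    path = [start]
    for i in range(0, len(deltas), 2):
        px, py = path[-1]
        path.append((px + deltas[i], py + deltas[i + 1]))
    return path


# Assumed example: three increments of 5 px to the right, flattened as dx, dy pairs.
deltas = sample_actions([5.0, 0.0, 5.0, 0.0, 5.0, 0.0])
path = rollout_cursor((100.0, 200.0), deltas)
```

In this toy setting the Euler integration lands on the target increments up to floating-point error, so `path` ends near (115, 200). With a learned velocity field, the same loop would be re-run as new observations arrive, which is what allows closed-loop adjustment during a drag.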
Taxonomy
Topics: Robot Manipulation and Learning · Human Motion and Animation · Social Robot Interaction and HRI
