ShowUI-$\pi$: Flow-based Generative Models as GUI Dexterous Hands

Siyuan Hu; Kevin Qinghong Lin; Mike Zheng Shou

arXiv:2512.24965·cs.CV·January 1, 2026

TL;DR

ShowUI-$\pi$ is a flow-based generative model for continuous, dexterous GUI manipulation such as dragging. It combines discrete and continuous actions in a single model and establishes a new benchmark for evaluating GUI drag capabilities.

Contribution

The paper presents the first flow-based generative model for GUI dexterous manipulation, integrating discrete and continuous actions, and provides a new dataset and benchmark for GUI drag tasks.

Findings

01

ShowUI-$\pi$ outperforms existing proprietary GUI agents on the ScreenDrag benchmark.

02

The model achieves a score of 26.98 with only 450M parameters, demonstrating efficiency.

03

The approach enables smooth, on-the-fly continuous control in GUI interactions.

Abstract

Building intelligent agents capable of dexterous manipulation is essential for achieving human-like automation in both robotics and digital environments. However, existing GUI agents rely on discrete click predictions (x,y), which prohibits free-form, closed-loop trajectories (e.g., dragging a progress bar) that require continuous, on-the-fly perception and adjustment. In this work, we develop ShowUI-$\pi$, the first flow-based generative model as GUI dexterous hand, featuring the following designs: (i) Unified Discrete-Continuous Actions, integrating discrete clicks and continuous drags within a shared model, enabling flexible adaptation across diverse interaction modes; (ii) Flow-based Action Generation for drag modeling, which predicts incremental cursor adjustments from continuous visual observations via a lightweight action expert, ensuring smooth and stable trajectories; (iii) Drag…
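
To make the idea of flow-based action generation in (ii) more concrete, here is a minimal sketch of how a flow model can turn noise into a short chunk of cursor waypoints by integrating a velocity field. It is an illustration under stated assumptions, not the authors' implementation: the names `toy_velocity` and `sample_chunk`, the representation of a chunk as a list of normalized (x, y) points, the linear interpolation path, the Euler integrator, and the step count are all hypothetical choices that the abstract does not specify. In a real system the velocity would come from a learned network conditioned on screen observations; here a closed-form stand-in is used so the example runs on its own.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]   # assumed: normalized (x, y) cursor position
Chunk = List[Point]           # assumed: a short sequence of cursor waypoints


def toy_velocity(chunk: Chunk, t: float, target: Chunk) -> Chunk:
    """Hypothetical stand-in for a learned velocity network.

    For the linear path x_t = (1 - t) * x_0 + t * x_1, the velocity that
    carries x_t to x_1 is (x_1 - x_t) / (1 - t). A trained model would
    predict this from visual observations instead of reading `target`.
    """
    scale = 1.0 / max(1.0 - t, 1e-6)  # guard against division by zero near t = 1
    return [
        ((tx - x) * scale, (ty - y) * scale)
        for (x, y), (tx, ty) in zip(chunk, target)
    ]


def sample_chunk(
    noise: Chunk,
    velocity: Callable[[Chunk, float], Chunk],
    steps: int = 10,
) -> Chunk:
    """Integrate the velocity field from t = 0 to t = 1 with Euler steps."""
    chunk = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(chunk, t)
        chunk = [
            (x + dt * vx, y + dt * vy)
            for (x, y), (vx, vy) in zip(chunk, v)
        ]
    return chunk


if __name__ == "__main__":
    # Hypothetical drag: move a slider handle to the right along y = 0.5.
    target = [(0.10, 0.50), (0.20, 0.50), (0.30, 0.50)]
    noise = [(0.9, -0.3), (-0.4, 0.7), (0.2, 0.2)]  # fixed "noise" for repeatability
    waypoints = sample_chunk(noise, lambda c, t: toy_velocity(c, t, target))
    for x, y in waypoints:
        print(f"{x:.3f}, {y:.3f}")
```

In a closed-loop setting, one would presumably execute the generated waypoints, capture a new observation, and sample the next chunk, which is how incremental adjustment during a drag could be realized; the exact scheme used by the paper is not given in the text above.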

Taxonomy

Topics: Robot Manipulation and Learning · Human Motion and Animation · Social Robot Interaction and HRI