TL;DR
ClawGUI is an open-source framework that unifies training, evaluation, and deployment of GUI agents across virtual and real devices, improving reproducibility and real-world applicability.
Contribution
It introduces a full-stack infrastructure for GUI agents that covers RL training, standardized evaluation, and deployment on multiple platforms.
Findings
Achieves 95.8% reproduction accuracy across benchmarks.
ClawGUI-2B outperforms the baseline with a 17.1% success rate on MobileWorld.
Supports deployment on Android, HarmonyOS, and iOS.
Abstract
GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of a coherent full-stack infrastructure: online RL training suffers from environment instability and closed pipelines, evaluation protocols drift silently across works, and trained agents rarely reach real users on real devices. We present ClawGUI, an open-source framework addressing these three gaps within a single harness. ClawGUI-RL provides the first open-source GUI agent RL infrastructure with validated support for both parallel virtual environments and real physical devices, integrating GiGPO with a Process Reward Model for dense step-level…
