MAI-UI Technical Report: Real-World Centric Foundation GUI Agents
Hanzhang Zhou, Xu Zhang, Panrong Tong, Jianan Zhang, Liangyu Chen, Quyu Kong, Chenglin Cai, Chen Liu, Yue Wang, Jingren Zhou, Steven Hoi

TL;DR
MAI-UI introduces a scalable, adaptive framework for GUI agents that significantly advances GUI grounding and mobile navigation, addressing deployment challenges with a unified, self-evolving system.
Contribution
It presents MAI-UI, a comprehensive foundation GUI agent framework with a novel deployment architecture, self-evolving data pipeline, and online RL, achieving state-of-the-art results in GUI grounding and navigation.
Findings
Achieves new SOTA on multiple GUI grounding benchmarks.
Sets a new SOTA of 76.7% success rate on AndroidWorld navigation.
Improves on-device performance and reduces cloud calls by over 40%.
Abstract
The development of GUI agents could revolutionize the next generation of human-computer interaction. Motivated by this vision, we present MAI-UI, a family of foundation GUI agents spanning the full spectrum of sizes, including 2B, 8B, 32B, and 235B-A22B variants. We identify four key challenges to realistic deployment: the lack of native agent-user interaction, the limits of UI-only operation, the absence of a practical deployment architecture, and brittleness in dynamic environments. MAI-UI addresses these issues with a unified methodology: a self-evolving data pipeline that expands the navigation data to include user interaction and MCP tool calls, a native device-cloud collaboration system that routes execution by task state, and an online RL framework with advanced optimizations to scale parallel environments and context length. MAI-UI establishes new state-of-the-art across GUI…
Taxonomy
Topics: Interactive and Immersive Displays · Context-Aware Activity Recognition Systems · Usability and User Interface Design
