Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
Haiyang Xu, Xi Zhang, Haowei Liu, Junyang Wang, Zhaozai Zhu, Shengjie Zhou, Xuhao Hu, Feiyu Gao, Junjie Cao, Zihua Wang, Zhiyuan Chen, Jitong Liao, Qi Zheng, Jiahui Zeng, Ze Xu, Shuai Bai, Junyang Lin, Jingren Zhou, Ming Yan

TL;DR
The paper presents GUI-Owl-1.5, a multi-platform GUI agent model that achieves state-of-the-art results on various benchmarks by integrating innovative data pipelines, enhanced reasoning capabilities, and a new RL training algorithm for diverse environments.
Contribution
Introduces GUI-Owl-1.5, a native multi-platform GUI agent model with novel data collection, reasoning enhancements, and a specialized RL algorithm for multi-platform tasks.
Findings
Achieves state-of-the-art results among open-source models on more than 20 GUI benchmarks.
Demonstrates effective multi-platform and multi-task capabilities.
Open-sourced model and demo available online.
Abstract
The paper introduces GUI-Owl-1.5, the latest native GUI agent model, which comes in instruct and thinking variants at multiple sizes (2B/4B/8B/32B/235B) and supports a range of platforms (desktop, mobile, browser, and more) to enable cloud-edge collaboration and real-time interaction. GUI-Owl-1.5 achieves state-of-the-art results among open-source models on more than 20 GUI benchmarks: (1) on GUI automation tasks, it obtains 56.5 on OSWorld, 71.6 on AndroidWorld, and 48.4 on WebArena; (2) on grounding tasks, it obtains 80.3 on ScreenSpotPro; (3) on tool-calling tasks, it obtains 47.6 on OSWorld-MCP and 46.8 on MobileWorld; (4) on memory and knowledge tasks, it obtains 75.5 on GUI-Knowledge Bench. GUI-Owl-1.5 incorporates several key innovations: (1) Hybrid Data Flywheel: we construct the data pipeline for UI understanding and trajectory generation based on a combination of simulated…
Code & Models
- 🤗 mradermacher/GUI-Owl-1.5-4B-Instruct-GGUF (model): 188 downloads, 2 likes
- 🤗 mPLUG/GUI-Owl-1.5-2B-Instruct (model): 8.5k downloads, 8 likes
- 🤗 mPLUG/GUI-Owl-1.5-8B-Instruct (model): 24k downloads, 6 likes
- 🤗 mPLUG/GUI-Owl-1.5-4B-Instruct (model): 717 downloads, 3 likes
- 🤗 mPLUG/GUI-Owl-1.5-8B-Think (model): 570 downloads, 6 likes
- 🤗 mPLUG/GUI-Owl-1.5-32B-Instruct (model): 364 downloads, 4 likes
- 🤗 mPLUG/GUI-Owl-1.5-32B-Think (model): 190 downloads, 3 likes
- 🤗 mradermacher/GUI-Owl-1.5-2B-Instruct-GGUF (model): 296 downloads, 2 likes
- 🤗 mradermacher/GUI-Owl-1.5-8B-Think-GGUF (model): 151 downloads
- 🤗 mradermacher/GUI-Owl-1.5-8B-Instruct-GGUF (model): 369 downloads, 1 like
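The repository names listed above appear to follow one pattern: an organization, then `GUI-Owl-1.5-<size>-<variant>`, with an optional `-GGUF` suffix on the quantized conversions. The sketch below parses a name into those parts, which can help when picking a size or variant programmatically. The pattern is inferred only from the names in this list, and `ModelId` and `parse_model_id` are hypothetical helper names, not part of any released tooling.

```python
import re
from typing import NamedTuple, Optional


class ModelId(NamedTuple):
    org: str       # publishing organization or user, e.g. "mPLUG"
    size: str      # parameter-count tag, e.g. "8B"
    variant: str   # "Instruct" or "Think"
    gguf: bool     # True when the name carries the "-GGUF" suffix


# Assumed naming scheme, inferred from the repository names in the list.
_PATTERN = re.compile(
    r"^(?P<org>[^/]+)/GUI-Owl-1\.5-(?P<size>\d+B)-"
    r"(?P<variant>Instruct|Think)(?P<gguf>-GGUF)?$"
)


def parse_model_id(repo_id: str) -> Optional[ModelId]:
    """Split a repository name into its parts, or return None if it does not match."""
    m = _PATTERN.match(repo_id)
    if m is None:
        return None
    return ModelId(m["org"], m["size"], m["variant"], m["gguf"] is not None)


if __name__ == "__main__":
    for name in ("mPLUG/GUI-Owl-1.5-8B-Think",
                 "mradermacher/GUI-Owl-1.5-4B-Instruct-GGUF"):
        print(parse_model_id(name))
```

Names that do not fit the assumed scheme return `None` instead of raising, so a caller can skip unfamiliar entries.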
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Artificial Intelligence in Games · Advanced Software Engineering Methodologies
