TL;DR
This paper presents GUI-Owl, a foundational GUI agent model, and Mobile-Agent-v3, a new framework that significantly advances open-source GUI automation across multiple platforms through innovative environment infrastructure, decision-making capabilities, and scalable reinforcement learning.
Contribution
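Introduces GUI-Owl, a foundational GUI agent model, and Mobile-Agent-v3, a general-purpose GUI agent framework built on it. The work combines large-scale environment infrastructure, decision-making capabilities, and scalable reinforcement learning, and raises reported scores from 66.4 to 73.3 on AndroidWorld and from 29.4 to 37.7 on OSWorld.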
Findings
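GUI-Owl achieves state-of-the-art results among open-source end-to-end models on ten GUI benchmarks across desktop and mobile environments.
Mobile-Agent-v3 reaches 73.3 on AndroidWorld and 37.7 on OSWorld, a new state of the art for open-source GUI agent frameworks.
The environment infrastructure spans Android, Ubuntu, macOS, and Windows, and training uses scalable reinforcement learning.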
Abstract
This paper introduces GUI-Owl, a foundational GUI agent model that achieves state-of-the-art performance among open-source end-to-end models on ten GUI benchmarks across desktop and mobile environments, covering grounding, question answering, planning, decision-making, and procedural knowledge. GUI-Owl-7B achieves 66.4 on AndroidWorld and 29.4 on OSWorld. Building on this, we propose Mobile-Agent-v3, a general-purpose GUI agent framework that further improves performance to 73.3 on AndroidWorld and 37.7 on OSWorld, setting a new state-of-the-art for open-source GUI agent frameworks. GUI-Owl incorporates three key innovations: (1) Large-scale Environment Infrastructure: a cloud-based virtual environment spanning Android, Ubuntu, macOS, and Windows, enabling our Self-Evolving GUI Trajectory Production framework. This generates high-quality interaction data via automated query generation…
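The scores quoted in the abstract can be compared directly. The sketch below only does arithmetic on the four reported figures; the names `SCORES`, `gains`, and `summarize` are illustrative and do not come from the paper. The abstract does not say which GUI-Owl size backs the Mobile-Agent-v3 numbers, so this compares the two reported figures per benchmark and nothing more.

```python
# Scores quoted in the abstract: benchmark -> (GUI-Owl-7B, Mobile-Agent-v3).
SCORES = {
    "AndroidWorld": (66.4, 73.3),
    "OSWorld": (29.4, 37.7),
}


def gains(base, improved):
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = improved - base
    relative = 100.0 * absolute / base
    return round(absolute, 1), round(relative, 1)


def summarize(scores=SCORES):
    """Map each benchmark name to its (absolute, relative) gain."""
    return {name: gains(base, improved) for name, (base, improved) in scores.items()}


if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in summarize().items():
        print(f"{name}: +{abs_gain} points ({rel_gain}% relative)")
```

On these figures the framework adds 6.9 points on AndroidWorld and 8.3 points on OSWorld. The relative gain is larger on OSWorld because its baseline score is lower.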
Code & Models
- 🤗 mPLUG/GUI-Owl-7B · model · 537 dl · ♡ 52
- 🤗 mPLUG/GUI-Owl-32B · model · 158 dl · ♡ 27
- 🤗 mradermacher/GUI-Owl-7B-GGUF · model · 144 dl · ♡ 3
- 🤗 mradermacher/GUI-Owl-7B-i1-GGUF · model · 8 dl · ♡ 1
- 🤗 d0a0l0l0/GUI-Owl-7B-mlx-4Bit · model · 31 dl
- 🤗 d0a0l0l0/GUI-Owl-7B-mlx-fp16 · model · 16 dl
- 🤗 mPLUG/GUI-Owl-7B-Desktop-RL · model · 20 dl · ♡ 3
- 🤗 japhone1111/GUI-Owl-7B-Q8_0-GGUF · model · 10 dl
- 🤗 mlx-community/GUI-Owl-7B-bf16 · model · 24 dl
- 🤗 mlx-community/GUI-Owl-7B-6bit · model · 21 dl
