MobileUse: A GUI Agent with Hierarchical Reflection for Autonomous Mobile Operation
Ning Li, Xiangmou Qu, Jiamu Zhou, Jun Wang, Muning Wen, Kounianhua Du, Xingyu Lou, Qiuying Peng, Jun Wang, Weinan Zhang

TL;DR
MobileUse is a hierarchical reflection-based GUI agent that improves autonomous mobile task execution through self-monitoring, error recovery, and proactive exploration, achieving state-of-the-art success rates on the AndroidWorld and AndroidLab benchmarks.
Contribution
The paper introduces MobileUse, a novel GUI agent with hierarchical reflection and proactive exploration modules for improved robustness and adaptability in mobile task automation.
Findings
Achieves success rates of 62.9% on AndroidWorld and 44.2% on AndroidLab.
Demonstrates state-of-the-art performance in mobile task execution.
Provides an open-source toolkit for real-world mobile automation.
Abstract
Recent advances in Multimodal Large Language Models (MLLMs) have enabled the development of mobile agents that can understand visual inputs and follow user instructions, unlocking new possibilities for automating complex tasks on mobile devices. However, applying these models to real-world mobile scenarios remains a significant challenge due to long-horizon task execution, the difficulty of error recovery, and the cold-start problem in unfamiliar environments. To address these challenges, we propose MobileUse, a GUI agent designed for robust and adaptive mobile task execution. To improve resilience in long-horizon tasks and dynamic environments, we introduce a hierarchical reflection architecture that enables the agent to self-monitor, detect, and recover from errors across multiple temporal scales, ranging from individual actions to overall task completion, while maintaining efficiency…
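The abstract does not describe how the reflection levels are implemented. As a rough illustration of what reflecting "across multiple temporal scales" can mean, the Python sketch below checks every action (retrying on failure), re-checks a window of recent steps only every few actions, and runs a single task-level check at the end, so coarser checks run less often. All names (`run_with_reflection`, `execute`, `trajectory_ok`, `task_done`, `window`, `max_retries`) are hypothetical, and the checks are stubs standing in for model judgments. This is a generic sketch under those assumptions, not MobileUse's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical sketch of multi-scale reflection; NOT MobileUse's actual code.


@dataclass
class Step:
    action: str
    succeeded: bool  # verdict of the action-level check for this step


@dataclass
class Trace:
    steps: List[Step] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def run_with_reflection(
    plan: List[str],
    execute: Callable[[str], bool],               # runs one action; True if its effect is verified
    trajectory_ok: Callable[[List[Step]], bool],  # judges a window of recent steps
    task_done: Callable[[Trace], bool],           # judges the whole episode
    window: int = 3,
    max_retries: int = 1,
) -> Tuple[bool, Trace]:
    trace = Trace()
    for i, action in enumerate(plan, start=1):
        # Action level (finest scale): check every step, retry on failure.
        ok = execute(action)
        retries = 0
        while not ok and retries < max_retries:
            retries += 1
            trace.log.append(f"action-retry:{action}")
            ok = execute(action)
        trace.steps.append(Step(action, ok))

        # Trajectory level (coarser scale): only every `window` steps, to limit cost.
        if i % window == 0 and not trajectory_ok(trace.steps[-window:]):
            trace.log.append(f"trajectory-flag@{i}")

    # Task level (coarsest scale): one final check before reporting completion.
    done = task_done(trace)
    trace.log.append("task-done" if done else "task-incomplete")
    return done, trace


# Simulated device: "tap_send" fails on its first attempt only.
attempts = {}


def fake_execute(action: str) -> bool:
    attempts[action] = attempts.get(action, 0) + 1
    return not (action == "tap_send" and attempts[action] == 1)


done, trace = run_with_reflection(
    ["open_app", "type_message", "tap_send"],
    execute=fake_execute,
    trajectory_ok=lambda steps: all(s.succeeded for s in steps),
    task_done=lambda t: all(s.succeeded for s in t.steps),
)
print(done, trace.log)  # True ['action-retry:tap_send', 'task-done']
```

Here the failed tap is caught and repaired at the action level, so the trajectory and task checks both pass. Running the trajectory check only every `window` steps is one simple way to trade detection latency for fewer (potentially expensive) checks.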
Taxonomy
Topics: Mobile Agent-Based Network Management · Context-Aware Activity Recognition Systems · Multi-Agent Systems and Negotiation
