Foundations and Recent Trends in Multimodal Mobile Agents: A Survey
Biao Wu, Yanda Li, Zhiwei Zhang, Yunchao Wei, Meng Fang, Ling Chen

TL;DR
This survey reviews recent advances in multimodal mobile agents, covering real-time adaptability, evaluation benchmarks, and two families of methods (prompt-based and training-based) for improving mobile task performance.
Contribution
It categorizes recent technological approaches and benchmarks in multimodal mobile agents, providing a comprehensive overview and future research directions.
Findings
Review of recent evaluation benchmarks for mobile agents, covering both static and interactive environments
Categorization of prompt-based and training-based methods
Identification of key challenges and future research directions
Abstract
Mobile agents are essential for automating tasks in complex and dynamic mobile environments. As foundation models evolve, the demand for agents that can adapt in real time and process multimodal data has grown. This survey provides a comprehensive review of mobile agent technologies, focusing on recent advancements that enhance real-time adaptability and multimodal interaction. Recent evaluation benchmarks have been developed to better capture the static and interactive environments of mobile tasks, offering more accurate assessments of agents' performance. We then categorize these advancements into two main approaches: prompt-based methods, which utilize large language models (LLMs) for instruction-based task execution, and training-based methods, which fine-tune multimodal models for mobile-specific applications. Additionally, we explore complementary technologies that augment agent…
Taxonomy
Topics: Mobile Agent-Based Network Management · Speech and dialogue systems · Multi-Agent Systems and Negotiation
