AppAgent v2: Advanced Agent for Flexible Mobile Interactions
Yanda Li, Chi Zhang, Wenjia Jiang, Wanqi Yang, Bin Fu, Pei Cheng, Xin Chen, Ling Chen, Yunchao Wei

TL;DR
This paper presents AppAgent v2, a multimodal LLM-based framework for mobile device interaction that can navigate interfaces, perform complex tasks, and adapt across applications with high accuracy, demonstrated through benchmark results.
Contribution
Introduces a flexible agent framework that uses retrieval-augmented generation (RAG) for mobile interactions, enabling human-like navigation and multi-step task execution.
Findings
Superior performance on multiple benchmarks
Effective handling of complex, multi-step workflows
High adaptability across diverse applications
Abstract
With the advancement of Multimodal Large Language Models (MLLMs), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices. The framework navigates mobile devices and emulates human-like interactions. Our agent constructs a flexible action space that enhances adaptability across various applications, including parser, text, and vision descriptions. The agent operates through two main phases: exploration and deployment. During the exploration phase, the functionalities of user interface elements are documented, through either agent-driven or manual exploration, in a customized structured knowledge base. In the deployment phase, RAG technology enables efficient retrieval from and updates to this knowledge base, thereby empowering the…
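The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ElementDoc`, `KnowledgeBase`, `document`, and `retrieve` are hypothetical, and plain Jaccard token overlap stands in for whatever retriever the paper actually uses, which the abstract does not specify.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ElementDoc:
    """Hypothetical record for one UI element documented during exploration."""
    element_id: str
    description: str


@dataclass
class KnowledgeBase:
    """Toy structured knowledge base keyed by element identifier (an assumption)."""
    docs: dict[str, ElementDoc] = field(default_factory=dict)

    def document(self, element_id: str, description: str) -> None:
        # Exploration phase: record (or overwrite) what an element does.
        self.docs[element_id] = ElementDoc(element_id, description)

    def retrieve(self, query: str, k: int = 2) -> list[ElementDoc]:
        # Deployment phase: rank stored docs by Jaccard overlap with the task text.
        q = tokenize(query)

        def score(doc: ElementDoc) -> float:
            d = tokenize(doc.description)
            union = q | d
            return len(q & d) / len(union) if union else 0.0

        ranked = sorted(self.docs.values(), key=score, reverse=True)
        # Drop entries that share no tokens with the query.
        return [doc for doc in ranked[:k] if score(doc) > 0]


# Exploration: document a few (made-up) interface elements.
kb = KnowledgeBase()
kb.document("btn_send", "Tap to send the typed message")
kb.document("btn_attach", "Tap to attach a photo or file")
kb.document("field_search", "Enter text to search conversations")

# Deployment: fetch the most relevant element documentation for a task.
top = kb.retrieve("send a message to Alice", k=1)
print(top[0].element_id)
```

Calling `document` again with an existing `element_id` overwrites the stored description, which is one simple way to model the knowledge-base update mentioned in the abstract. A real system would likely use embedding-based retrieval rather than token overlap.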
Taxonomy
Topics: Mobile Agent-Based Network Management · Context-Aware Activity Recognition Systems · Multi-Agent Systems and Negotiation
Methods: Attention Is All You Need · Adam · Layer Normalization · Weight Decay · Dense Connections · WordPiece · Attention Dropout · Linear Warmup With Linear Decay · Byte Pair Encoding
