MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions
Yuxuan Liu, Hongda Sun, Wei Liu, Jian Luan, Bo Du, Rui Yan

TL;DR
MobileSteward is a self-evolving multi-agent framework that automates complex cross-app instructions on mobile phones by integrating specialized agents and a memory-based learning mechanism, improving task execution accuracy.
Contribution
We introduce MobileSteward, the first framework combining object-oriented multi-agent coordination with self-evolution for cross-app instruction automation.
Findings
MobileSteward outperforms single-agent and multi-agent baselines.
Memory-based Self-evolution improves task success rates.
CAPBench provides a new benchmark for cross-app instruction tasks.
Abstract
Mobile phone agents, which can assist people in automating daily tasks on their phones, have emerged as a pivotal research focus. However, existing procedure-oriented agents struggle with cross-app instructions because of three challenges: (1) complex task relationships, (2) diverse app environments, and (3) error propagation and information loss in multi-step execution. Drawing inspiration from object-oriented programming principles, we recognize that object-oriented solutions are more suitable for cross-app instructions. To address these challenges, we propose a self-evolving multi-agent framework named MobileSteward, which integrates multiple app-oriented StaffAgents coordinated by a centralized StewardAgent. We design three specialized modules in MobileSteward: (1) Dynamic Recruitment generates a scheduling graph guided by information flow to explicitly associate tasks among…
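The abstract is cut off above, so the following is only a minimal sketch of the one idea that is visible: a scheduling graph in which information flows from one app-specific subtask to the next. It is not the paper's implementation. The names `SubTask`, `run_schedule`, `maps_agent`, and `messages_agent` are hypothetical, the "agents" are plain functions standing in for StaffAgents, and the coordinator loop stands in for the StewardAgent.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List


# Hypothetical type for illustration; not taken from the paper.
@dataclass
class SubTask:
    name: str
    app: str                      # which app-oriented agent handles it
    instruction: str
    needs: List[str] = field(default_factory=list)  # upstream subtasks feeding this one


# A stand-in StaffAgent: takes an instruction plus upstream outputs, returns a result.
StaffAgent = Callable[[str, Dict[str, str]], str]


def run_schedule(tasks: List[SubTask], staff: Dict[str, StaffAgent]) -> Dict[str, str]:
    """Run subtasks in dependency order, passing upstream results downstream."""
    by_name = {t.name: t for t in tasks}
    # Map each subtask to the set of subtasks it depends on.
    graph = {t.name: set(t.needs) for t in tasks}
    results: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        inputs = {dep: results[dep] for dep in task.needs}
        results[name] = staff[task.app](task.instruction, inputs)
    return results


# Toy agents with canned behaviour (assumed, for demonstration only).
def maps_agent(instruction: str, inputs: Dict[str, str]) -> str:
    return "12 Main St"


def messages_agent(instruction: str, inputs: Dict[str, str]) -> str:
    return f"sent: {inputs['find_address']}"


tasks = [
    SubTask("send_address", "Messages", "Text the address to Alice",
            needs=["find_address"]),
    SubTask("find_address", "Maps", "Look up the cafe address"),
]
results = run_schedule(tasks, {"Maps": maps_agent, "Messages": messages_agent})
print(results)
```

Making the dependencies explicit means the address found in one app is handed directly to the subtask that needs it, rather than being carried implicitly through a long single-agent trajectory where it could be lost.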
