Agent-SAMA: State-Aware Mobile Assistant
Linqiang Guo, Wei Liu, Yi Wen Heng, Tse-Hsun (Peter) Chen, Yang Wang

TL;DR
Agent-SAMA introduces a state-aware multi-agent framework for mobile GUI tasks, modeling app navigation as a finite state machine to improve robustness, recovery, and task success in autonomous mobile assistants.
Contribution
This work presents a novel state-aware multi-agent system that models app navigation as an FSM, enhancing task planning, execution verification, and error recovery for GUI agents.
Findings
Achieves up to 84.0% success rate on cross-app benchmarks.
Improves task success by up to 12% over prior methods.
Demonstrates enhanced robustness and recovery in mobile GUI tasks.
Abstract
Mobile Graphical User Interface (GUI) agents aim to autonomously complete tasks within or across apps based on user instructions. While recent Multimodal Large Language Models (MLLMs) enable these agents to interpret UI screens and perform actions, existing agents remain fundamentally reactive. They reason over the current UI screen but lack a structured representation of the app navigation flow, limiting GUI agents' ability to understand execution context, detect unexpected execution results, and recover from errors. We introduce Agent-SAMA, a state-aware multi-agent framework that models app execution as a Finite State Machine (FSM), treating UI screens as states and user actions as transitions. Agent-SAMA implements four specialized agents that collaboratively construct and use FSMs in real time to guide task planning, execution verification, and recovery. We evaluate Agent-SAMA on…
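The FSM view in the abstract (UI screens as states, user actions as transitions) can be illustrated with a small Python sketch. This is not the authors' implementation. The class `NavigationFSM`, its methods `record`, `verify`, and `recovery_path`, and the screen and action names are all hypothetical, chosen only to show how an FSM built at run time could support execution verification and recovery. The breadth-first search used for recovery is likewise an assumption, not a method stated in the source.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class NavigationFSM:
    """Hypothetical FSM: UI screens are states, user actions are transitions."""

    # transitions[screen][action] -> screen reached after taking that action
    transitions: dict = field(default_factory=dict)

    def record(self, src, action, dst):
        """Add an observed transition as it is discovered during execution."""
        self.transitions.setdefault(src, {})[action] = dst
        self.transitions.setdefault(dst, {})

    def verify(self, src, action, observed):
        """Compare the observed screen against the recorded transition.

        Returns True or False when the transition is known, and None when
        the FSM has no record to compare against.
        """
        expected = self.transitions.get(src, {}).get(action)
        if expected is None:
            return None
        return expected == observed

    def recovery_path(self, current, target):
        """Shortest known action sequence from current to target (BFS).

        Returns a list of actions, or None if no known path exists.
        """
        queue = deque([(current, [])])
        seen = {current}
        while queue:
            screen, path = queue.popleft()
            if screen == target:
                return path
            for action, nxt in self.transitions.get(screen, {}).items():
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [action]))
        return None


# Illustrative usage with made-up screens and actions.
fsm = NavigationFSM()
fsm.record("home", "tap_search", "search")
fsm.record("search", "tap_back", "home")
fsm.record("home", "tap_settings", "settings")
fsm.record("settings", "tap_back", "home")

# The agent meant to reach "search" but landed on "settings".
mismatch = fsm.verify("home", "tap_search", "settings")   # False
recovery = fsm.recovery_path("settings", "search")        # ["tap_back", "tap_search"]
```

In this sketch a purely reactive agent would see only the unexpected "settings" screen, while the recorded transitions let the agent both detect the mismatch and plan a route back to the intended screen.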
Taxonomy
Topics: Logic, Reasoning, and Knowledge · Multi-Agent Systems and Negotiation · Mobile Agent-Based Network Management
