Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent
Fanglin Mo, Junzhe Chen, Haoxuan Zhu, Xuming Hu

TL;DR
This paper introduces SPlanner, a planning module based on extended finite state machines that improves mobile GUI agent task execution by generating effective, natural language-guided plans, significantly enhancing success rates on real-world benchmarks.
Contribution
The paper presents a novel EFSM-based planning module that decomposes user instructions into executable plans, improving mobile GUI agent performance in task execution.
Findings
Achieves a 63.8% success rate on the AndroidWorld benchmark with VLMs.
Improves the task success rate by 28.8 percentage points over methods without planning.
Demonstrates effective integration of EFSMs and LLMs for mobile GUI task planning.
Abstract
Mobile GUI agents execute user commands by directly interacting with the graphical user interface (GUI) of mobile devices, demonstrating significant potential to enhance user convenience. However, these agents face considerable challenges in task planning, as they must continuously analyze the GUI and generate operation instructions step by step. This process often makes accurate task planning difficult: GUI agents lack a deep understanding of how to use the target applications effectively, which can cause them to become "lost" during task execution. To address the task planning issue, we propose SPlanner, a plug-and-play planning module that generates execution plans to guide vision language models (VLMs) in executing tasks. The proposed planning module utilizes extended finite state machines (EFSMs) to model the control logic and configurations of mobile applications. It…
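The abstract states only that EFSMs model the control logic and configurations of mobile applications. The following is a minimal Python sketch of that idea under our own assumptions, not SPlanner's actual implementation: screens are treated as EFSM states, configuration values as EFSM variables, and GUI operations as transitions guarded by predicates over those variables. A plan is found by breadth-first search over (screen, variables) pairs and returned as a list of natural-language steps that could be handed to a VLM. The class names (`Transition`, `EFSM`), the `plan` method, the search strategy, and the Settings-app example are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical EFSM sketch: states are app screens, variables are app
# configuration values, and each transition is a GUI operation guarded by
# a predicate over those variables.


@dataclass(frozen=True)
class Transition:
    source: str  # screen on which the operation is available
    target: str  # screen reached after the operation
    action: str  # natural-language GUI instruction for the agent
    guard: Callable[[Dict[str, object]], bool] = lambda v: True
    update: Callable[[Dict[str, object]], Dict[str, object]] = lambda v: v


@dataclass
class EFSM:
    transitions: List[Transition] = field(default_factory=list)

    def plan(self, start: str, variables: Dict[str, object],
             goal: Callable[[str, Dict[str, object]], bool],
             max_depth: int = 20) -> Optional[List[str]]:
        """Breadth-first search for the shortest action sequence reaching a goal."""
        # Freeze variables into a sorted tuple so (screen, variables) is hashable.
        init = (start, tuple(sorted(variables.items())))
        queue = deque([(init, [])])
        seen = {init}
        while queue:
            (state, frozen), steps = queue.popleft()
            env = dict(frozen)
            if goal(state, env):
                return steps
            if len(steps) >= max_depth:
                continue
            for t in self.transitions:
                # A transition fires only from its source screen and only if its guard holds.
                if t.source == state and t.guard(env):
                    new_env = t.update(dict(env))
                    key = (t.target, tuple(sorted(new_env.items())))
                    if key not in seen:
                        seen.add(key)
                        queue.append((key, steps + [t.action]))
        return None  # goal unreachable within max_depth


# Usage: a toy Settings app with a single configuration variable.
app = EFSM([
    Transition("home", "settings", "Open the Settings app"),
    Transition("settings", "display", "Tap 'Display'"),
    Transition("display", "display", "Turn on 'Dark theme'",
               guard=lambda v: not v["dark_mode"],
               update=lambda v: {**v, "dark_mode": True}),
    Transition("display", "settings", "Go back"),
])

steps = app.plan("home", {"dark_mode": False},
                 goal=lambda s, v: s == "display" and v["dark_mode"])
print(steps)
```

Including the variables in the search key, rather than searching over screens alone, is what distinguishes this from a plain finite state machine: the "Turn on 'Dark theme'" transition is only available while the guard over `dark_mode` holds, so the same screen can appear in the plan in different configurations.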
Taxonomy
Topics: Robotic Path Planning Algorithms · Social Robot Interaction and HRI · Artificial Intelligence in Games
