Dynamic Planning for LLM-based Graphical User Interface Automation
Shaoqing Zhang, Zhuosheng Zhang, Kehai Chen, Xinbei Ma, Muyun Yang, Tiejun Zhao, Min Zhang

TL;DR
This paper introduces D-PoT, a dynamic planning approach for LLM-based GUI agents that adapts plans based on environmental feedback, significantly improving task accuracy over existing methods.
Contribution
The paper proposes D-PoT, a novel dynamic planning method that enhances LLM-based GUI automation by adapting to feedback and reducing hallucinations, outperforming previous approaches.
Findings
D-PoT achieves a 12.7% accuracy gain over the GPT-4V baseline.
Dynamic planning improves adaptability to unseen tasks.
D-PoT reduces hallucinations in LLM-based GUI agents.
Abstract
The advent of large language models (LLMs) has spurred considerable interest in advancing autonomous LLM-based agents, particularly in intriguing applications within smartphone graphical user interfaces (GUIs). When presented with a task goal, these agents typically emulate human actions within a GUI environment until the task is completed. However, a key challenge lies in devising effective plans to guide action prediction in GUI tasks, though planning has been widely recognized as effective for decomposing complex tasks into a series of steps. Specifically, given the dynamic nature of environmental GUIs following action execution, it is crucial to dynamically adapt plans based on environmental feedback and action history. We show that the widely-used ReAct approach fails due to the excessively long historical dialogues. To address this challenge, we propose a novel approach called…
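The idea of adapting a plan after every action can be illustrated with a minimal sketch. This is not the authors' implementation: the names `run_dynamic_planning`, `Planner`, `Executor`, and `DONE`, as well as the toy planner and executor, are hypothetical and exist only for illustration. The sketch assumes the planner is called at each step with the goal, the latest screen description, and a compact list of past actions, rather than an ever-growing dialogue transcript.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical signatures, not taken from the paper.
# Planner: (goal, current observation, past actions) -> ordered list of planned steps.
Planner = Callable[[str, str, List[str]], List[str]]
# Executor: performs one action and returns the new screen description.
Executor = Callable[[str], str]

DONE = "DONE"  # illustrative sentinel meaning "task finished"


@dataclass
class Episode:
    goal: str
    actions: List[str] = field(default_factory=list)
    plans: List[List[str]] = field(default_factory=list)


def run_dynamic_planning(goal: str, observation: str, planner: Planner,
                         executor: Executor, max_steps: int = 10) -> Episode:
    episode = Episode(goal)
    for _ in range(max_steps):
        # Re-plan from the latest screen and the short action history.
        plan = planner(goal, observation, episode.actions)
        episode.plans.append(plan)
        if not plan or plan[0] == DONE:
            break
        action = plan[0]                 # execute only the first planned step
        observation = executor(action)   # environmental feedback after the action
        episode.actions.append(action)
    return episode


# Toy stand-ins for an LLM planner and a GUI environment (assumptions).
def toy_planner(goal: str, observation: str, history: List[str]) -> List[str]:
    if observation == "home":
        return ["open_settings", "tap_wifi"]
    if observation == "settings":
        return ["tap_wifi"]
    return [DONE]


def toy_executor(action: str) -> str:
    return {"open_settings": "settings", "tap_wifi": "wifi"}[action]


episode = run_dynamic_planning("turn on wifi", "home", toy_planner, toy_executor)
```

In this sketch only the first step of each plan is executed before the plan is regenerated, so a screen that differs from what the earlier plan expected is handled at the next planning call.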
Taxonomy
Topics: Advanced Manufacturing and Logistics Optimization · Robotic Path Planning Algorithms · Robotics and Automated Systems
