Dynamic Planning for LLM-based Graphical User Interface Automation

Shaoqing Zhang; Zhuosheng Zhang; Kehai Chen; Xinbei Ma; Muyun Yang; Tiejun Zhao; Min Zhang

arXiv:2410.00467·cs.AI·December 20, 2024

TL;DR

This paper introduces D-PoT, a dynamic planning approach for LLM-based GUI agents that adapts plans based on environmental feedback, significantly improving task accuracy over existing methods.

Contribution

The paper proposes D-PoT, a novel dynamic planning method that enhances LLM-based GUI automation by adapting to feedback and reducing hallucinations, outperforming previous approaches.

Findings

01 D-PoT achieves +12.7% accuracy over the GPT-4V baseline.

02 Dynamic planning improves adaptability to unseen tasks.

03 D-PoT reduces hallucinations in LLM-based GUI agents.

Abstract

The advent of large language models (LLMs) has spurred considerable interest in advancing autonomous LLM-based agents, particularly in intriguing applications within smartphone graphical user interfaces (GUIs). When presented with a task goal, these agents typically emulate human actions within a GUI environment until the task is completed. However, a key challenge lies in devising effective plans to guide action prediction in GUI tasks, even though planning has been widely recognized as effective for decomposing complex tasks into a series of steps. Specifically, given the dynamic nature of environmental GUIs following action execution, it is crucial to dynamically adapt plans based on environmental feedback and action history. We show that the widely used ReAct approach fails because of excessively long historical dialogues. To address this challenge, we propose a novel approach called…
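The general idea of adapting a plan to environmental feedback and action history can be illustrated with a small sketch. This is not the authors' method or code: `ToyGUI`, `toy_planner`, and `run_episode` are hypothetical names, the environment is a two-transition toy, and the planner is a rule-based placeholder standing in for an LLM call. The only point it makes is structural: the plan is regenerated at every step from the current screen and the history, instead of being fixed once at the start.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyGUI:
    """Toy stand-in for a GUI environment (hypothetical, not a real device)."""
    screen: str = "home"
    transitions: Dict[Tuple[str, str], str] = field(default_factory=lambda: {
        ("home", "open_settings"): "settings",
        ("settings", "tap_wifi"): "wifi",
    })

    def step(self, action: str) -> str:
        # Unknown (screen, action) pairs leave the screen unchanged.
        self.screen = self.transitions.get((self.screen, action), self.screen)
        return self.screen


def toy_planner(goal: str, screen: str, history: List[str],
                prev_plan: List[str]) -> List[str]:
    """Rule-based placeholder for an LLM call that returns the remaining steps.

    A real planner would condition on goal, history, and prev_plan as well;
    this toy only looks at the current screen.
    """
    remaining = {
        "home": ["open_settings", "tap_wifi"],
        "settings": ["tap_wifi"],
        "wifi": [],
    }
    return remaining[screen]


def run_episode(goal: str, env: ToyGUI, planner, max_steps: int = 5) -> List[str]:
    history: List[str] = []
    plan: List[str] = []
    for _ in range(max_steps):
        # Re-plan every step using the latest screen (environmental feedback)
        # and the actions taken so far (action history).
        plan = planner(goal, env.screen, history, plan)
        if not plan:
            break  # planner reports nothing left to do
        action = plan[0]
        env.step(action)
        history.append(action)
    return history


env = ToyGUI()
print(run_episode("open wifi settings", env, toy_planner))
```

Because the planner receives only the current screen, the running plan, and a short list of past actions, the context passed at each step stays small, in contrast to appending every past observation to one growing dialogue.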

Code & Models

Repositories

sqzhang-lazy/d-pot
jax · Official

Videos

Dynamic Planning for LLM-based Graphical User Interface Automation· underline

Taxonomy

Topics: Advanced Manufacturing and Logistics Optimization · Robotic Path Planning Algorithms · Robotics and Automated Systems