Large Language Model-Brained GUI Agents: A Survey
Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu, Kang, Minghua Ma, Guyue Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi, Zhang

TL;DR
This survey reviews the development, techniques, and applications of LLM-powered GUI agents, highlighting their transformative impact on human-computer interaction and outlining future research directions.
Contribution
It provides a comprehensive overview of LLM-brained GUI agents, including their evolution, core components, training data, evaluation metrics, and emerging applications, identifying key research gaps.
Findings
LLM-based GUI agents enable complex, multi-step automation.
Significant progress in research and industry applications.
Identified key challenges and future research directions.
Abstract
GUIs have long been central to human-computer interaction, providing an intuitive and visually-driven way to access and interact with digital systems. The advent of LLMs, particularly multimodal models, has ushered in a new era of GUI automation. They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing. This has paved the way for a new generation of LLM-brained GUI agents capable of interpreting complex GUI elements and autonomously executing actions based on natural language instructions. These agents represent a paradigm shift, enabling users to perform intricate, multi-step tasks through simple conversational commands. Their applications span across web navigation, mobile app interactions, and desktop automation, offering a transformative user experience that revolutionizes how individuals interact with software. This…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRobotics and Automated Systems · Topic Modeling · Natural Language Processing Techniques
