AutoGLM: Autonomous Foundation Agents for GUIs
Xiao Liu, Bo Qin, Dongzhu Liang, Guang Dong, Hanyu Lai, Hanchen Zhang, Hanlin Zhao, Iat Long Iong, Jiadai Sun, Jiaqi Wang, Junjie Gao, Junjun Shan, Kangning Liu, Shudan Zhang, Shuntian Yao, Siyi Cheng, Wentao Yao, Wenyi Zhao, Xinghan Liu, Xinyi Liu, Xinying Chen, Xinyue Yang

TL;DR
AutoGLM is an autonomous foundation agent system for GUI interaction. It makes decisions and controls web and mobile environments, relying on an intermediate interface design and a progressive self-evolving reinforcement learning framework.
Contribution
The paper introduces AutoGLM, a novel foundation agent system with a new intermediate interface design and a progressive self-evolving reinforcement learning framework for real-world GUI control.
Findings
Achieves 55.2% success on VAB-WebArena-Lite, improving to 59.1% with retries.
Attains 96.2% success on OpenTable tasks.
Reaches 89.7% success on common Chinese app tasks.
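The gap between the single-attempt and with-retries figures on VAB-WebArena-Lite can be read as a change in how a task is counted. The sketch below is illustrative only: it assumes a task counts as solved if any of its allowed attempts succeeds, which this summary does not confirm, and the function name `success_rate` and the toy outcomes are hypothetical rather than taken from the paper.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[Sequence[bool]], max_attempts: int) -> float:
    """Fraction of tasks solved within the first `max_attempts` attempts.

    `outcomes[i][j]` is True if attempt j on task i succeeded.
    Assumption: a task is solved if any allowed attempt succeeds.
    """
    if not outcomes:
        raise ValueError("at least one task is required")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    solved = sum(any(attempts[:max_attempts]) for attempts in outcomes)
    return solved / len(outcomes)


# Hypothetical per-task attempt logs (not data from the paper).
toy_outcomes = [
    [True],          # solved on the first attempt
    [False, True],   # solved only on the retry
    [False, False],  # never solved
    [True, True],    # solved on the first attempt
]

first_try = success_rate(toy_outcomes, max_attempts=1)
with_retry = success_rate(toy_outcomes, max_attempts=2)
print(f"single attempt: {first_try:.1%}, with one retry: {with_retry:.1%}")
```

Under this counting rule the with-retries rate can never fall below the single-attempt rate, which is consistent with the direction of the reported change.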
Abstract
We present AutoGLM, a new series in the ChatGLM family, designed to serve as foundation agents for autonomous control of digital devices through Graphical User Interfaces (GUIs). While foundation models excel at acquiring human knowledge, they often struggle with decision-making in dynamic real-world environments, limiting their progress toward artificial general intelligence. This limitation underscores the importance of developing foundation agents capable of learning through autonomous environmental interactions by reinforcing existing models. Focusing on Web Browser and Phone as representative GUI scenarios, we have developed AutoGLM as a practical foundation agent system for real-world GUI interactions. Our approach integrates a comprehensive suite of techniques and infrastructures to create deployable agent systems suitable for user delivery. Through this development, we have…
Code & Models
- 🤗 zai-org/AutoGLM-Phone-9B · model · 150k dl · ♡ 433
- 🤗 zai-org/AutoGLM-Phone-9B-Multilingual · model · 6.4k dl · ♡ 231
- 🤗 Mungert/AutoGLM-Phone-9B-GGUF · model · 304 dl
- 🤗 Mungert/AutoGLM-Phone-9B-Multilingual-GGUF · model · 9.1k dl
- 🤗 AbdulElahGwaith/Open-AutoGLM · model
- 🤗 hsb47gx6/AutoGLM-Phone-9B · model · 2 dl
Taxonomy
Topics: Web Applications and Data Management · Human Motion and Animation · Web Data Mining and Analysis
