GUI Agents with Reinforcement Learning: Toward Digital Inhabitants
Junan Hu, Jian Liu, Jingxiang Lai, Jiarui Hu, Yiwei Sheng, Shuang Chen, Jian Li, Dazhao Du, Song Guo

TL;DR
This paper reviews how reinforcement learning can enhance GUI agents. It proposes a taxonomy (Offline RL, Online RL, and Hybrid Strategies) and analyzes trends such as reward design and world-model-based training to guide the future development of digital inhabitants.
Contribution
It provides the first comprehensive overview of RL approaches for GUI agents, introducing a taxonomy and analyzing emerging trends and technical innovations.
Findings
Composite reward architectures improve reliability and scalability.
World-model-based training addresses GUI I/O latency bottlenecks.
The emergence of System-2-style deliberation reduces the need for explicit reasoning supervision.
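To make the first finding concrete, here is a minimal sketch of one way a composite reward for a GUI trajectory might be assembled. It combines a verifiable task-success signal, a cheap rule-based format check, and a score from a model-based judge. This is an illustrative assumption, not the paper's formulation: the component names, the weights, the `Step` type, and the `judge_score` input (standing in for a learned or VLM-based judge) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    # Hypothetical record of one agent action in a GUI episode.
    action: str
    well_formed: bool  # e.g. whether the action parsed into a valid click/type command


def format_reward(trajectory: List[Step]) -> float:
    # Rule-based component: fraction of actions that are well formed.
    if not trajectory:
        return 0.0
    return sum(s.well_formed for s in trajectory) / len(trajectory)


def composite_reward(
    trajectory: List[Step],
    task_success: bool,
    judge_score: float,
    weights: Dict[str, float],
) -> float:
    # Hypothetical composite reward: a weighted sum of
    #   - a verifiable success signal (reliable but sparse),
    #   - a rule-based format check (cheap and dense),
    #   - a model-based judge score (scalable but noisier), clipped to [0, 1].
    components = {
        "success": 1.0 if task_success else 0.0,
        "format": format_reward(trajectory),
        "judge": max(0.0, min(1.0, judge_score)),
    }
    return sum(weights[name] * value for name, value in components.items())
```

For example, a successful two-step episode with one malformed action and a judge score of 0.5, weighted 0.6/0.2/0.2, scores 0.6 + 0.1 + 0.1 = 0.8. Weighting the verifiable signal most heavily is one way to trade the reliability of rule-based checks against the scalability of model-based judges.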
Abstract
Graphical User Interface (GUI) agents have emerged as a promising paradigm for intelligent systems that perceive and interact with graphical interfaces visually. Yet supervised fine-tuning alone cannot handle long-horizon credit assignment, distribution shifts, and safe exploration in irreversible environments, making Reinforcement Learning (RL) a central methodology for advancing automation. In this work, we present the first comprehensive overview of the intersection between RL and GUI agents, and examine how this research direction may evolve toward digital inhabitants. We propose a principled taxonomy that organizes existing methods into Offline RL, Online RL, and Hybrid Strategies, and complement it with analyses of reward engineering, data efficiency, and key technical innovations. Our analysis reveals several emerging trends: the tension between reliability and scalability is…
