Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation
Zehao Deng, Tianjie Ju, Zheng Wu, Zhuosheng Zhang, Gongshen Liu

TL;DR
This paper introduces a staged reinforcement learning approach with a multi-agent framework to improve long-horizon GUI task automation by enhancing planning and state management capabilities.
Contribution
It proposes a novel Coordinator-Executor-State Tracker framework with high-level scheduling trained via reinforcement learning for better long-horizon task handling.
Findings
Significant improvement in long-horizon task performance.
The high-level scheduler is generalizable and plug-and-play.
Enhanced planning and state management capabilities.
Abstract
The rapid development of large vision-language models (VLMs) has greatly advanced research on GUI agents. However, GUI agents still face significant challenges in handling long-horizon tasks. First, single-agent models struggle to balance high-level capabilities with low-level execution capability, and commonly suffer from responsibility coupling and capability conflicts. Second, agents lack awareness of the task state, which leads to lost progress in long-horizon tasks. To address these challenges, we propose a staged execution-feedback reinforcement learning algorithm. Rather than training a unified policy model, we focus on training high-level scheduling models. Specifically, we propose and train two agents: a Coordinator, responsible for strategic planning and task decomposition; and a State Tracker, responsible for context compression and information management to maintain the task's…
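The division of labor described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `TaskState`, `run_episode`, and the three toy agents are hypothetical, each agent is a plain Python function standing in for a model call, and the reinforcement learning stage is omitted entirely. It only shows one plausible control loop in which a Coordinator picks the next subtask, an Executor carries it out, and a State Tracker compresses the outcome into a running task state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskState:
    """Hypothetical compressed task context kept by the State Tracker."""
    goal: str
    summary: str = ""
    completed: List[str] = field(default_factory=list)


def run_episode(goal, coordinator, executor, tracker, max_steps=20):
    """Assumed control loop: plan a subtask, execute it, update the state."""
    state = TaskState(goal=goal)
    for _ in range(max_steps):
        subtask = coordinator(state)      # high-level planning
        if not subtask:                   # empty string means "task finished"
            break
        feedback = executor(subtask)      # low-level GUI execution (stubbed)
        state = tracker(state, subtask, feedback)  # context compression
    return state


# Toy stand-ins for the three agents (assumptions, for illustration only).
def toy_coordinator(state):
    plan = ["open settings", "enable wifi"]
    remaining = [step for step in plan if step not in state.completed]
    return remaining[0] if remaining else ""


def toy_executor(subtask):
    return f"done: {subtask}"


def toy_tracker(state, subtask, feedback):
    completed = state.completed + [subtask]
    return TaskState(goal=state.goal,
                     summary="; ".join(completed),
                     completed=completed)


final_state = run_episode("turn on wifi", toy_coordinator, toy_executor, toy_tracker)
print(final_state.completed)
```

Keeping the tracker's output as the only input to the Coordinator is the design point this sketch is meant to show: the planner never sees the full interaction history, only the compressed state.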
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Advanced Neural Network Applications
