Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation

Zehao Deng; Tianjie Ju; Zheng Wu; Zhuosheng Zhang; Gongshen Liu

arXiv:2511.22235·cs.AI·March 5, 2026

TL;DR

This paper introduces a staged reinforcement learning approach with a multi-agent framework to improve long-horizon GUI task automation by enhancing planning and state management capabilities.

Contribution

The paper proposes a Coordinator-Executor-State Tracker framework in which the high-level scheduling agents (the Coordinator and the State Tracker) are trained with reinforcement learning to handle long-horizon tasks more reliably.

Findings

01. Significant improvement in long-horizon task performance.

02. The high-level scheduler is generalizable and plug-and-play.

03. Enhanced planning and state management capabilities.

Abstract

The rapid development of large vision-language models (VLMs) has greatly advanced research on GUI agents. However, GUI agents still face significant challenges in handling long-horizon tasks. First, single-agent models struggle to balance high-level capabilities with low-level execution, and commonly suffer from responsibility coupling and capability conflicts. Second, agents lack awareness of the task state, which leads to lost progress in long-horizon tasks. To address these challenges, we propose a staged execution-feedback reinforcement learning algorithm. Rather than training a unified policy model, we focus on training high-level scheduling models. Specifically, we propose and train two agents: a Coordinator, responsible for strategic planning and task decomposition; and a State Tracker, responsible for context compression and information management to maintain the task's…
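
The division of roles described in the abstract can be illustrated with a minimal control-loop sketch. This is not the authors' implementation: the names `TaskState`, `run_episode`, and the `toy_*` functions are hypothetical, the three roles are plain Python callables standing in for model-backed agents, and the sketch covers only inference-time orchestration, not the staged reinforcement learning used for training.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskState:
    """Compressed record of progress (hypothetical structure, not from the paper)."""
    goal: str
    summary: List[str] = field(default_factory=list)
    done: bool = False


# Assumed role signatures; in the paper these roles are played by trained agents.
Coordinator = Callable[[TaskState], str]                   # state -> next subtask ("" = finished)
Executor = Callable[[str], str]                            # subtask -> execution feedback
StateTracker = Callable[[TaskState, str, str], TaskState]  # (state, subtask, feedback) -> new state


def run_episode(goal: str, coordinator: Coordinator, executor: Executor,
                tracker: StateTracker, max_steps: int = 10) -> TaskState:
    """Alternate planning, execution, and state compression until the plan ends."""
    state = TaskState(goal=goal)
    for _ in range(max_steps):
        subtask = coordinator(state)               # high-level planning
        if not subtask:                            # coordinator signals completion
            state.done = True
            break
        feedback = executor(subtask)               # low-level GUI execution
        state = tracker(state, subtask, feedback)  # context compression
    return state


# Toy stand-ins so the loop can run without any model.
def toy_coordinator(state: TaskState) -> str:
    plan = ["open settings", "toggle wifi"]
    step = len(state.summary)
    return plan[step] if step < len(plan) else ""


def toy_executor(subtask: str) -> str:
    return "ok"


def toy_tracker(state: TaskState, subtask: str, feedback: str) -> TaskState:
    return TaskState(state.goal, state.summary + [f"{subtask}: {feedback}"], state.done)


final = run_episode("turn on wifi", toy_coordinator, toy_executor, toy_tracker)
```

Keeping the executor behind a narrow interface is one way to picture how a high-level scheduler could be paired with different low-level executors; whether the paper structures the interface this way is an assumption.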

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Advanced Neural Network Applications