SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning

Jichao Wang; Liuyang Bian; Yufeng Zhou; Han Xiao; Yue Pan; Guozhi Wang; Hao Wang; Zhaoxiong Wang; Yafei Wen; Xiaoxin Chen; Shuai Ren; Lingfang Zeng

arXiv:2604.22558·cs.LG·April 27, 2026

TL;DR

SOLAR-RL introduces a semi-online reinforcement learning framework that enhances long-horizon GUI navigation by integrating trajectory-level insights into offline learning, reducing interaction costs.

Contribution

It presents a novel method combining offline data with trajectory-level reward shaping to improve long-term task performance in GUI agents.

Findings

01. Significantly improves task completion rates.
02. Enhances robustness in GUI navigation tasks.
03. Reduces reliance on costly online interactions.

Abstract

As Multimodal Large Language Models (MLLMs) mature, GUI agents are evolving from static interactions to complex navigation. While Reinforcement Learning (RL) has emerged as a promising paradigm for training MLLM agents on dynamic GUI tasks, its effective application faces a dilemma. Standard Offline RL often relies on static step-level data, neglecting global trajectory semantics such as task completion and execution quality. Conversely, Online RL captures the long-term dynamics but suffers from high interaction costs and potential environmental instability. To bridge this gap, we propose SOLAR-RL (Semi-Online Long-horizon Assignment Reinforcement Learning). Instead of relying solely on expensive online interactions, our framework integrates global trajectory insights directly into the offline learning process. Specifically, we reconstruct diverse rollout candidates from static data,…
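The abstract is truncated here, so the exact SOLAR-RL mechanism is not specified on this page. The following is only a minimal sketch of the general idea it describes, bringing a trajectory-level signal (task completion) into learning from static, step-level data. It is not the authors' method. Every name in it is hypothetical: `Step`, `shaped_returns`, `step_weight`, `outcome_reward`, and `gamma`. The sketch assumes a simple scheme in which per-step correctness rewards are combined with an outcome bonus at the final step, and the result is propagated backward as discounted returns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    # Hypothetical step-level signal: does the action match the offline reference?
    action_correct: bool


def shaped_returns(steps: List[Step], task_completed: bool,
                   gamma: float = 0.9, step_weight: float = 0.5,
                   outcome_reward: float = 1.0) -> List[float]:
    """Blend step-level rewards with a trajectory-level outcome reward.

    Hypothetical scheme for illustration only: each step earns
    `step_weight` if its action is correct, the final step additionally
    earns `outcome_reward` if the whole task was completed, and rewards
    are turned into discounted returns so that the trajectory-level
    signal is credited back to earlier steps.
    """
    n = len(steps)
    rewards = [step_weight * (1.0 if s.action_correct else 0.0) for s in steps]

    # Trajectory-level signal: attached to the last step; discounting
    # below spreads it over the earlier steps of the episode.
    if n > 0 and task_completed:
        rewards[-1] += outcome_reward

    # Backward pass: G_t = r_t + gamma * G_{t+1}
    returns = [0.0] * n
    g = 0.0
    for t in reversed(range(n)):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


if __name__ == "__main__":
    episode = [Step(True), Step(True), Step(False)]
    print(shaped_returns(episode, task_completed=False))
    print(shaped_returns(episode, task_completed=True))
```

Under this assumed scheme, the same step-level actions receive higher returns when the trajectory ends in task completion. The outcome bonus lifts every earlier step, not only the last one, and that is the kind of long-horizon credit assignment a purely step-level offline objective cannot express.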
