ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay

Fanbin Lu; Zhisheng Zhong; Shu Liu; Chi-Wing Fu; Jiaya Jia

arXiv:2505.16282 · cs.CV · May 23, 2025

TL;DR

This paper introduces ARPO, an end-to-end reinforcement learning method with experience replay for training vision-language GUI agents, significantly improving performance on complex, long-horizon tasks in GUI environments.

Contribution

ARPO combines policy optimization with experience replay and task filtering to improve the training stability and performance of LLM-based GUI agents.

Findings

1. ARPO achieves state-of-the-art results on the OSWorld benchmark.
2. Experience replay improves training efficiency and stability.
3. Task filtering focuses learning on informative interactions.
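Task filtering can be illustrated with a small sketch. It assumes a criterion that this page does not spell out: a task is kept only if the current policy succeeds at least once in a fixed number of attempts (16 here, a hypothetical value). The reasoning is that a task that is never solved gives every rollout the same zero reward, so a group-relative method such as GRPO gets no learning signal from it. `filter_informative_tasks` and the `run_episode` hook are hypothetical names, not part of the ARPO repository.

```python
from typing import Callable, List


def filter_informative_tasks(
    task_ids: List[str],
    run_episode: Callable[[str, int], bool],
    num_attempts: int = 16,
    min_successes: int = 1,
) -> List[str]:
    """Return tasks the current policy solves at least `min_successes` times.

    Hypothetical criterion: a task with zero successes yields identical
    (zero) rewards for every rollout, so it carries no group-relative signal.
    """
    kept: List[str] = []
    for task_id in task_ids:
        # run_episode is a hypothetical hook: roll out the agent once and
        # return True on task success. Booleans sum as 0/1 in Python.
        successes = sum(run_episode(task_id, attempt) for attempt in range(num_attempts))
        if successes >= min_successes:
            kept.append(task_id)
    return kept


if __name__ == "__main__":
    # Toy outcome table standing in for real environment rollouts.
    outcomes = {
        "open_settings": lambda k: True,           # always solved
        "edit_spreadsheet": lambda k: k % 4 == 0,  # solved 4 of 16 times
        "compile_latex": lambda k: False,          # never solved
    }
    kept = filter_informative_tasks(list(outcomes), lambda t, k: outcomes[t](k))
    print(kept)  # ['open_settings', 'edit_spreadsheet']
```

Under this criterion, tasks that are always solved are still kept; a stricter variant could also drop them, since an all-success group likewise has zero reward variance.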

Abstract

Training large language models (LLMs) as interactive agents for controlling graphical user interfaces (GUIs) presents the unique challenge of optimizing long-horizon action sequences with multimodal feedback from complex environments. While recent works have advanced multi-turn reinforcement learning (RL) for reasoning and tool-using capabilities in LLMs, their application to GUI-based agents remains relatively underexplored because of sparse rewards, delayed feedback, and high rollout costs. In this paper, we investigate end-to-end policy optimization for vision-language-based GUI agents with the aim of improving performance on complex, long-horizon computer tasks. We propose Agentic Replay Policy Optimization (ARPO), an end-to-end RL approach that augments Group Relative Policy Optimization (GRPO) with a replay buffer to reuse successful experience across training…
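The replay idea in the abstract can be sketched as follows, assuming binary task-success rewards, a per-task buffer of successful trajectories, and that a stored success replaces one fresh rollout whenever an entire group fails. The sketch covers only the GRPO-style group-normalized advantage, not the clipped policy-gradient update or the vision-language model itself. All class and function names (`SuccessReplayBuffer`, `group_advantages`, `build_group`) are illustrative assumptions, not the API of dvlab-research/arpo.

```python
import random
import statistics
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical trajectory record: {"task_id": str, "reward": float, ...}
Trajectory = Dict[str, Any]


class SuccessReplayBuffer:
    """Keeps the most recent successful trajectories per task (illustrative design)."""

    def __init__(self, capacity_per_task: int = 8) -> None:
        self.capacity = capacity_per_task
        self.by_task: Dict[str, List[Trajectory]] = {}

    def add(self, traj: Trajectory) -> None:
        # Only successful trajectories (positive reward) are stored.
        if traj["reward"] > 0:
            items = self.by_task.setdefault(traj["task_id"], [])
            items.append(traj)
            if len(items) > self.capacity:
                items.pop(0)  # evict the oldest stored success

    def sample(self, task_id: str, rng: random.Random) -> Optional[Trajectory]:
        items = self.by_task.get(task_id)
        return rng.choice(items) if items else None


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: each reward normalized by its group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def build_group(
    group: List[Trajectory], buffer: SuccessReplayBuffer, rng: random.Random
) -> Tuple[List[Trajectory], List[float]]:
    """Store new successes, patch an all-failure group with a replayed success, score it."""
    for traj in group:
        buffer.add(traj)
    if all(t["reward"] == 0 for t in group):
        replay = buffer.sample(group[0]["task_id"], rng)
        if replay is not None:
            # Replace the last rollout so the group contains at least one success.
            group = group[:-1] + [replay]
    return group, group_advantages([t["reward"] for t in group])


if __name__ == "__main__":
    rng = random.Random(0)
    buffer = SuccessReplayBuffer()
    # Earlier iteration: one rollout of task "t1" succeeded and is stored.
    build_group([{"task_id": "t1", "reward": 1.0}, {"task_id": "t1", "reward": 0.0}], buffer, rng)
    # Later iteration: every fresh rollout fails.
    fails = [{"task_id": "t1", "reward": 0.0} for _ in range(4)]
    group, adv = build_group(fails, buffer, rng)
    print([t["reward"] for t in group])  # [0.0, 0.0, 0.0, 1.0]
```

Without the replayed trajectory, the all-failure group would have zero reward variance and every advantage would be 0; with it, the replayed success receives a positive advantage and the failed rollouts negative ones, so the update still carries signal on hard tasks.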

Code & Models

Repositories

dvlab-research/arpo (official implementation, PyTorch)

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Topic Modeling