Adaptive Milestone Reward for GUI Agents

Congmin Zheng; Xiaoyun Mo; Xinbei Ma; Qiqiang Lin; Yin Zhao; Jiachen Zhu; Xingyu Lou; Jun Wang; Zhaoxiang Wang; Weiwen Liu; Zhuosheng Zhang; Yong Yu; Weinan Zhang

arXiv:2602.11524·cs.LG·February 13, 2026

TL;DR

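This paper introduces ADMIRE, an adaptive milestone reward mechanism for GUI agents trained with reinforcement learning. It anchors trajectories to milestones that are dynamically distilled from successful explorations, and it applies asymmetric credit assignment that denoises successful trajectories and scaffolds failed ones. The reported gains in success rate hold across RL algorithms and environments.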

Contribution

The paper presents ADMIRE, a novel adaptive milestone reward mechanism for RL-trained GUI agents. It targets both the sparsity of outcome rewards and the bias of process rewards by combining dynamic milestone construction with asymmetric credit assignment.

Findings

01. Over 10% success rate improvement on AndroidWorld.

02. Robust performance across various RL algorithms.

03. Effective in diverse environments like web navigation and embodied tasks.

Abstract

Reinforcement Learning (RL) has emerged as a mainstream paradigm for training Mobile GUI Agents, yet it struggles with the temporal credit assignment problem inherent in long-horizon tasks. A primary challenge lies in the trade-off between reward fidelity and density: outcome reward offers high fidelity but suffers from signal sparsity, while process reward provides dense supervision but remains prone to bias and reward hacking. To resolve this conflict, we propose the Adaptive Milestone Reward (ADMIRE) mechanism. ADMIRE constructs a verifiable, adaptive reward system by anchoring trajectory to milestones, which are dynamically distilled from successful explorations. Crucially, ADMIRE integrates an asymmetric credit assignment strategy that denoises successful trajectories and scaffolds failed trajectories. Extensive experiments demonstrate that ADMIRE consistently yields over 10%…
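To make the idea in the abstract more concrete, here is a minimal Python sketch of a milestone-anchored reward with asymmetric handling of successful and failed trajectories. It is not the paper's implementation: the function name `milestone_rewards`, the exact string matching of steps against milestones, the even split of the outcome reward, and the `milestone_bonus` value are all assumptions chosen for illustration. The listing does not state how ADMIRE matches milestones or sizes its rewards.

```python
from typing import List, Sequence


def milestone_rewards(
    steps: Sequence[str],
    milestones: Sequence[str],
    success: bool,
    outcome_reward: float = 1.0,   # hypothetical value
    milestone_bonus: float = 0.2,  # hypothetical value
) -> List[float]:
    """Assign per-step rewards by matching steps to ordered milestones.

    Illustrative only: exact string equality stands in for whatever
    verifiable milestone check a real system would use.
    """
    rewards = [0.0] * len(steps)

    # Find the steps that reach the milestones, in milestone order.
    hit_steps: List[int] = []
    next_idx = 0
    for t, step in enumerate(steps):
        if next_idx < len(milestones) and step == milestones[next_idx]:
            hit_steps.append(t)
            next_idx += 1

    if success:
        # "Denoise" (assumed form): credit only milestone-reaching steps,
        # so detours in a successful trajectory earn nothing.
        if hit_steps:
            share = outcome_reward / len(hit_steps)
            for t in hit_steps:
                rewards[t] = share
        elif steps:
            # No milestone matched: fall back to a terminal outcome reward.
            rewards[-1] = outcome_reward
    else:
        # "Scaffold" (assumed form): partial credit for each milestone
        # reached before the trajectory failed.
        for t in hit_steps:
            rewards[t] = milestone_bonus

    return rewards


if __name__ == "__main__":
    goal = ["open_app", "tap_settings", "toggle_wifi"]
    print(milestone_rewards(
        ["open_app", "scroll", "tap_settings", "toggle_wifi"], goal, True))
    print(milestone_rewards(["open_app", "scroll", "tap_settings"], goal, False))
```

In this sketch a failed trajectory still receives a dense signal for the milestones it reached, while a successful one concentrates its credit on milestone steps rather than spreading it over every action. That mirrors the fidelity and density trade-off the abstract describes, under the assumptions stated above.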

Taxonomy

Topics: Reinforcement Learning in Robotics · Recommender Systems and Techniques · Social Robot Interaction and HRI