Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions
Guo Gan, Yuxuan Ding, Cong Chen, Yuwei Ren, Yin Huang, Hong Zhou

TL;DR
Android Coach introduces a novel training framework for online reinforcement learning in Android agents, enabling multiple actions per state to improve efficiency and success rates without extra emulator overhead.
Contribution
It shifts the RL training paradigm from Single State Single Action to Single State Multiple Actions, using a learned critic to evaluate several sampled actions per state and thereby improving both learning efficiency and effectiveness.
Findings
Achieves success-rate improvements of 7.5% on AndroidLab and 8.3% on AndroidWorld.
Attains 1.4x higher training efficiency than PPO and GRPO at similar success rates.
Combines a critic, a reward model, and an advantage estimator to improve online RL in Android environments.
Abstract
Online reinforcement learning (RL) serves as an effective method for enhancing the capabilities of Android agents. However, guiding agents to learn through online interaction is prohibitively expensive due to the high latency of emulators and the sample inefficiency of existing RL algorithms. We identify a fundamental limitation in current approaches: the Single State Single Action paradigm, which updates the policy with one-to-one state-action pairs from online one-way rollouts without fully exploring each costly emulator state. In this paper, we propose Android Coach, a novel framework that shifts the training paradigm to Single State Multiple Actions, allowing the agent to sample and utilize multiple actions for a single online state. We enable this without additional emulator overhead by learning a critic that estimates action values. To ensure the critic serves as a reliable coach,…
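To make the Single State Multiple Actions idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a policy `sample_action` that proposes candidate actions and a learned critic `critic_q(state, action)` that returns an action-value estimate. Both are hypothetical stand-ins, as are the toy states and actions. The group-mean baseline that turns Q-values into advantages is also an illustrative assumption, because the summary does not specify the paper's advantage estimator.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

State = str   # hypothetical: a serialized screen/UI state
Action = str  # hypothetical: a textual action such as "tap(wifi)"


def ssma_advantages(
    state: State,
    sample_action: Callable[[State], Action],
    critic_q: Callable[[State, Action], float],
    num_actions: int = 4,
) -> List[Tuple[Action, float]]:
    """Sample several actions for ONE state and score each with a critic.

    No emulator step happens here: the critic's Q(s, a) estimate stands in
    for executing each candidate action. Advantages use the group-mean
    Q-value as a baseline (an illustrative assumption).
    """
    if num_actions < 1:
        raise ValueError("num_actions must be at least 1")
    actions = [sample_action(state) for _ in range(num_actions)]
    q_values = [critic_q(state, a) for a in actions]
    baseline = sum(q_values) / len(q_values)
    return [(a, q - baseline) for a, q in zip(actions, q_values)]


def policy_gradient_loss(log_probs: Sequence[float], advantages: Sequence[float]) -> float:
    """REINFORCE-style surrogate loss: -mean_k(A_k * log pi(a_k | s))."""
    if len(log_probs) != len(advantages) or not log_probs:
        raise ValueError("need equal-length, non-empty inputs")
    return -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / len(log_probs)


# Toy usage: a deterministic "policy" and a lookup-table "critic".
candidates = itertools.cycle(["tap(settings)", "scroll(down)", "tap(wifi)", "back()"])
toy_q = {"tap(settings)": 0.9, "scroll(down)": 0.3, "tap(wifi)": 0.6, "back()": 0.2}

scored = ssma_advantages(
    state="home_screen",
    sample_action=lambda s: next(candidates),
    critic_q=lambda s, a: toy_q[a],
    num_actions=4,
)
best_action, best_adv = max(scored, key=lambda pair: pair[1])
```

In this sketch, scoring K candidate actions costs K critic evaluations rather than K emulator interactions. That trade is the intuition behind getting more training signal from each costly emulator state. Minimizing `policy_gradient_loss` raises the log-probability of actions the critic rates above the group mean and lowers it for the rest.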
