VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Jiani Zheng, Lu Wang, Fangkai Yang, Chaoyun Zhang, Lingrui Mei, Wenjie Yin, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

TL;DR
This paper introduces VEM, an environment-free reinforcement learning framework for GUI agents that uses a pretrained value environment model to estimate long-term action utility from offline data, improving robustness and performance.
Contribution
The paper proposes an environment-free RL approach that uses a pretrained VEM to decouple value estimation from policy optimization, enabling GUI automation without environment interactions during training.
Findings
VEM achieves state-of-the-art results on Android-in-the-Wild benchmarks.
VEM outperforms other environment-free methods significantly.
VEM matches the performance of environment-based approaches without interaction costs.
Abstract
Training Vision-Language Models (VLMs) for Graphical User Interface (GUI) agents via Reinforcement Learning (RL) faces critical challenges: environment-based RL requires costly interactions, while environment-free methods struggle with distribution shift and reward generalization. We propose an environment-free RL framework that decouples value estimation from policy optimization by leveraging a pretrained Value Environment Model (VEM). VEM predicts state-action values directly from offline data, distilling human-like priors about GUI interaction outcomes without requiring next-state prediction or environmental feedback. This avoids compounding errors and enhances resilience to UI changes by focusing on semantic reasoning (e.g., "Does this action advance the user's goal?"). The framework operates in two stages: (1) pretraining VEM to estimate long-term action utilities and (2) guiding…
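The decoupling described in the abstract can be illustrated with a toy, tabular sketch. This is not the authors' implementation: the integer states and actions, the utility labels, the helper names `fit_value_model` and `train_policy`, and the choice to treat unseen state-action pairs as zero utility are all assumptions made for illustration. Stage 1 fits a value table from offline tuples only; stage 2 keeps that table frozen and nudges a softmax policy toward higher-valued actions, with no environment calls in either stage.

```python
import math
from collections import defaultdict


def fit_value_model(offline_data):
    """Stage 1 (toy): estimate Q(s, a) as the mean utility label seen offline."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for state, action, utility in offline_data:
        sums[(state, action)] += utility
        counts[(state, action)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(q_values, states, n_actions, steps=200, lr=0.5):
    """Stage 2 (toy): gradient ascent on E_{a~pi}[Q(s, a)] with Q frozen."""
    logits = {s: [0.0] * n_actions for s in states}
    for _ in range(steps):
        for s in states:
            probs = softmax(logits[s])
            # Assumption: pairs never seen offline get a utility of 0.0.
            q = [q_values.get((s, a), 0.0) for a in range(n_actions)]
            baseline = sum(p * v for p, v in zip(probs, q))
            for a in range(n_actions):
                # d/d logit_a of sum_b pi_b * Q_b = pi_a * (Q_a - baseline)
                logits[s][a] += lr * probs[a] * (q[a] - baseline)
    return {s: softmax(logits[s]) for s in states}


# Hypothetical offline tuples: (state, action, utility label).
offline_data = [
    (0, 0, 1.0), (0, 1, 0.0), (0, 0, 1.0),
    (1, 0, 0.0), (1, 1, 1.0),
]

q_values = fit_value_model(offline_data)
policy = train_policy(q_values, states=[0, 1], n_actions=2)
greedy = {s: max(range(2), key=lambda a: policy[s][a]) for s in policy}
```

In this sketch the policy never queries an environment or a next-state predictor; it only reads the frozen `q_values` table, which is the property the abstract emphasizes. A real system would replace the table with a learned model over screenshots and task descriptions.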
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
