ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World

Runliang Niu; Jinglong Ji; Yi Chang; Qi Wang

arXiv:2505.19095·cs.AI·May 27, 2025

TL;DR

ScreenExplorer is a vision-language model trained with Group Relative Policy Optimization and a world-model-based curiosity reward to explore open, dynamic GUI environments, with the trained models adapting to their environment better than static models.

Contribution

Introduces ScreenExplorer, a VLM trained with Group Relative Policy Optimization and a world-model-based curiosity reward for enhanced exploration in open GUI worlds.
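
To make the idea of a world-model-based curiosity reward concrete, here is a minimal sketch of one common formulation: the reward is the world model's prediction error on the next observation. This listing does not give the paper's actual reward definition, so the function name `curiosity_reward`, the use of plain feature vectors, and the mean-squared-error form are all assumptions made for illustration.

```python
from typing import Sequence


def curiosity_reward(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Hypothetical curiosity reward: mean squared error between the
    world model's predicted next-state features and the observed ones.

    ASSUMPTION: prediction error is used as the curiosity signal; the
    paper's exact reward may be defined differently.
    """
    if len(predicted) != len(observed):
        raise ValueError("feature vectors must have the same length")
    if not predicted:
        raise ValueError("feature vectors must be non-empty")
    squared_errors = ((p - o) ** 2 for p, o in zip(predicted, observed))
    return sum(squared_errors) / len(predicted)


# A perfectly predicted screen yields no reward; a surprising one yields more.
familiar = curiosity_reward([0.2, 0.4], [0.2, 0.4])
surprising = curiosity_reward([0.2, 0.4], [0.9, 0.1])
```

Under this formulation, actions that lead to screens the world model already predicts well earn little reward, which nudges the agent toward unfamiliar parts of the interface.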

Findings

01 Better environmental adaptation compared to static models

02 Enhanced exploration capabilities through experience distillation

03 Scalable approach toward self-improving AGI in complex settings

Abstract

The rapid progress of large language models (LLMs) has sparked growing interest in building Artificial General Intelligence (AGI) within Graphical User Interface (GUI) environments. However, existing GUI agents based on LLMs or vision-language models (VLMs) often fail to generalize to novel environments and rely heavily on manually curated, diverse datasets. To overcome these limitations, we introduce ScreenExplorer, a VLM trained via Group Relative Policy Optimization (GRPO) in real, dynamic, and open-ended GUI environments. We also introduce a world-model-based curiosity reward function to help the agent overcome the cold-start phase of exploration. Additionally, distilling experience streams further enhances the model's exploration capabilities. Our training framework enhances model exploration in open GUI environments, with trained models showing better environmental…
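
The core step that gives GRPO its name can be sketched briefly: rewards for a group of rollouts sampled from the same state are normalized against the group's own mean and standard deviation, so no separate value network is needed. The sketch below shows only that normalization. The function name `group_relative_advantages`, the `eps` stabilizer, and the choice of population standard deviation are assumptions for illustration, not details taken from the ScreenExplorer code.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each rollout's reward against its sampling group.

    ASSUMPTION: population standard deviation plus a small eps; real
    implementations may use the sample standard deviation instead.
    """
    if not rewards:
        raise ValueError("a group needs at least one reward")
    group_mean = mean(rewards)
    group_std = pstdev(rewards)
    return [(r - group_mean) / (group_std + eps) for r in rewards]


# Four hypothetical rollouts from the same screen, scored by some reward.
advantages = group_relative_advantages([0.1, 0.4, 0.4, 0.9])
```

Rollouts that score above the group average receive positive advantages and are reinforced, while below-average rollouts are discouraged. When every rollout in a group earns the same reward, all advantages are zero and the group contributes no learning signal.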

Code & Models

Repositories

niuzaisheng/screenexplorer (PyTorch, official)

Models

niurl/ScreenExplorer (Hugging Face model)
