MobileViews: A Million-scale and Diverse Mobile GUI Dataset
Longxi Gao, Li Zhang, Shihe Wang, Pengzhi Gao, Wei Liu, Jian Luan, Shangguang Wang, Yuanchun Li, Mengwei Xu

TL;DR
MobileViews is a mobile GUI dataset of over 1.2 million unique screenshot-view hierarchy pairs drawn from more than 30K Android applications; VLMs trained on it improve GUI grounding accuracy by up to 6.1%.
Contribution
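The paper presents a large-scale mobile GUI dataset collected by a VLM-enhanced automatic application traversal framework running on over 200 native, high-fidelity mobile environments hosted on two mobile SoC clusters. The authors position it as exceeding existing mobile GUI datasets in scale, data comprehensiveness, and fidelity.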
Findings
MobileViews improves GUI grounding accuracy by up to 6.1%.
Large, high-quality datasets are crucial for training effective mobile GUI agents.
Automated data collection reduces human intervention and enhances dataset diversity.
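To make the grounding figure concrete, here is a minimal sketch of how GUI grounding accuracy is often scored in benchmarks of this kind: a prediction counts as correct when the predicted click point falls inside the target element's bounding box. This convention is an assumption, not something the summary confirms for MobileViews, and the names `Box` and `grounding_accuracy` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a UI element, in screen pixels (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Edges are treated as inside the element.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(
    predictions: List[Tuple[float, float]], targets: List[Box]
) -> float:
    """Fraction of predicted click points that land inside their target box.

    Assumed scoring rule (point-in-box); the paper's exact metric may differ.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    button = Box(left=0, top=0, right=10, bottom=10)
    # One click inside the button, one outside it.
    print(grounding_accuracy([(5, 5), (50, 50)], [button, button]))
```

Under this rule, comparing two models means comparing the fraction of instructions for which each model's predicted point hits the intended element.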
Abstract
Visual language models (VLMs) empower mobile GUI agents to interpret complex mobile screens and respond to user requests. Training such capable agents requires large-scale, high-quality mobile GUI data. However, existing mobile GUI datasets are limited in scale, data comprehensiveness, and fidelity. To overcome this, we utilize two mobile SoC clusters to provide over 200 native, high-fidelity mobile environments, along with a VLM-enhanced automatic application traversal framework for highly parallel, automated dataset collection with minimal human intervention. With this system, we propose MobileViews, a million-scale mobile GUI dataset comprising over 1.2 million unique screenshot-view hierarchy pairs from more than 30K modern Android applications. We assess the effectiveness of MobileViews by training four VLMs using the reinforcement learning-based GUI grounding task and evaluating…
Taxonomy
Topics: Context-Aware Activity Recognition Systems
