Learning with Challenges: Adaptive Difficulty-Aware Data Generation for Mobile GUI Agent Training
Linjia Kang, Zhimin Wang, Yongkang Zhang, Duo Wu, Jinghe Wang, Ming Ma, Haopeng Yan, Zhi Wang

TL;DR
MobileGen is a novel adaptive data generation framework that aligns training task difficulty with a GUI agent's capabilities, significantly improving agent performance by systematically controlling structural and semantic challenge levels.
Contribution
The paper introduces a capability-aware data generation method that decouples task difficulty into structural and semantic dimensions, enabling more effective training of mobile GUI agents.
Findings
Achieves a 1.57x performance improvement over existing methods
Effectively aligns training difficulty with agent capabilities
Enhances GUI agent training across multiple benchmarks
Abstract
Large-scale, high-quality interaction trajectories are essential for advancing mobile Graphical User Interface (GUI) agents. While existing methods typically rely on labor-intensive human demonstrations or automated model exploration to generate GUI trajectories, they lack fine-grained control over task difficulty. This fundamentally restricts learning effectiveness due to the mismatch between the training difficulty and the agent's capabilities. Inspired by how humans acquire skills through progressively challenging tasks, we propose MobileGen, a novel data generation framework that adaptively aligns training difficulty with the GUI agent's capability frontier. Specifically, MobileGen explicitly decouples task difficulty into structural (e.g., trajectory length) and semantic (e.g., task goal) dimensions. It then iteratively evaluates the agent on a curated prior dataset to construct a…
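The abstract's idea of evaluating an agent across decoupled difficulty dimensions can be illustrated with a small sketch. This is not the paper's implementation: the record format, the bucketing of trajectory length, the integer semantic level, and the success-rate thresholds used to mark "frontier" cells are all assumptions introduced here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRecord:
    """One evaluated task (hypothetical format, not from the paper)."""
    trajectory_length: int  # structural difficulty proxy: number of steps
    semantic_level: int     # assumed integer label for semantic difficulty
    success: bool           # whether the agent completed the task


def length_bucket(n_steps: int, width: int = 5) -> int:
    """Map a step count to a bucket index: 1-5 -> 0, 6-10 -> 1, and so on."""
    return (n_steps - 1) // width


def success_by_difficulty(records, width: int = 5):
    """Return the success rate for each (structural bucket, semantic level) cell."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [successes, attempts]
    for r in records:
        cell = (length_bucket(r.trajectory_length, width), r.semantic_level)
        totals[cell][0] += int(r.success)
        totals[cell][1] += 1
    return {cell: wins / tries for cell, (wins, tries) in totals.items()}


def frontier_cells(rates, low: float = 0.3, high: float = 0.7):
    """Cells where the agent sometimes succeeds and sometimes fails.

    The [low, high] band is an assumed heuristic for 'challenging but
    feasible'; the paper may define its capability frontier differently.
    """
    return sorted(cell for cell, rate in rates.items() if low <= rate <= high)


records = [
    EvalRecord(3, 0, True),
    EvalRecord(4, 0, True),
    EvalRecord(8, 1, True),
    EvalRecord(9, 1, False),
    EvalRecord(12, 2, False),
    EvalRecord(14, 2, False),
]
rates = success_by_difficulty(records)
targets = frontier_cells(rates)
```

Under these assumptions, new training tasks would be sampled from the cells in `targets` rather than from cells the agent already solves reliably or fails consistently.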
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning
