M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
Rui Lv, Juncheng Mo, Tianyi Chu, Chen Rao, Hongyi Jing, Jiajie Teng, Jiafu Chen, Shiqi Zhang, Liangzi Ding, Shuo Fang, Huaizhong Lin, Ziqiang Dang, Chenguang Ma, Lei Zhao

TL;DR
This paper introduces M$^2$-Miner, a novel multi-agent framework using MCTS for efficient, low-cost mobile GUI data mining, significantly improving data quality and diversity for training GUI agents.
Contribution
It presents the first automated multi-agent MCTS-based framework for mobile GUI data mining, incorporating strategies for enhanced efficiency, data diversity, and model training.
Findings
Achieves state-of-the-art performance on mobile GUI benchmarks.
Reduces data mining costs and improves data quality.
Enriches intent diversity through intent recycling.
Abstract
Graphical User Interface (GUI) agents are pivotal to advancing intelligent human-computer interaction paradigms. Constructing powerful GUI agents necessitates the large-scale annotation of high-quality user-behavior trajectory data (i.e., intent-trajectory pairs) for training. However, manual annotation methods and current GUI agent data mining approaches typically face three critical challenges: high construction cost, poor data quality, and low data richness. To address these issues, we propose M$^2$-Miner, the first low-cost and automated mobile GUI agent data-mining framework based on Monte Carlo Tree Search (MCTS). For better data mining efficiency and quality, we present a collaborative multi-agent framework, comprising InferAgent, OrchestraAgent, and JudgeAgent for guidance, acceleration, and evaluation. To further enhance the efficiency of mining and enrich intent diversity, we…
Peer Reviews
Decision: ICLR 2026 Poster
The intent recycling strategy re-evaluates sibling paths to extract multiple intent-trajectory pairs from a single search tree, significantly improving data diversity and mining efficiency without additional exploration costs. The progressive model-in-the-loop training implements a three-stage training strategy, allowing agent capabilities to improve progressively in tandem with data complexity, which enhances the mining success rate in unseen scenarios.
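The intent recycling idea described above, extracting several intent-trajectory pairs from one search tree, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Node` class, `iter_paths`, `recycle_intents`, and the `relabel` callback are hypothetical names introduced here, and `relabel` stands in for whatever model-based re-evaluation the authors actually use. The sketch only shows the structural point that paths already explored, including siblings of the original target path, can be relabeled without further exploration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical search-tree node: a GUI state reached via `action`."""
    action: Optional[str]  # None only for the root
    children: List["Node"] = field(default_factory=list)


def iter_paths(root: Node) -> Iterator[List[str]]:
    """Yield the action sequence of every non-empty root-to-node path."""
    stack = [(root, [])]
    while stack:
        node, prefix = stack.pop()
        path = prefix if node.action is None else prefix + [node.action]
        if path:
            yield path
        # Push children in reverse so they are visited in their stored order.
        for child in reversed(node.children):
            stack.append((child, path))


def recycle_intents(
    root: Node,
    relabel: Callable[[List[str]], Optional[str]],
) -> List[Tuple[str, List[str]]]:
    """Collect (intent, trajectory) pairs for every path that `relabel` accepts.

    `relabel` is an assumed stand-in for a judge model: it returns an intent
    string when a path forms a coherent task, and None otherwise.
    """
    pairs = []
    for path in iter_paths(root):
        intent = relabel(path)
        if intent is not None:
            pairs.append((intent, path))
    return pairs


# Toy tree: one search explored two sibling branches under "open_settings".
tree = Node(None, [
    Node("open_settings", [
        Node("tap_wifi"),
        Node("tap_bluetooth"),
    ]),
])

# Toy relabeler keyed on the final action; a real one would be a model call.
TOY_INTENTS = {
    "tap_wifi": "Open Wi-Fi settings",
    "tap_bluetooth": "Open Bluetooth settings",
}


def toy_relabel(path: List[str]) -> Optional[str]:
    return TOY_INTENTS.get(path[-1])


pairs = recycle_intents(tree, toy_relabel)
```

In this toy tree a single search yields two intent-trajectory pairs, one per sibling branch, which is the sense in which recycling adds diversity without additional exploration cost.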
- The ablation study should be expanded: include a baseline using the stronger 72B model for InferAgent and JudgeAgent, but without the model-in-the-loop (MITL) strategy. This is necessary to validate the true effectiveness of MITL.
- The paper mentions using 8 A100-80G GPUs for training and "retraining for 2 epochs on the full mined dataset at each stage". These significant computational costs, as well as the API costs for Qwen2.5-VL-72B, seem to be omitted from the 196 total cost claimed in T
S1. Strong Experiments. It compares with 13 methods and analyzes the effect of agent numbers and online learning strategies, showing solid and comprehensive evaluation.
S2. Practical Significance. The framework is scalable and adaptable, demonstrating potential for real-world GUI automation and broader mobile applications.
S3. Trajectory Recycling is an interesting and computation-efficient design.
W1. Writing and Presentation Issues. The paper contains several typos and minor writing problems that affect readability:
1. Line 52: “we presents” -> “we present”
2. Line 274: “where i denotes the i-th visit to the node” appears twice.
3. Line 353: “This is crucial when targeting new application scenarios.” is unclear; please specify what scenarios are referred to.
4. Line 480: “significantly improve” -> “improves”.
5. Line 484: “an solid foundation” -> “a solid foundation”
6. Line 485 “Statemen
1. This paper proposes a fully automated framework for mobile GUI agent data mining. By introducing MCTS and designing a collaborative multi-agent framework, the method improves data mining efficiency while enhancing data quality.
2. The intent recycling strategy further enhances both mining efficiency and intent richness, while the progressive model-in-the-loop training paradigm boosts success rates in both familiar and novel environments.
3. Extensive experiments show that GUI agents trained on th
1. The paper proposes an automated mobile GUI agent data-mining framework based on Monte Carlo Tree Search (MCTS). Since Monte Carlo Tree Search is a classic algorithm, is the innovation here sufficient?
2. The background of mobile GUI agent data mining is not sufficiently introduced in the paper, making it difficult to understand.
Taxonomy
Topics: Human Motion and Animation · Time Series Analysis and Forecasting · Recommender Systems and Techniques
