CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning
Yuxuan Liu, Weikai Xu, Kun Huang, Changyu Chen, Jiankun Zhao, Pengzhi Gao, Wei Liu, Jian Luan, Shuo Shang, Bo Du, Ji-Rong Wen, Rui Yan

TL;DR
This paper introduces CoME, a novel mobile agent architecture with four specialized experts and a progressive training strategy, enhancing hybrid-capabilities reasoning and outperforming existing methods on benchmark datasets.
Contribution
The paper proposes a new architecture with expert-specific modules and training strategies to improve hybrid-capabilities reasoning in mobile agents.
Findings
CoME outperforms dense mobile agents on benchmark datasets.
The progressive training strategy effectively enhances expert capabilities.
InfoGain-Driven DPO reduces error propagation in reasoning processes.
Abstract
Mobile agents can autonomously execute user instructions, which requires hybrid-capabilities reasoning, including screen summary, subtask planning, action decision and action function. However, existing agents struggle to achieve both decoupled enhancement and balanced integration of these capabilities. To address these challenges, we propose Channel-of-Mobile-Experts (CoME), a novel agent architecture consisting of four distinct experts, each aligned with a specific reasoning stage. CoME activates the corresponding expert to generate output tokens in each reasoning stage via output-oriented activation. To empower CoME with hybrid-capabilities reasoning, we introduce a progressive training strategy: Expert-FT enables decoupling and enhancement of the different experts' capabilities; Router-FT aligns expert activation with the different reasoning stages; CoT-FT facilitates seamless…
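The stage-to-expert pairing described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a hard lookup from a stage label to a plain Python function, whereas CoME activates experts inside the model while generating output tokens. All names here (`STAGES`, `make_expert`, `EXPERTS`, `run_reasoning_chain`) are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# The four reasoning stages named in the abstract, in order.
STAGES: List[str] = [
    "screen_summary",
    "subtask_planning",
    "action_decision",
    "action_function",
]

Expert = Callable[[List[str]], str]


def make_expert(stage: str) -> Expert:
    """Build a stand-in expert (hypothetical): it only reports its stage
    and how much prior context it was given."""
    def expert(context: List[str]) -> str:
        return f"<{stage}> output given {len(context)} prior step(s)"
    return expert


# One expert per stage (assumption: a one-to-one mapping, as the abstract suggests).
EXPERTS: Dict[str, Expert] = {stage: make_expert(stage) for stage in STAGES}


def run_reasoning_chain(instruction: str) -> List[Tuple[str, str]]:
    """Walk the stages in order, activating only the expert tied to the
    stage whose output is currently being produced."""
    context: List[str] = [instruction]
    trace: List[Tuple[str, str]] = []
    for stage in STAGES:
        output = EXPERTS[stage](context)  # activation keyed on the output stage
        trace.append((stage, output))
        context.append(output)            # later stages see earlier outputs
    return trace


if __name__ == "__main__":
    for stage, output in run_reasoning_chain("Open Wi-Fi settings"):
        print(stage, "->", output)
```

Because each stage consumes the outputs of the stages before it, a mistake in an early stage carries into the later ones; the findings list InfoGain-Driven DPO as the component that reduces this error propagation.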
Taxonomy
Topics: AI-based Problem Solving and Planning · Multi-Agent Systems and Negotiation · Multimodal Machine Learning Applications
