AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation

Hao Wen; Shizuo Tian; Borislav Pavlov; Wenjie Du; Yixuan Li; Ge Chang; Shanhui Zhao; Jiacheng Liu; Yunxin Liu; Ya-Qin Zhang; Yuanchun Li

arXiv:2412.18116·cs.AI·May 7, 2025

Open Access · 1 Repo

TL;DR

AutoDroid-V2 leverages small language models and code generation techniques to improve mobile UI task automation on-device, enhancing privacy, reducing latency, and lowering resource consumption compared to existing large-model-based agents.

Contribution

It introduces a document-centered approach for generating UI automation code with small language models, enabling efficient on-device execution and improved success rates.

Findings

01 Higher success rates in task automation

02 Lower latency and token consumption

03 Effective on-device execution of UI scripts

Abstract

Large language models (LLMs) have brought exciting new advances to mobile UI agents, a long-standing research field that aims to complete arbitrary natural language tasks through mobile UI interactions. However, existing UI agents usually demand powerful large language models that are difficult to deploy locally on end-users' devices, raising huge concerns about user privacy and centralized serving cost. Inspired by the remarkable coding abilities of recent small language models (SLMs), we propose to convert the UI task automation problem to a code generation problem, which can be effectively solved by an on-device SLM and efficiently executed with an on-device code interpreter. Unlike normal coding tasks that can be extensively pre-trained with public datasets, generating UI automation code is challenging due to the diversity, complexity, and variability of target apps. Therefore,…
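The idea of treating a UI task as a code generation problem, with the resulting script run by a local interpreter, can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in and not the paper's actual API or script format: `FakeScreen`, `run_script`, the `tap` and `set_text` actions, and the element names are invented for illustration, and the "generated" script is hand-written rather than produced by a model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a device UI; records the actions it receives."""
    fields: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def tap(self, element: str) -> None:
        self.log.append(("tap", element))

    def set_text(self, element: str, text: str) -> None:
        self.fields[element] = text
        self.log.append(("set_text", element, text))


def run_script(script: str, screen: FakeScreen) -> FakeScreen:
    """Run a script with only the two UI actions in scope.

    Note: emptying __builtins__ only limits the visible names. It is not a
    security sandbox, so a real interpreter would need stronger isolation.
    """
    allowed = {"__builtins__": {}, "tap": screen.tap, "set_text": screen.set_text}
    exec(script, allowed)
    return screen


# Assumed model output for the task "create a note titled Groceries".
# Hand-written here; no model is called in this sketch.
GENERATED_SCRIPT = """
tap("new_note_button")
set_text("title_field", "Groceries")
tap("save_button")
"""

screen = run_script(GENERATED_SCRIPT, FakeScreen())
print(screen.log)
```

In this sketch a single script drives three UI actions, so the model would be consulted once per task rather than once per step. That is one plausible reading of why a script-based agent could use fewer tokens than a step-by-step agent, though the sketch itself measures nothing.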

Code & Models

Repositories

mobilellm/autodroid-v2 (Official)

Taxonomy

Topics: Modular Robots and Swarm Intelligence

Methods: ADaptive gradient method with the OPTimal convergence rate