LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation

Li Zhang; Shihe Wang; Xianqing Jia; Zhihan Zheng; Yunhe Yan; Longxi Gao; Yuanchun Li; Mengwei Xu

arXiv:2404.16054 · cs.HC · August 5, 2024 · 1 citation

TL;DR

LlamaTouch introduces a scalable, faithful on-device testbed for mobile UI task automation evaluation, leveraging UI state transfer and multi-level matching to improve over traditional human validation methods.

Contribution

It presents a novel on-device evaluation framework with UI state transfer, detailed annotation, and multi-level matching, enabling scalable and faithful assessment of mobile agents.

Findings

1. High evaluation faithfulness demonstrated in real-world environments.

2. Better scalability compared to human validation methods.

3. Supports diverse mobile applications with multiple agents and tasks.

Abstract

The emergent large language/multimodal models facilitate the evolution of mobile agents, especially in mobile UI task automation. However, existing evaluation approaches, which rely on human validation or established datasets to compare agent-predicted actions with predefined action sequences, are unscalable and unfaithful. To overcome these limitations, this paper presents LlamaTouch, a testbed for on-device mobile UI task execution and faithful, scalable task evaluation. By observing that the task execution process only transfers UI states, LlamaTouch employs a novel evaluation approach that only assesses whether an agent traverses all manually annotated, essential application/system states. LlamaTouch comprises three key techniques: (1) On-device task execution that enables mobile agents to interact with realistic mobile environments for task execution. (2) Fine-grained UI component…
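The core idea in the abstract, checking whether an agent's execution passes through every annotated essential state, can be sketched in a few lines. The snippet below is only an illustration under stated assumptions and is not taken from the LlamaTouch code base: UI states are represented as plain dictionaries, the names `state_matches` and `traverses_essential_states` are hypothetical, and essential states are assumed to be matched in annotation order.

```python
from typing import Callable, Dict, Iterable, Sequence

# Assumption: a UI state is a flat dict of observed properties
# (e.g. current activity, a toggle value). The real testbed's
# representation may differ.
UIState = Dict[str, object]


def state_matches(observed: UIState, essential: UIState) -> bool:
    """Hypothetical matcher: every annotated key/value must appear in the observed state."""
    return all(observed.get(key) == value for key, value in essential.items())


def traverses_essential_states(
    trace: Iterable[UIState],
    essential_states: Sequence[UIState],
    matcher: Callable[[UIState, UIState], bool] = state_matches,
) -> bool:
    """Return True if the trace passes through all essential states.

    Assumption: essential states must be reached in the annotated order,
    so this is a subsequence check. Each observed state is consumed at
    most once because a single iterator is shared across the checks.
    """
    remaining = iter(trace)
    return all(
        any(matcher(observed, essential) for observed in remaining)
        for essential in essential_states
    )


# Illustrative trace: the agent opens Settings, then turns Wi-Fi on.
trace = [
    {"activity": "Home"},
    {"activity": "Settings"},
    {"activity": "WiFi", "wifi_on": True},
]
essential = [{"activity": "Settings"}, {"wifi_on": True}]

completed = traverses_essential_states(trace, essential)
completed_reversed = traverses_essential_states(trace, list(reversed(essential)))
```

Here `completed` is `True`, while `completed_reversed` is `False` because the annotated order is not respected. The `matcher` parameter is where stricter or fuzzier comparisons could be plugged in, which is one way to think about matching at multiple levels; the exact matching rules used by the testbed are not specified in this excerpt.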

Code & Models

Repositories

llamatouch/llamatouch (Official)

Taxonomy

Topics: Context-Aware Activity Recognition Systems · Interactive and Immersive Displays · IoT and Edge/Fog Computing

Methods: Sparse Evolutionary Training