On the Effects of Data Scale on UI Control Agents

Wei Li; William Bishop; Alice Li; Chris Rawles; Folawiyo Campbell-Ajala; Divya Tyamagundlu; Oriana Riva

arXiv:2406.03679·cs.AI·November 14, 2024

TL;DR

This paper investigates how increasing training data affects the performance of fine-tuned language models controlling Android apps, revealing that in-domain performance improves with more data, but out-of-domain performance remains challenging.

Contribution

The study introduces AndroidControl, a large, diverse dataset for training and analyzing UI control agents, and provides insights into how data scale affects in-domain and out-of-domain performance.

Findings

01. In-domain fine-tuned models improve with more data.

02. Out-of-domain performance improves slowly and remains limited.

03. More data alone may not suffice for robust out-of-domain control.
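
To make the scaling findings concrete, here is a minimal sketch of how one might fit a trend of accuracy against training-set size and extrapolate it. The log-linear form, the function names `fit_log_linear` and `episodes_needed`, and all numbers are assumptions for illustration only; the values are made-up placeholders, not results from the paper.

```python
import math


def fit_log_linear(sizes, scores):
    """Least-squares fit of score = a + b * log10(size). Returns (a, b)."""
    xs = [math.log10(n) for n in sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(scores) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def episodes_needed(a, b, target):
    """Training-set size at which the fitted line reaches `target`."""
    if b <= 0:
        return math.inf  # a flat or falling trend never reaches the target
    return 10 ** ((target - a) / b)


# Placeholder values for illustration, NOT measurements from the paper.
sizes = [10, 100, 1000, 10000]
in_domain = [0.40, 0.50, 0.60, 0.70]
out_of_domain = [0.40, 0.45, 0.50, 0.55]

a_id, b_id = fit_log_linear(sizes, in_domain)
a_ood, b_ood = fit_log_linear(sizes, out_of_domain)

# A shallower slope means far more data is needed to hit the same target.
print(episodes_needed(a_id, b_id, 0.9))
print(episodes_needed(a_ood, b_ood, 0.9))
```

With a shallower out-of-domain slope, the extrapolated data requirement grows by orders of magnitude, which is one way to read the finding that more data alone may not suffice.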

Abstract

Autonomous agents that control computer interfaces to accomplish human tasks are emerging. Leveraging LLMs to power such agents has been of special interest, but unless fine-tuned on human-collected task demonstrations, performance is still relatively low. In this work we study whether fine-tuning alone is a viable approach for building real-world computer control agents. In particular, we investigate how performance measured on both high- and low-level tasks, in domain and out of domain, scales as more training data is collected. To this end we collect and release a new dataset, AndroidControl, consisting of 15,283 demonstrations of everyday tasks with Android apps. Compared to existing datasets, each AndroidControl task instance includes both high- and low-level human-generated instructions, allowing us to explore the level of task complexity an agent can handle. Moreover,…

Videos

On the Effects of Data Scale on UI Control Agents · SlidesLive

Taxonomy

Topics: Data Mining Algorithms and Applications · Fuzzy Logic and Control Systems