On the Effects of Data Scale on UI Control Agents
Wei Li, William Bishop, Alice Li, Chris Rawles, Folawiyo Campbell-Ajala, Divya Tyamagundlu, Oriana Riva

TL;DR
This paper investigates how the amount of training data affects the performance of fine-tuned language models that control Android apps. In-domain performance improves with more data, but out-of-domain performance remains challenging.
Contribution
The study introduces AndroidControl, a large, diverse dataset for training and analyzing UI control agents, and provides insights into how data scale affects in-domain and out-of-domain performance.
Findings
In-domain fine-tuned models improve with more data.
Out-of-domain performance improves slowly and remains limited.
More data alone may not suffice for robust out-of-domain control.
Abstract
Autonomous agents that control computer interfaces to accomplish human tasks are emerging. Leveraging LLMs to power such agents has been of special interest, but unless fine-tuned on human-collected task demonstrations, performance is still relatively low. In this work we study whether fine-tuning alone is a viable approach for building real-world computer control agents. In particular, we investigate how performance, measured on both high- and low-level tasks in domain and out of domain, scales as more training data is collected. To this end we collect and release a new dataset, AndroidControl, consisting of 15,283 demonstrations of everyday tasks with Android apps. Compared to existing datasets, each AndroidControl task instance includes both high- and low-level human-generated instructions, allowing us to explore the level of task complexity an agent can handle. Moreover,…
Taxonomy
Topics: Data Mining Algorithms and Applications · Fuzzy Logic and Control Systems
